# Quickstart
Install `gpuctl` and submit your first job to a Kubernetes cluster in under five minutes.
## Prerequisites
- Python 3.8+
- Access to a Kubernetes cluster (`kubectl` configured with a valid kubeconfig)
- At least one available node in the cluster
## Step 1: Install gpuctl
```bash
# Linux x86_64
wget https://github.com/g8s-host/gpuctl/releases/latest/download/gpuctl-linux-amd64 -O gpuctl
chmod +x gpuctl
sudo mv gpuctl /usr/local/bin/
```

```bash
# macOS x86_64
curl -L https://github.com/g8s-host/gpuctl/releases/latest/download/gpuctl-macos-amd64 -o gpuctl
chmod +x gpuctl
sudo mv gpuctl /usr/local/bin/
```
Verify the installation:
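A quick sanity check, assuming the binary follows the common `--version` convention (the exact flag is an assumption; it is not documented in this guide):

```bash
# Confirm gpuctl resolves on your PATH
which gpuctl
# Print the installed version (--version is assumed, not confirmed by this guide)
gpuctl --version
```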
## Step 2: Verify Cluster Connection
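List the cluster's nodes to confirm gpuctl can reach the Kubernetes API through your kubeconfig (this is the `get nodes` command from the reference table at the end of this page):

```bash
gpuctl get nodes
```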
Example output:
```
NODE NAME   STATUS   GPU TOTAL   GPU USED   GPU FREE   GPU TYPE   IP              POOL
node-1      Ready    8           0          8          A100-80G   192.168.1.101   default
node-2      Ready    4           0          4          A10-24G    192.168.1.102   default
```
## Step 3: Submit Your First Job
The following example uses the `nginx` image (no GPU required) to quickly verify that the platform is working.
1. Create a YAML file named `hello-gpuctl.yaml`:
```yaml
kind: compute
version: v0.1
job:
  name: hello-gpuctl
  priority: medium
  description: My first gpuctl job
environment:
  image: nginx:latest
  command: []
  args: []
service:
  replicas: 1
  port: 80
resources:
  pool: default
  gpu: 0
  cpu: 1
  memory: 512Mi
```
2. Submit the job
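Submit the manifest with `gpuctl create`:

```bash
gpuctl create -f hello-gpuctl.yaml
```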
3. Check job status
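List jobs to watch the new job come up:

```bash
gpuctl get jobs
```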
```
JOB ID                      NAME           NAMESPACE   KIND      STATUS    READY   NODE     IP          AGE
hello-gpuctl-7d9c8b-xk2lp   hello-gpuctl   default     compute   Running   1/1     node-1   10.42.0.5   2m
```
4. Inspect the job
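`describe` prints the job's full details; the job name here matches the `name` field in the YAML above:

```bash
gpuctl describe job hello-gpuctl
```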
5. View logs
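Stream the logs with `-f` (press Ctrl+C to stop following):

```bash
gpuctl logs hello-gpuctl -f
```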
6. Delete the job
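Clean up when you are done:

```bash
gpuctl delete job hello-gpuctl
```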
## Step 4: Submit a GPU Training Job (Optional)
If your cluster has GPU nodes, try submitting a simple training job. Create a file named `simple-training.yaml`:
```yaml
kind: training
version: v0.1
job:
  name: simple-training
  priority: medium
environment:
  image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
  command: ["python", "-c", "import torch; print(f'GPU available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"]
resources:
  pool: default
  gpu: 1
  cpu: 4
  memory: 16Gi
```
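Submit and follow it the same way as the nginx job. If the job lands on a GPU node, the log output from the `torch.cuda` check above should report `GPU available: True` and a nonzero device count:

```bash
gpuctl create -f simple-training.yaml
gpuctl logs simple-training -f
```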
## Common Commands Reference
| Scenario | Command |
|---|---|
| Submit a job | gpuctl create -f job.yaml |
| List all jobs | gpuctl get jobs |
| Filter by job type | gpuctl get jobs --kind training |
| Stream logs | gpuctl logs <job-name> -f |
| Job details | gpuctl describe job <job-name> |
| Delete a job | gpuctl delete job <job-name> |
| List resource pools | gpuctl get pools |
| List nodes | gpuctl get nodes |
## Next Steps
- Training Jobs — Submit LlamaFactory / DeepSpeed distributed training
- Inference Services — Deploy a vLLM inference API service
- Resource Pool Management — Create and manage GPU resource pools
- CLI Reference — Full command documentation