# Quickstart
Install `gpuctl` and submit your first job to a Kubernetes cluster in under five minutes.
## Prerequisites
- Python 3.8+
- Access to a Kubernetes cluster (`kubectl` configured with a valid kubeconfig)
- At least one available node in the cluster
## Step 1: Install gpuctl
```bash
# Linux x86_64
wget https://github.com/g8s-host/gpuctl/releases/latest/download/gpuctl-linux-amd64 -O gpuctl
chmod +x gpuctl
sudo mv gpuctl /usr/local/bin/
```

```bash
# macOS x86_64
curl -L https://github.com/g8s-host/gpuctl/releases/latest/download/gpuctl-macos-amd64 -o gpuctl
chmod +x gpuctl
sudo mv gpuctl /usr/local/bin/
```
Verify the installation:
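A quick sanity check, assuming the binary follows the common `--version` convention (the exact flag is an assumption; it is not documented in this guide):

```bash
# Confirm gpuctl resolves on your PATH
which gpuctl
# Print the installed version (--version is assumed, not confirmed by this guide)
gpuctl --version
```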
## Step 2: Verify Cluster Connection
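List the cluster's nodes to confirm gpuctl can reach the Kubernetes API through your kubeconfig (this is the `get nodes` command from the reference table at the end of this page):

```bash
gpuctl get nodes
```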
Example output:
```
NODE NAME   STATUS   GPU TOTAL   GPU USED   GPU FREE   GPU TYPE   IP              POOL
node-1      Ready    8           0          8          A100-80G   192.168.1.101   default
node-2      Ready    4           0          4          A10-24G    192.168.1.102   default
```
## Step 3: Submit Your First Job
The following example uses the `nginx` image (no GPU required) to quickly verify that the platform is working.
1. Create a YAML file named `hello-gpuctl.yaml`:
```yaml
kind: compute
version: v0.1
job:
  name: hello-gpuctl
  priority: medium
  description: My first gpuctl job
environment:
  image: nginx:latest
  command: []
  args: []
service:
  replicas: 1
  port: 80
resources:
  pool: default
  gpu: 0
  cpu: 1
  memory: 512Mi
```
2. Submit the job
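Submit the manifest with `gpuctl create`:

```bash
gpuctl create -f hello-gpuctl.yaml
```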
3. Check job status
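List jobs to watch the new job come up:

```bash
gpuctl get jobs
```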
```
JOB ID                      NAME           NAMESPACE   KIND      STATUS    READY   NODE     IP          AGE
hello-gpuctl-7d9c8b-xk2lp   hello-gpuctl   default     compute   Running   1/1     node-1   10.42.0.5   2m
```
4. Inspect the job
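`describe` prints the job's full details; the job name here matches the `name` field in the YAML above:

```bash
gpuctl describe job hello-gpuctl
```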
5. View logs
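Stream the logs with `-f` (press Ctrl+C to stop following):

```bash
gpuctl logs hello-gpuctl -f
```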
6. Delete the job
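Clean up when you are done:

```bash
gpuctl delete job hello-gpuctl
```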
## Step 4: Submit a GPU Training Job (Optional)
If your cluster has GPU nodes, try submitting a simple training job. Create a file named `simple-training.yaml`:
```yaml
kind: training
version: v0.1
job:
  name: simple-training
  priority: medium
environment:
  image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
  command: ["python", "-c", "import torch; print(f'GPU available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"]
resources:
  pool: default
  gpu: 1
  cpu: 4
  memory: 16Gi
```
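Submit and follow it the same way as the nginx job. If the job lands on a GPU node, the log output from the `torch.cuda` check above should report `GPU available: True` and a nonzero device count:

```bash
gpuctl create -f simple-training.yaml
gpuctl logs simple-training -f
```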
## Common Commands Reference
| Scenario | Command |
|---|---|
| Submit a job | gpuctl create -f job.yaml |
| List all jobs | gpuctl get jobs |
| Filter by job type | gpuctl get jobs --kind training |
| Stream logs | gpuctl logs <job-name> -f |
| Job details | gpuctl describe job <job-name> |
| Delete a job | gpuctl delete job <job-name> |
| List resource pools | gpuctl get pools |
| List nodes | gpuctl get nodes |
## Next Steps
- Training Jobs — Submit LlamaFactory / DeepSpeed distributed training
- Inference Services — Deploy a vLLM inference API service
- Resource Pool Management — Create and manage GPU resource pools
- CLI Reference — Full command documentation