User Guide¶
Welcome to gpuctl! This guide walks you through gpuctl's core features from scratch so you can manage GPU compute resources efficiently.
Contents¶
- **Quickstart**: Complete installation and submit your first job in under 5 minutes.
- **Training Jobs**: LlamaFactory and DeepSpeed distributed training, with complete examples for single-node multi-GPU and multi-node multi-GPU scenarios.
- **Inference Services**: Deploy inference services using VLLM and similar frameworks, with multi-replica and auto-scaling support.
- **Notebook**: Launch a JupyterLab environment with GPU resources attached in one command, for rapid prototyping.
- **Compute Jobs**: Deploy CPU services like nginx and redis without worrying about Kubernetes Deployment details.
- **Resource Pool Management**: Partition nodes into resource pools for training/inference isolation and fine-grained scheduling.
- **Quotas & Namespaces**: Set CPU, memory, and GPU quotas per team or user to prevent resource abuse.
YAML Configuration Overview¶
All resources are defined through declarative YAML. The following describes the common fields:
```yaml
kind: training            # Job type: training / inference / notebook / compute / pool / quota
version: v0.1             # Version, currently fixed at v0.1
job:
  name: my-job            # Job name (also used as the K8s resource name)
  priority: medium        # Priority: high / medium / low
  description: "..."      # Optional description
environment:
  image: my-image:tag     # Container image
  imagePullSecret: xxx    # Image pull secret (optional)
  command: [...]          # Startup command
  args: [...]             # Command arguments (optional)
  env:                    # Environment variables (optional)
    - name: KEY
      value: VALUE
resources:
  pool: default           # Resource pool name (default: default)
  gpu: 0                  # Number of GPUs (0 for CPU-only jobs)
  gpu-type: A100-100G     # GPU model (optional; K8s schedules any GPU if omitted)
  cpu: 4                  # CPU cores
  memory: 8Gi             # Memory size
service:                  # Only applicable to inference / notebook / compute
  replicas: 1             # Number of replicas
  port: 8080              # Service port
  healthCheck: /health    # Health check path (optional)
storage:
  workdirs:               # Host directory mount list
    - path: /data/models
    - path: /output
```
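Putting the common fields together, a minimal single-GPU training job might look like the sketch below. The job name, image, and command are placeholders, not values gpuctl ships with; optional fields are omitted:

```yaml
kind: training
version: v0.1
job:
  name: my-training-job     # placeholder name
  priority: medium
environment:
  image: my-image:tag       # placeholder image
  command: ["python", "train.py"]
resources:
  pool: default
  gpu: 1
  cpu: 4
  memory: 8Gi
storage:
  workdirs:
    - path: /data/models
```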
Naming Rules
The job.name field is used directly as the Kubernetes resource metadata.name. Names must follow K8s naming conventions: lowercase letters, numbers, and hyphens only, max 63 characters.
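As a quick sanity check before submitting, these rules can be expressed as a regular expression. Kubernetes enforces RFC 1123 DNS labels for resource names, which additionally require the name to start and end with an alphanumeric character. This is a small standalone sketch (the function name `is_valid_job_name` is ours, not part of gpuctl):

```python
import re

# RFC 1123 DNS label, as Kubernetes enforces for metadata.name:
# lowercase letters, digits, and hyphens; must start and end with
# an alphanumeric character; at most 63 characters total.
_NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")

def is_valid_job_name(name: str) -> bool:
    """Return True if `name` can be used as job.name (a K8s resource name)."""
    return bool(_NAME_RE.match(name))
```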