CLI Command Reference¶

The gpuctl CLI mirrors kubectl's command structure, with semantics tailored to ML engineers.

Command Overview¶

Command	Description
`create`	Create resources from YAML (jobs, resource pools, quotas)
`apply`	Apply configuration (create or update; an update deletes the existing resource and recreates it)
`get`	List resources
`describe`	View resource details
`delete`	Delete resources
`logs`	View job logs
`label`	Manage node labels

create¶

Create resources from a YAML file.

gpuctl create -f <file> [-n <namespace>] [--json]

Option	Description	Required
`-f, --file`	YAML file path (can be specified multiple times)	Yes
`-n, --namespace`	Namespace (default: `default`)	No
`--json`	Output in JSON format	No

Examples:

# Submit a single job
gpuctl create -f training-job.yaml

# Submit multiple jobs at once
gpuctl create -f task1.yaml -f task2.yaml

# Specify a namespace
gpuctl create -f job.yaml -n team-alice

# JSON output
gpuctl create -f job.yaml --json
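
Because `-f` can be repeated, a small shell wrapper can submit every YAML file in a directory with a single `create` call. The following is a minimal sketch, not part of gpuctl: the function name `submit_dir` and the directory layout are hypothetical, and only the `-f` and `-n` options documented above are used.

```shell
# Hypothetical helper: submit all *.yaml files in a directory in one call.
# Usage: submit_dir <directory> [namespace]
submit_dir() {
  dir=$1
  ns=${2:-default}
  set --                        # reuse the positional parameters as the argument list
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue     # skip the unexpanded pattern when nothing matches
    set -- "$@" -f "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no YAML files found in $dir" >&2
    return 1
  fi
  gpuctl create "$@" -n "$ns"
}
```

For example, `submit_dir ./jobs team-alice` would expand to one `gpuctl create` call with one `-f` per file, in the shell's glob order.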

apply¶

Apply a resource configuration. If the resource already exists, gpuctl deletes it and then creates a new one from the file.

gpuctl apply -f <file> [-n <namespace>] [--json]

Option	Description	Required
`-f, --file`	YAML file path (can be specified multiple times)	Yes
`-n, --namespace`	Namespace (default: `default`)	No
`--json`	Output in JSON format	No

Examples:

# Update a job configuration
gpuctl apply -f job.yaml

# Update with a specific namespace
gpuctl apply -f job.yaml -n team-alice
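
Since `apply` replaces an existing resource rather than patching it, it can help to warn before a job is recreated. The sketch below is hypothetical and rests on an assumption this page does not state: that `gpuctl describe job` exits non-zero when the job does not exist. The function name `apply_with_warning` is illustrative only.

```shell
# Hypothetical helper: warn before apply replaces an existing job.
# Assumption: `gpuctl describe job` exits non-zero when the job does not exist.
# Usage: apply_with_warning <job_name> <file> [namespace]
apply_with_warning() {
  job=$1
  file=$2
  ns=${3:-default}
  if gpuctl describe job "$job" -n "$ns" >/dev/null 2>&1; then
    echo "warning: job '$job' exists in '$ns' and will be deleted and recreated" >&2
  fi
  gpuctl apply -f "$file" -n "$ns"
}
```

The job name passed in should match `job.name` in the YAML file; the helper does not parse the file itself.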

get¶

List resources. Supports multiple resource types and filter options.

get jobs¶

gpuctl get jobs [-n <namespace>] [--pool <pool>] [--kind <kind>] [--pods] [--json]

Option	Description
`-n, --namespace`	Filter by namespace (all namespaces if omitted)
`--pool`	Filter by resource pool
`--kind`	Filter by job type: `training` / `inference` / `notebook` / `compute`
`--pods`	Show Pod-level info (instead of Deployment/StatefulSet level)
`--json`	JSON output

Output columns:

Column	Meaning
JOB ID	Pod name (with K8s auto-generated hash suffix)
NAME	`job.name` from the YAML
NAMESPACE	Namespace
KIND	Job type
STATUS	Current Pod status
READY	Ready/total containers (e.g. `1/1`)
NODE	Node the pod was scheduled to
IP	Pod IP
AGE	Time since creation

Examples:

gpuctl get jobs
gpuctl get jobs -n team-alice
gpuctl get jobs --pool training-pool
gpuctl get jobs --kind training
gpuctl get jobs --pods
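
The filters above can be combined in one call. As a rough sketch, the hypothetical helper `list_jobs` below adds only the filters that are set and rejects a `--kind` value outside the four listed in the option table. It is an illustration, not part of gpuctl.

```shell
# Hypothetical helper: list jobs, adding only the filters that are set.
# Usage: list_jobs [namespace] [pool] [kind]   (pass "" to skip a filter)
list_jobs() {
  ns=${1:-}
  pool=${2:-}
  kind=${3:-}
  case "$kind" in
    ""|training|inference|notebook|compute) ;;
    *) echo "unknown kind: $kind" >&2; return 2 ;;
  esac
  set --
  if [ -n "$ns" ]; then set -- "$@" -n "$ns"; fi
  if [ -n "$pool" ]; then set -- "$@" --pool "$pool"; fi
  if [ -n "$kind" ]; then set -- "$@" --kind "$kind"; fi
  gpuctl get jobs "$@"
}
```

For example, `list_jobs team-alice "" training` lists training jobs in `team-alice` across all pools.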

get pools¶

gpuctl get pools [--json]

Example:

gpuctl get pools

get nodes¶

gpuctl get nodes [--pool <pool>] [--gpu-type <type>] [--json]

Option	Description
`--pool`	Filter by resource pool
`--gpu-type`	Filter by GPU model

Examples:

gpuctl get nodes
gpuctl get nodes --pool training-pool
gpuctl get nodes --gpu-type A100-100G

get labels¶

gpuctl get labels [<node_name>] [--key <key>] [--json]

Examples:

# View all labels on a node
gpuctl get labels node-1

# View a specific label
gpuctl get labels node-1 --key=runwhere.ai/gpu-type

get quotas¶

gpuctl get quotas [<namespace>] [--json]

Examples:

gpuctl get quotas
gpuctl get quotas team-alice

get ns / namespaces¶

gpuctl get ns [--json]
gpuctl get namespaces [--json]

describe¶

View detailed resource information.

describe job¶

gpuctl describe job <job_name> [-n <namespace>] [--json]

Output includes:

Basic info: Name, Kind, Resource Type, Namespace, Status, Age, Priority, Pool
Resource config: CPU, Memory, GPU
Raw YAML: gpuctl YAML config reverse-mapped from the K8s resource
Events: Last 10 K8s events
Access Methods (inference / notebook / compute only): Pod IP and NodePort addresses

Examples:

gpuctl describe job my-training-job
gpuctl describe job my-training-job -n team-alice
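
Since `describe job` accepts `--json`, its output can be saved for later reference, for example before a job is deleted. The following is a minimal sketch: the function name `snapshot_job` and the file naming are hypothetical, and the JSON is stored as-is because its schema is not specified on this page.

```shell
# Hypothetical helper: save a job's details as JSON for later reference.
# Usage: snapshot_job <job_name> [namespace] [output_dir]
snapshot_job() {
  job=$1
  ns=${2:-default}
  out_file="${3:-.}/$job.$ns.json"
  if ! gpuctl describe job "$job" -n "$ns" --json > "$out_file"; then
    rm -f "$out_file"           # do not leave an empty or partial file behind
    return 1
  fi
  echo "$out_file"              # print the path of the saved snapshot
}
```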

describe pool¶

gpuctl describe pool <pool_name> [--json]

Example:

gpuctl describe pool training-pool

describe node¶

gpuctl describe node <node_name> [--json]

Example:

gpuctl describe node node-1

describe quota / ns / namespace¶

gpuctl describe quota <namespace_name> [--json]
gpuctl describe ns <namespace_name> [--json]
gpuctl describe namespace <namespace_name> [--json]

Examples:

gpuctl describe quota team-alice
gpuctl describe ns team-alice

delete¶

Delete resources.

Delete via YAML file¶

gpuctl delete -f <file> [-n <namespace>] [--force] [--json]

Examples:

gpuctl delete -f training-job.yaml
gpuctl delete -f pool.yaml
gpuctl delete -f quota.yaml

delete job¶

gpuctl delete job <job_name> [-n <namespace>] [--force] [--json]

Examples:

gpuctl delete job my-training-job
gpuctl delete job my-training-job -n team-alice
gpuctl delete job my-training-job --force

delete quota¶

gpuctl delete quota <namespace_name> [--force] [--json]

Example:

gpuctl delete quota team-alice

delete ns / namespace¶

gpuctl delete ns <namespace_name> [--force] [--json]
gpuctl delete namespace <namespace_name> [--force] [--json]

Examples:

gpuctl delete ns team-alice
gpuctl delete ns team-alice --force

delete pool¶

gpuctl delete pool <pool_name> [--force] [--json]

Example:

gpuctl delete pool training-pool

Delete Job Behavior

Deleting a job removes the following:

The primary K8s resource (Job / Deployment / StatefulSet)
The associated NodePort Service, if one exists (training jobs have no Service)

K8s controllers then cascade-delete the associated Pods automatically.
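
Since deleting a job also removes its Service and Pods, cleaning up several jobs needs only one `delete job` call per job. The sketch below loops over job names and reports failures at the end. The function name `delete_jobs` is hypothetical, and it assumes gpuctl exits non-zero when a delete fails, which this page does not state.

```shell
# Hypothetical helper: delete several jobs in one namespace, reporting failures.
# Assumption: gpuctl exits non-zero when a delete fails.
# Usage: delete_jobs <namespace> <job_name>...
delete_jobs() {
  ns=$1
  shift
  failed=0
  for job in "$@"; do
    if ! gpuctl delete job "$job" -n "$ns"; then
      echo "failed to delete $job" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]           # succeed only if every delete succeeded
}
```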

logs¶

Retrieve job logs.

gpuctl logs <job_name> [-n <namespace>] [-f] [--json]

Option	Description
`<job_name>`	Job name
`-n, --namespace`	Namespace (default: `default`)
`-f, --follow`	Stream logs in real time (like `tail -f`)
`--json`	JSON output

Examples:

# View recent logs (last 100 lines by default)
gpuctl logs my-training-job

# Stream logs
gpuctl logs my-training-job -f

# Specify namespace
gpuctl logs my-training-job -n team-alice -f
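
Without `-f`, `logs` returns and its output can be redirected like any other command's. As an illustrative sketch, the hypothetical helper `scan_logs` saves the recent logs to a file and prints the lines that mention "error", using standard `grep`. The search word and file name are placeholders, not gpuctl conventions.

```shell
# Hypothetical helper: save a job's recent logs and print lines that mention errors.
# Usage: scan_logs <job_name> [namespace] [log_file]
scan_logs() {
  job=$1
  ns=${2:-default}
  log_file=${3:-$job.log}
  gpuctl logs "$job" -n "$ns" > "$log_file" || return 1
  # -i: case-insensitive match, -n: prefix each match with its line number
  grep -in "error" "$log_file" || echo "no lines matching 'error' in $log_file"
}
```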

label¶

Manage node labels.

gpuctl label <node_name> [node_name...] <label> [--delete] [--overwrite] [--json]

Label Key Convention

Label keys managed by gpuctl must be prefixed with runwhere.ai/ to avoid conflicts with other systems.

Option	Description
`<node_name>`	Node name (multiple nodes can be specified)
`<label>`	`key=value` to add or overwrite a label; `key` alone when used with `--delete`
`--delete`	Delete the label
`--overwrite`	Overwrite an existing label with the same key

Examples:

# Add a GPU model label to a single node
gpuctl label node-1 runwhere.ai/gpu-type=A100-100G

# Add label to multiple nodes
gpuctl label node-1 node-2 runwhere.ai/gpu-type=A100-100G

# Overwrite an existing label
gpuctl label node-1 runwhere.ai/gpu-type=A10-24G --overwrite

# Delete a label
gpuctl label node-1 runwhere.ai/gpu-type --delete

# Assign a node to a resource pool
gpuctl label node-1 runwhere.ai/pool=training-pool
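
The `runwhere.ai/` prefix convention can be checked on the client side before the CLI is called. The following is a minimal sketch; the function name `label_checked` and its argument order (label first, then nodes) are hypothetical. It only validates the prefix and then forwards to `gpuctl label` in the documented order.

```shell
# Hypothetical helper: refuse labels whose key lacks the runwhere.ai/ prefix.
# Usage: label_checked <label> <node>...
label_checked() {
  label=${1:-}
  if [ "$#" -lt 2 ]; then
    echo "usage: label_checked <label> <node>..." >&2
    return 2
  fi
  shift
  case "$label" in
    runwhere.ai/?*) ;;          # prefix present and followed by a key name
    *) echo "label key must start with runwhere.ai/: $label" >&2; return 2 ;;
  esac
  gpuctl label "$@" "$label"
}
```

For example, `label_checked runwhere.ai/pool=training-pool node-1 node-2` forwards the call, while `label_checked gpu-type=A10-24G node-1` is rejected before gpuctl runs.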

Global Options¶

Option	Description
`--help`	Show help information
`--json`	Output in JSON format (supported by most commands)

Full Workflow Examples¶

Training Job Lifecycle¶

# 1. Check available resources
gpuctl get nodes
gpuctl get pools

# 2. Submit a training job
gpuctl create -f training-job.yaml -n team-alice

# 3. Monitor status
gpuctl get jobs -n team-alice

# 4. Stream logs
gpuctl logs my-training -n team-alice -f

# 5. View job details
gpuctl describe job my-training -n team-alice

# 6. Clean up after completion
gpuctl delete job my-training -n team-alice
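
Step 3 above is usually repeated until the job finishes. As a rough sketch, the hypothetical helper `watch_jobs` re-runs `get jobs` at a fixed interval for a fixed number of polls. It does not parse the STATUS column, since the exact output layout is not specified here.

```shell
# Hypothetical helper: re-list jobs at a fixed interval.
# Usage: watch_jobs <namespace> [interval_seconds] [iterations]
watch_jobs() {
  ns=$1
  interval=${2:-10}
  count=${3:-6}
  i=0
  while [ "$i" -lt "$count" ]; do
    date '+%H:%M:%S'            # timestamp each poll
    gpuctl get jobs -n "$ns"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"         # no sleep after the final poll
    fi
  done
}
```

For example, `watch_jobs team-alice 30 10` polls every 30 seconds, ten times in total.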

Node and Resource Pool Management¶

# 1. List all nodes
gpuctl get nodes

# 2. Label a node with its GPU model
gpuctl label node-5 runwhere.ai/gpu-type=A100-100G

# 3. Create a resource pool (binding nodes)
gpuctl create -f training-pool.yaml

# 4. View resource pools
gpuctl get pools
gpuctl describe pool training-pool

# 5. View nodes in a pool
gpuctl get nodes --pool training-pool
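
The labeling steps in this workflow can be grouped when several nodes join a pool at once. The sketch below is hypothetical (the function name `onboard_nodes` is not part of gpuctl) and assumes the pool itself has already been created. It uses the two label keys shown in the `label` examples, `runwhere.ai/gpu-type` and `runwhere.ai/pool`.

```shell
# Hypothetical helper: label nodes with a GPU model and a pool, then list the pool.
# Usage: onboard_nodes <pool> <gpu_type> <node>...
onboard_nodes() {
  if [ "$#" -lt 3 ]; then
    echo "usage: onboard_nodes <pool> <gpu_type> <node>..." >&2
    return 2
  fi
  pool=$1
  gpu_type=$2
  shift 2                       # the remaining arguments are node names
  gpuctl label "$@" "runwhere.ai/gpu-type=$gpu_type" || return 1
  gpuctl label "$@" "runwhere.ai/pool=$pool" || return 1
  gpuctl get nodes --pool "$pool"
}
```

For example, `onboard_nodes training-pool A100-100G node-5 node-6` labels both nodes and then lists the pool's nodes. If a node already carries one of these labels, the `--overwrite` option described above would likely be needed.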