REST API Reference¶

gpuctl provides a complete RESTful API built on FastAPI. Once the server is running, you can explore the API interactively through the Swagger UI.

Basic Information¶

Item	Value
Base URL	`http://localhost:8000`
API Prefix	`/api/v1`
Response Format	JSON
Error Format	`{"error": "error message"}`
Swagger UI	`http://localhost:8000/docs`

Starting the API Server¶

python server/main.py

Base Endpoints¶

`GET /`¶

Returns basic service information.

{
    "message": "GPU Control API",
    "version": "1.0.0"
}

`GET /health`¶

Health check endpoint.

{
    "status": "healthy",
    "timestamp": "2026-03-01T00:00:00"
}
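A readiness check against `GET /health` might look like the sketch below. It assumes the server is reachable at the documented Base URL; the helper names `is_healthy` and `wait_until_healthy` are hypothetical and not part of gpuctl.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # documented default; adjust for your deployment


def is_healthy(payload: dict) -> bool:
    """Return True when a /health response body reports status "healthy"."""
    return payload.get("status") == "healthy"


def wait_until_healthy(base_url: str = BASE_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll GET /health until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after `python server/main.py` gives scripts a simple way to block until the API accepts requests.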

Job API¶

Base Path: /api/v1/jobs

`POST /api/v1/jobs` — Create Job¶

Request body:

{
    "yamlContent": "kind: training\nversion: v0.1\njob:\n  name: my-job\n..."
}

Response (201):

{
    "jobId": "my-job",
    "name": "my-job",
    "kind": "training",
    "status": "pending",
    "createdAt": "2024-01-01T00:00:00",
    "message": "Job submitted to resource pool"
}
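As a minimal sketch of a client for this endpoint, the code below wraps a YAML document in the `yamlContent` field and sends it with the standard library. The function names are hypothetical, and the YAML text is whatever job definition you already have; nothing here fills in the elided fields of the example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # documented default; adjust for your deployment


def build_create_job_request(yaml_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/jobs request for one YAML document."""
    body = json.dumps({"yamlContent": yaml_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(yaml_text: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_create_job_request(yaml_text, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

A typical call reads the YAML from disk, for example `create_job(open("job.yaml").read())`, where `job.yaml` is a placeholder file name, and then inspects `jobId` and `status` in the result.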

`POST /api/v1/jobs/batch` — Batch Create Jobs¶

Request body:

{
    "yamlContents": [
        "kind: training\nversion: v0.1\n...",
        "kind: inference\nversion: v0.1\n..."
    ]
}

Response (201):

{
    "success": [
        {"jobId": "job-1", "name": "training-1"},
        {"jobId": "job-2", "name": "inference-1"}
    ],
    "failed": [
        {"index": 2, "error": "Unsupported kind: unknown"}
    ]
}
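Because a batch can partially succeed, callers usually need to separate created jobs from rejected entries. The sketch below does that for the response shape shown above; `split_batch_result` is a hypothetical helper, and the reference does not state whether `index` is zero-based, so the code passes it through unchanged.

```python
def split_batch_result(response: dict) -> tuple[list[str], dict[int, str]]:
    """Split a batch-create response into created job IDs and per-index errors."""
    created = [item["jobId"] for item in response.get("success", [])]
    # Keep the index exactly as the server reported it; its base is not documented.
    errors = {item["index"]: item["error"] for item in response.get("failed", [])}
    return created, errors
```

A non-empty `errors` mapping signals that some entries need to be fixed and resubmitted, even though the request as a whole returned 201.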

`GET /api/v1/jobs` — List Jobs¶

Query parameters:

Parameter	Type	Description
`kind`	string	Filter by job type: training / inference / notebook / compute
`pool`	string	Filter by resource pool name
`status`	string	Filter by status
`namespace`	string	Filter by namespace
`page`	int	Page number, default 1
`pageSize`	int	Items per page, default 20, max 100

Response (200):

{
    "total": 5,
    "items": [
        {
            "jobId": "my-inference-854c6c5cd-76ztc",
            "name": "my-inference",
            "namespace": "default",
            "kind": "inference",
            "status": "Running",
            "ready": "1/1",
            "node": "node-1",
            "ip": "10.42.0.43",
            "age": "2h"
        }
    ]
}
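The filters and pagination parameters combine into an ordinary query string. The following sketch builds such a URL and estimates how many pages a result set spans. Two points are assumptions rather than documented behavior: that `total` counts all matching jobs across pages, and that `pageSize` must be at least 1. The function names are hypothetical.

```python
import math
from urllib.parse import urlencode


def list_jobs_url(base_url: str = "http://localhost:8000", page: int = 1,
                  page_size: int = 20, **filters) -> str:
    """Build a GET /api/v1/jobs URL; filters may be kind, pool, status, namespace."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= 100:  # documented maximum is 100; the minimum is assumed
        raise ValueError("pageSize must be between 1 and 100")
    params = {key: value for key, value in filters.items() if value is not None}
    params.update({"page": page, "pageSize": page_size})
    return f"{base_url}/api/v1/jobs?{urlencode(params)}"


def page_count(total: int, page_size: int = 20) -> int:
    """Number of pages needed for `total` items (assumes total spans all pages)."""
    return math.ceil(total / page_size)
```

For example, `list_jobs_url(kind="inference", namespace="default", page=2)` requests the second page of inference jobs in the `default` namespace.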

`GET /api/v1/jobs/{jobId}` — Get Job Details¶

Path parameter: `jobId`, which accepts either the job name or the Pod name.

Query parameter: `namespace` (optional). If omitted, all gpuctl namespaces are searched.

Response (200):

{
    "job_id": "my-notebook-job-0",
    "name": "my-notebook-job-0",
    "namespace": "team-alice",
    "kind": "notebook",
    "resource_type": "StatefulSet",
    "status": "Running",
    "age": "25m",
    "started": "2026-03-01T03:30:39+00:00",
    "completed": null,
    "priority": "medium",
    "pool": "dev-pool",
    "resources": {
        "cpu": "8",
        "memory": "32Gi",
        "gpu": 1
    },
    "metrics": {},
    "yaml_content": {
        "kind": "notebook",
        "version": "v0.1",
        "job": {
            "name": "my-notebook-job",
            "namespace": "team-alice"
        },
        "environment": {
            "image": "jupyter/base-notebook:latest",
            "command": []
        },
        "service": {
            "port": 8888
        },
        "resources": {
            "pool": "dev-pool",
            "gpu": 1,
            "cpu": 8,
            "memory": "32Gi"
        }
    },
    "events": [
        {
            "age": "24m",
            "type": "Normal",
            "reason": "Started",
            "from": "kubelet",
            "object": "Pod/my-notebook-job-0",
            "message": "Started container notebook"
        }
    ],
    "access_methods": {
        "pod_ip_access": {
            "pod_ip": "10.42.0.49",
            "port": 8888,
            "url": "http://10.42.0.49:8888"
        },
        "node_port_access": {
            "node_ip": "192.168.1.100",
            "node_port": 30001,
            "url": "http://192.168.1.100:30001"
        }
    }
}

Note

`resource_type`: the actual K8s resource type (Pod / Job / Deployment / StatefulSet)
`yaml_content`: the gpuctl YAML structure, reverse-mapped from the K8s resource
`access_methods`: populated only for inference / compute / notebook jobs; `null` for training jobs
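Since `access_methods` can be `null`, client code should guard against it before reading URLs. A small sketch, using the hypothetical helper name `access_urls` and assuming each entry carries a `url` field as in the example response:

```python
def access_urls(job: dict) -> dict[str, str]:
    """Map each access method name to its URL; empty when access_methods is null."""
    methods = job.get("access_methods") or {}
    return {
        name: info["url"]
        for name, info in methods.items()
        if info and "url" in info
    }
```

For the notebook job above this yields the Pod IP URL and the NodePort URL; for a training job it yields an empty dictionary.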

`DELETE /api/v1/jobs/{jobId}` — Delete Job¶

Query parameter: `force=true` (optional) forces deletion.

Response (200):

{
    "jobId": "my-job",
    "status": "terminating",
    "message": "Job deletion command issued"
}
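A deletion request, with the optional `force` flag, can be sketched as follows. The helper name is hypothetical, and percent-encoding the `jobId` path segment is a defensive assumption rather than a documented requirement.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # documented default; adjust for your deployment


def build_delete_job_request(job_id: str, force: bool = False,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the DELETE /api/v1/jobs/{jobId} request."""
    url = f"{base_url}/api/v1/jobs/{quote(job_id, safe='')}"
    if force:
        url += "?force=true"
    return urllib.request.Request(url, method="DELETE")
```

Passing the result to `urllib.request.urlopen` issues the request. The `terminating` status in the response suggests deletion completes asynchronously, so a follow-up `GET` on the job may be needed to confirm it is gone.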

`GET /api/v1/jobs/{jobId}/logs` — Get Logs¶

Query parameters:

Parameter	Description
`tail`	Return last N lines, default 100
`pod`	Specify Pod name (for multi-Pod jobs)

Response (200):

{
    "logs": ["2024-01-01 00:00:00 Starting...", "..."],
    "lastTimestamp": "2024-01-01T00:05:00"
}

`WS /api/v1/jobs/{jobId}/logs/ws` — WebSocket Streaming Logs¶

Streams logs continuously after connection. Each message format:

{"type": "log", "data": "2024-01-01 00:00:10 Step 100/1000 loss=0.32"}
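On the client side, each received text frame can be decoded as JSON and filtered by `type`. The sketch below is independent of any WebSocket library: it takes an iterable of raw messages and yields the log lines. Deriving the `ws://` URL from the HTTP Base URL and skipping messages whose `type` is not `log` are both assumptions, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator
from urllib.parse import quote


def logs_ws_url(job_id: str, base_url: str = "http://localhost:8000") -> str:
    """Derive the WebSocket URL from the HTTP base URL (http -> ws, https -> wss)."""
    ws_base = base_url.replace("http", "ws", 1)
    return f"{ws_base}/api/v1/jobs/{quote(job_id, safe='')}/logs/ws"


def log_lines(messages: Iterable[str]) -> Iterator[str]:
    """Yield the `data` field of every message whose type is "log"."""
    for raw in messages:
        message = json.loads(raw)
        if message.get("type") == "log":
            yield message["data"]
```

With a WebSocket client of your choice, connect to `logs_ws_url("my-job")` and pass the stream of received frames to `log_lines`.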

Resource Pool API¶

Base Path: /api/v1/pools

Method	Path	Function
GET	`/api/v1/pools`	List resource pools
GET	`/api/v1/pools/{poolName}`	Get pool details
POST	`/api/v1/pools`	Create resource pool
DELETE	`/api/v1/pools/{poolName}`	Delete resource pool

Node API¶

Base Path: /api/v1/nodes

Method	Path	Function
GET	`/api/v1/nodes`	List nodes (supports pool/gpuType/status filters)
GET	`/api/v1/nodes/{nodeName}`	Node details (labels, resource usage)
GET	`/api/v1/nodes/gpu-detail`	GPU details for all nodes
POST	`/api/v1/nodes/{nodeName}/pools`	Add node to a resource pool
DELETE	`/api/v1/nodes/{nodeName}/pools/{poolName}`	Remove node from a pool
GET	`/api/v1/nodes/{nodeName}/labels`	Get all labels on a node

Label API¶

Method	Path	Function
POST	`/api/v1/nodes/{nodeName}/labels`	Add label to a node
POST	`/api/v1/nodes/labels/batch`	Batch add node labels
GET	`/api/v1/nodes/{nodeName}/labels/{key}`	Get a specific label on a node
PUT	`/api/v1/nodes/{nodeName}/labels/{key}`	Update a node label
DELETE	`/api/v1/nodes/{nodeName}/labels/{key}`	Delete a node label
GET	`/api/v1/labels`	Overview of all node labels (aggregated by key)
GET	`/api/v1/nodes/labels`	Query a specific label across all nodes (requires `key` param)
GET	`/api/v1/nodes/labels/all`	GPU-related labels and pool bindings for all nodes

Quota API¶

Base Path: /api/v1/quotas

Method	Path	Function
POST	`/api/v1/quotas`	Create quota (YAML format)
GET	`/api/v1/quotas`	List quotas (supports namespace filter)
GET	`/api/v1/quotas/{namespaceName}`	Get namespace quota details (with utilization)
DELETE	`/api/v1/quotas/{namespaceName}`	Delete namespace quota

Namespace API¶

Base Path: /api/v1/namespaces

This API manages only gpuctl-created namespaces: those carrying the `runwhere.ai/namespace=true` label, plus the `default` namespace.

Method	Path	Function
GET	`/api/v1/namespaces`	List namespaces
GET	`/api/v1/namespaces/{namespaceName}`	Get namespace details (including quota info)
DELETE	`/api/v1/namespaces/{namespaceName}`	Delete namespace

Error Responses¶

All APIs return errors in a unified JSON format:

{
    "error": "specific error message"
}

The HTTP status code indicates the error category:

HTTP Status	Meaning
400	Invalid request parameters (e.g. malformed YAML)
404	Resource not found
409	Resource conflict (e.g. label already exists and overwrite not set)
500	Internal server error (e.g. K8s cluster issue)
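Clients can turn the unified error body into a typed exception. The sketch below is one way to do that; `GpuctlApiError` and `parse_error` are hypothetical names, and falling back to the raw body when it is not the documented JSON shape is a defensive assumption.

```python
import json


class GpuctlApiError(Exception):
    """Carries the HTTP status code and the server's error message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_error(status: int, body: bytes) -> GpuctlApiError:
    """Build an exception from an error response; fall back to the raw body text."""
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        # Body was not JSON, or not the documented {"error": ...} object.
        message = body.decode("utf-8", errors="replace")
    return GpuctlApiError(status, message)
```

With `urllib`, this pairs with `except urllib.error.HTTPError as exc: raise parse_error(exc.code, exc.read())`, after which callers can branch on `status` (for example, treating 404 as "job not found" and 409 as a label conflict).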
