API Reference
The backend exposes two HTTP endpoints.
GET /health
Readiness probe. Returns immediately once the server is up and the model is warmed up.
Response
{ "status": "ok" }
Usage
The frontend polls this endpoint on page load until it receives a 200 response before enabling the upload button. This handles the cold-start delay when model weights are being loaded or downloaded.
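The polling behaviour described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the frontend's actual code: the helper names `health_ok` and `wait_until_ready`, the one-second interval, and the attempt limit are all assumptions.

```python
import time
import urllib.error
import urllib.request


def health_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not reachable yet, or it answered with an error status.
        return False


def wait_until_ready(check, interval: float = 1.0, max_attempts: int = 60,
                     sleep=time.sleep) -> bool:
    """Call check() until it returns True or max_attempts is exhausted."""
    for attempt in range(max_attempts):
        if check():
            return True
        if attempt < max_attempts - 1:
            sleep(interval)  # back off before the next probe
    return False
```

A caller might use it as `wait_until_ready(lambda: health_ok("http://localhost:8000"))`, where the base URL is a placeholder for wherever the backend is deployed, and enable uploads only when it returns `True`.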
POST /infer/video
Runs Wholebody3d pose estimation on every frame of the uploaded video and streams results as Server-Sent Events.
Request
Content-Type: multipart/form-data
Body: file=<video file>
The file field must be named file. Any video format supported by OpenCV is accepted (MP4, MOV, AVI, etc.).
Response
Content-Type: text/event-stream
The response body is a stream of SSE events. Each event is a line of the form:
data: <JSON>\n\n
Progress event
Emitted after each frame is processed.
{
"type": "progress",
"frame": 42,
"total": 299,
"pct": 14.0,
"fps": 12.3,
"elapsed": 3.4,
"eta": 20.8
}
| Field | Type | Description |
|---|---|---|
| frame | int | Zero-based index of the frame just processed |
| total | int | Total number of frames in the video |
| pct | float | Percentage complete (0–100) |
| fps | float | Current inference throughput (frames per second) |
| elapsed | float | Seconds elapsed since inference started |
| eta | float | Estimated seconds remaining |
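As a rough illustration of consuming these events, the sketch below decodes `data: <JSON>` lines and formats a progress event for display. It assumes each event arrives as a single `data:` line followed by a blank line, as described in the Response section; the function names `iter_sse_events` and `format_progress` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data: <JSON>` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything unexpected
        yield json.loads(line[len("data:"):].strip())


def format_progress(event: dict) -> str:
    """Render a progress event as a one-line status string."""
    # `frame` is zero-based, so add 1 for a human-readable count.
    return (f"frame {event['frame'] + 1}/{event['total']} "
            f"({event['pct']:.1f}%), {event['fps']:.1f} fps, "
            f"ETA {event['eta']:.1f}s")
```

Any iterable of text lines works as input, for example the decoded lines of a streaming HTTP response body.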
Result event
Emitted once, after all frames have been processed.
{
"type": "result",
"fps": 120.0,
"frame_width": 1920,
"frame_height": 1080,
"total_frames": 300,
"n_kpts": 133,
"frames": [[...], [...], ...]
}
| Field | Type | Description |
|---|---|---|
| fps | float | Video frame rate from container metadata |
| frame_width | int | Inference frame width (pixels) |
| frame_height | int | Inference frame height (pixels) |
| total_frames | int | Number of frames in frames array |
| n_kpts | int | Number of keypoints per person (133 for Wholebody3d) |
| frames | float[][] | Per-frame keypoint data (see below) |
Frame data format
Each element of frames is a flat float[] of length n_kpts × 6:
Index 0 … n_kpts×3 - 1: 2D section
Index n_kpts×3 … n_kpts×6 - 1: 3D section
2D section (n_kpts × 3 values): [x0, y0, s0, x1, y1, s1, ...]
- x, y: keypoint coordinates in inference-frame pixels
- s: confidence score in [0, 1]
3D section (n_kpts × 3 values): [x0, y0, z0, x1, y1, z1, ...]
- 3D coordinates in the model's local coordinate system
- SprintLab currently uses only the 2D section for metric computation
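The layout above can be unpacked with simple slicing. The following is a minimal sketch, assuming the flat array has exactly `n_kpts × 6` values as documented; `split_frame` is a hypothetical helper name, not part of the backend.

```python
from typing import List, Tuple

Triple = Tuple[float, ...]


def split_frame(flat: List[float], n_kpts: int) -> Tuple[List[Triple], List[Triple]]:
    """Split one flat frame into 2D (x, y, score) and 3D (x, y, z) triples."""
    if len(flat) != n_kpts * 6:
        raise ValueError(f"expected {n_kpts * 6} values, got {len(flat)}")
    half = n_kpts * 3  # boundary between the 2D and 3D sections
    kpts_2d = [tuple(flat[i:i + 3]) for i in range(0, half, 3)]
    kpts_3d = [tuple(flat[i:i + 3]) for i in range(half, 2 * half, 3)]
    return kpts_2d, kpts_3d
```

For a result event `result`, a call such as `split_frame(result["frames"][0], result["n_kpts"])` would return the two lists for the first frame.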
No-detection frame
If the model finds no person in a frame, a zero-filled array of the expected length is returned:
flat = [0.0] * (n_kpts * 3 + n_kpts * 3)
The frontend handles this gracefully: zero-confidence keypoints are filtered out by the score ≥ 0.35 threshold.
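To show why a zero-filled frame needs no special case, here is a sketch of the score filter in Python. The 0.35 threshold comes from the text above; the function name `visible_keypoints` and the dictionary return shape are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

SCORE_THRESHOLD = 0.35  # threshold quoted in the text above


def visible_keypoints(flat: List[float], n_kpts: int,
                      threshold: float = SCORE_THRESHOLD) -> Dict[int, Tuple[float, float]]:
    """Return {keypoint_index: (x, y)} for 2D keypoints with score >= threshold."""
    visible = {}
    for k in range(n_kpts):
        # Each 2D keypoint occupies three consecutive values: x, y, score.
        x, y, s = flat[3 * k: 3 * k + 3]
        if s >= threshold:
            visible[k] = (x, y)
    return visible
```

A no-detection frame has score 0.0 for every keypoint, so the filter returns an empty mapping and downstream code sees no keypoints for that frame.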
Error handling
If OpenCV fails to open the video file, the stream will terminate without a result event. The frontend detects this (result event never arrives) and shows an error state.
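A client can apply the same rule: treat a stream that ends without a result event as a failure. The sketch below assumes events have already been decoded into dictionaries; `InferenceFailed` and `collect_result` are hypothetical names, not part of the backend or frontend.

```python
from typing import Callable, Iterable, Optional


class InferenceFailed(Exception):
    """Raised when the event stream closes before a result event arrives."""


def collect_result(events: Iterable[dict],
                   on_progress: Optional[Callable[[dict], None]] = None) -> dict:
    """Consume decoded events and return the result event, or raise."""
    for event in events:
        kind = event.get("type")
        if kind == "progress" and on_progress is not None:
            on_progress(event)
        elif kind == "result":
            return event
    # The stream is exhausted and no result was seen, e.g. because the
    # server could not open the uploaded video.
    raise InferenceFailed("stream ended without a result event")
```

The caller can catch `InferenceFailed` and show an error state, mirroring the behaviour described above.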