Ops Dataset Build from Approved Label Packs

Build datasets from explicit approved label-pack IDs with deterministic inclusion and exclusion lineage.

Dataset builds use explicit label-pack IDs in v1, and the primary internal workflow lives in Ops > Datasets.

Build from Ops > Datasets

Use the Build dataset action from the page header or the empty state in Ops > Datasets.

The drawer walks through the current backend contract:

  • choose the dataset capability
  • optionally enter a dataset name and description
  • choose the split strategy and split ratio
  • review explicit label-pack candidates for the active workspace
  • let preflight confirm selected-pack count, distinct split-group coverage, and any validation issues before submit

When the build succeeds, the drawer shows one of three deterministic outcomes:

  • created
    • a new dataset version was created
  • reused_in_progress
    • an equivalent build already exists and is still running
  • reused_success
    • an equivalent ready dataset already exists

The success state always includes a direct Open dataset detail link. If current filters or pagination hide the resulting row in the list, use that link instead of changing filters manually.

Approved label packs do not create dataset versions automatically. A dataset version exists only after an explicit build request succeeds or reuses an existing equivalent build.

Internal Ops APIs now expose deterministic candidate and preflight reads for this workflow:

  • GET /api/v1/datasets/build/candidates returns current-workspace packs for one capability plus their default inclusion state
  • POST /api/v1/datasets/build/preflight validates an explicit selection and returns group counts, exclusions, and validation issues without creating a dataset
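As a minimal sketch, the preflight request for an explicit selection can be assembled like this. The body field names other than label_pack_ids are assumptions inferred from this article, not a confirmed schema:

```python
def build_preflight_request(capability, label_pack_ids,
                            split_strategy="group", split_ratio=0.8):
    """Assemble (method, path, json_body) for the preflight endpoint.

    v1 requires an explicit label_pack_ids list; filter-style
    selection is not supported.
    """
    if not label_pack_ids:
        raise ValueError("v1 builds require explicit label_pack_ids")
    body = {
        "capability": capability,
        "label_pack_ids": sorted(label_pack_ids),  # deterministic order
        "split_strategy": split_strategy,
        "split_ratio": split_ratio,
    }
    return "POST", "/api/v1/datasets/build/preflight", body
```

Because preflight creates nothing server-side, it is safe to call repeatedly while adjusting the selection.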

Source selection rules (v1)

  • Request must include label_pack_ids.
  • Filter-style selectors are not supported in v1.
  • Requests with selector fields fail with:
    • 400 FILTERS_NOT_SUPPORTED_V1
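A sketch of the v1 selection check, assuming hypothetical selector field names (`filters`, `selector`, `label_pack_query`) for illustration:

```python
def validate_source_selection(request):
    """Return (http_status, error_code), or (None, None) when valid."""
    selector_fields = {"filters", "selector", "label_pack_query"}  # assumed names
    if selector_fields & request.keys():
        return 400, "FILTERS_NOT_SUPPORTED_V1"
    if not request.get("label_pack_ids"):
        # the exact code for a missing list is not documented in this
        # article; this code is a placeholder
        return 400, "MISSING_LABEL_PACK_IDS"
    return None, None
```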

Eligibility rules

A label pack is included only if all checks pass:

  • capability matches dataset capability
  • review_status = approved
  • validation_status = passed
  • workspace training-use policy allows inclusion
  • caller has access to the source workspace

Revision-level eligibility is enforced on label_pack_revisions:

  • default include policy: eligibility_status = APPROVED
  • developer-only override: include_reviewed_override = true allows eligibility_status = REVIEWED
  • non-developer override attempts fail with:
    • 403 ELIGIBILITY_OVERRIDE_FORBIDDEN
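The revision-level decision above can be sketched as a single pure check. This is an illustration of the stated policy, not the server implementation:

```python
def revision_included(eligibility_status, include_reviewed_override,
                      caller_is_developer):
    """Return (included, error) for one published revision.

    Default policy includes only APPROVED revisions; the developer-only
    override additionally admits REVIEWED ones.
    """
    if include_reviewed_override and not caller_is_developer:
        return False, (403, "ELIGIBILITY_OVERRIDE_FORBIDDEN")
    allowed = {"APPROVED", "REVIEWED"} if include_reviewed_override else {"APPROVED"}
    return eligibility_status in allowed, None
```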

Where to approve dataset eligibility:

  • Go to Ops > Prelabels
  • Open the relevant Label Review Series
  • In Revision History, use the Dataset eligibility column
    • Approve for dataset for Reviewed revisions
    • Revoke for currently Approved revisions

This is separate from label-pack lifecycle approval:

  • pack approval makes the scoped label pack approved
  • dataset eligibility approval makes the published revision eligible for default dataset inclusion

If no requested pack is eligible, build fails fast:

  • 400 NO_ELIGIBLE_LABEL_DATA

Deterministic lineage outputs

Build responses include:

  • outcome:
    • created
    • reused_in_progress
    • reused_success
  • build_status and linked job_id/job_status
  • included_label_pack_ids (sorted deterministic list)
  • excluded_label_pack_ids with canonical reason codes

Canonical exclusion reason codes:

  • LABEL_PACK_NOT_FOUND
  • LABEL_PACK_NOT_APPROVED
  • LABEL_PACK_REVOKED
  • LABEL_PACK_VALIDATION_NOT_PASSED
  • LABEL_PACK_CAPABILITY_MISMATCH
  • WORKSPACE_TRAINING_OPT_OUT
  • LABEL_PACK_ACCESS_DENIED
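One way to read these codes is as the first failing check for a requested pack. The sketch below models a pack as a plain dict; the field names and the order the checks run in are assumptions for illustration:

```python
def exclusion_reason(pack, dataset_capability, caller_workspace_ids):
    """Return the canonical exclusion code for one requested pack,
    or None when the pack is included."""
    if pack is None:
        return "LABEL_PACK_NOT_FOUND"
    if pack["workspace_id"] not in caller_workspace_ids:
        return "LABEL_PACK_ACCESS_DENIED"
    if pack.get("revoked"):
        return "LABEL_PACK_REVOKED"
    if pack["review_status"] != "approved":
        return "LABEL_PACK_NOT_APPROVED"
    if pack["validation_status"] != "passed":
        return "LABEL_PACK_VALIDATION_NOT_PASSED"
    if pack["capability"] != dataset_capability:
        return "LABEL_PACK_CAPABILITY_MISMATCH"
    if pack.get("workspace_training_opt_out"):
        return "WORKSPACE_TRAINING_OPT_OUT"
    return None
```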

Reproducibility

Given the same effective included sources + split/policy/options snapshot:

  • default (force=false) reuses an equivalent build deterministically
  • force=true bypasses reuse and creates a new build + job

Equivalent snapshot detection is based on persisted reuse_fingerprint; full request provenance is tracked with build_fingerprint.
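A minimal sketch of how a reuse fingerprint and the three outcomes could fit together, assuming the snapshot is canonically serialized and hashed (the actual fingerprint algorithm is not documented here):

```python
import hashlib
import json

def reuse_fingerprint(included_pack_ids, split, policy, options):
    """Hash the effective build snapshot; sorting the pack IDs makes the
    fingerprint independent of request order."""
    snapshot = {
        "included_label_pack_ids": sorted(included_pack_ids),
        "split": split,
        "policy": policy,
        "options": options,
    }
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def resolve_outcome(fingerprint, existing_builds, force=False):
    """existing_builds maps fingerprint -> build_status.
    force=True bypasses reuse and always creates a new build + job."""
    status = existing_builds.get(fingerprint)
    if force or status is None:
        return "created"
    return "reused_success" if status == "ready" else "reused_in_progress"
```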

Explore built datasets

Open Ops > Datasets to inspect built dataset versions without starting a training job.

List view

The list view is cursor-paginated and shows:

  • dataset display name
  • version_id
  • copyable dataset_version_id
  • capability and scope
  • build status and materialization status
  • split counts
  • included/excluded source counts
  • manifest hash

Use the filters at the top of the page to narrow by capability, scope, or status.

Detail view

Select a dataset name to open its detail page. The detail page is metadata-only:

  • provenance (manifest_hash, build/reuse fingerprints, seed)
  • build status and latest job status
  • quality summary with deterministic hard-fail/warning indicators
  • included and excluded label-pack lineage
  • materialization summary counts
  • split and source stats

If the dataset is still building or metadata is incomplete, the detail page shows Stats not available yet with stable placeholder values instead of a broken or blank layout.

Quality status in Ops

Ops surfaces dataset quality directly in the explorer:

  • list rows show one badge:
    • Pass
    • Warnings (n)
    • Hard fail (n)
    • Not evaluated
  • detail view shows the ordered indicator list plus the active quality policy version
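The badge logic above reduces to counting indicators by severity. A sketch, assuming severities are `hard_fail` and `warning` and that a missing summary means the dataset predates evaluation:

```python
def quality_badge(indicators):
    """indicators: ordered list of (severity, code) pairs, or None for
    datasets that predate quality evaluation."""
    if indicators is None:
        return "Not evaluated"
    hard = sum(1 for severity, _ in indicators if severity == "hard_fail")
    warnings = sum(1 for severity, _ in indicators if severity == "warning")
    if hard:
        return f"Hard fail ({hard})"
    if warnings:
        return f"Warnings ({warnings})"
    return "Pass"
```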

Current quality checks

Quality policy v1 is active for:

  • ball_detection
  • player_detection

Common warning codes:

  • LOW_SAMPLE_COUNT
  • LOW_GROUP_COUNT
  • HIGH_EMPTY_LABEL_RATE
  • LOW_LABEL_COVERAGE
  • CLASS_IMBALANCE
  • MATERIALIZATION_PARTIAL

Common hard-fail codes:

  • INVALID_LABEL_SCHEMA
  • INVALID_GEOMETRY
  • MISSING_REQUIRED_STATS
  • MISSING_MATERIALIZED_ASSETS

Older datasets that predate quality evaluation render Not evaluated instead of a blank quality panel.

Inspect dataset samples

From the dataset detail page, use View samples to open the read-only sample viewer.

Sample viewer

The sample viewer is split into two areas:

  • left rail:
    • filters for split, class, camera type, and group
    • sample list for the current filter set
    • Previous / Next navigation within the current result set
  • right side:
    • selected sample metadata
    • preview image
    • overlay state and fallback reason codes

Supported overlays

Current supported sample rendering:

  • ball_detection
    • point overlay rendered on the frame
  • player_detection
    • bounding-box overlay rendered on the frame

Unsupported sample types still open in the viewer, but the page shows a clear unsupported state instead of a broken overlay.

Fallback states

The viewer is read-only and always prefers an explicit state over a blank screen.

Common fallback cases:

  • Sample asset unavailable
    • the sample has no resolvable asset URL
  • Overlay unavailable
    • the image loaded but overlay data could not be rendered
  • This sample type is not supported yet
    • the sample capability is outside the current viewer scope

When available, the viewer also shows a deterministic reason_code to help debug dataset or label-pack issues.
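The viewer's preference for explicit states can be sketched as a simple ordered check. The sample field names here are assumptions for illustration:

```python
SUPPORTED_CAPABILITIES = {"ball_detection", "player_detection"}

def viewer_state(sample):
    """Map one sample record to the explicit viewer state shown above."""
    if sample["capability"] not in SUPPORTED_CAPABILITIES:
        return "This sample type is not supported yet"
    if not sample.get("asset_url"):
        return "Sample asset unavailable"
    if sample.get("overlay") is None:
        return "Overlay unavailable"
    return "renderable"
```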

Viewer caching

The sample viewer uses short-lived cached reads:

  • sample list TTL: 60s
  • sample detail TTL: 60s
  • next page is prefetched when the current page includes a next cursor
  • expiring sample asset URLs are refreshed automatically when close to expiry
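The TTL behaviour above can be sketched with a small read-through cache. This is an illustration of the described semantics, not the viewer's actual implementation; an injectable clock keeps it testable:

```python
import time

class ViewerCache:
    """Read-through cache: values are served from memory until the TTL
    lapses, then refetched on the next read."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get(self, key, fetch):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh cache hit
        value = fetch()      # missing or expired: refetch
        self._store[key] = (now, value)
        return value
```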

Quality enforcement

Quality evaluation always runs on new dataset builds. Build blocking depends on the server flag:

  • DATASET_QUALITY_ENFORCEMENT_ENABLED=false
    • dataset is created even when hard-fail indicators exist
    • Ops still sees the hard-fail quality badge
  • DATASET_QUALITY_ENFORCEMENT_ENABLED=true
    • build fails with DATASET_QUALITY_HARD_FAIL
    • the failed dataset still persists the computed quality summary for review
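The flag's effect reduces to one branch. A sketch of the stated semantics (the quality summary is persisted on both paths):

```python
def quality_gate(hard_fail_count, enforcement_enabled):
    """Return (build_ok, error_code) for a freshly evaluated build."""
    if hard_fail_count > 0 and enforcement_enabled:
        return False, "DATASET_QUALITY_HARD_FAIL"
    # enforcement off, or no hard fails: dataset is created; any
    # hard-fail indicators still surface as the Ops quality badge
    return True, None
```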

Frozen test set workflow

Use frozen datasets for official evaluation. A frozen dataset is protected from training use.

What freeze does

When a dataset is frozen:

  • training cannot use that dataset directly
  • training cannot use another dataset that reuses the same protected source label packs
  • evaluation in frozen-required flows must use a frozen dataset
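The two training protections can be sketched as one check over plain dicts; field names are assumptions for illustration:

```python
def training_permitted(dataset, frozen_datasets):
    """Return (allowed, error_code) for a training request against
    `dataset`, given all currently frozen datasets."""
    if dataset.get("is_frozen_test_set"):
        return False, "FROZEN_DATASET_TRAINING_FORBIDDEN"
    protected = set()
    for frozen in frozen_datasets:
        protected.update(frozen["included_label_pack_ids"])
    if protected & set(dataset["included_label_pack_ids"]):
        return False, "FROZEN_SOURCE_LEAKAGE_FORBIDDEN"
    return True, None
```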

Who can change frozen state

  • Freeze: internal Ops (is_developer or is_support)
  • Unfreeze: is_developer only

Unfreeze always requires a non-empty reason and every allow/deny attempt is audited.
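A sketch of the unfreeze gate using the error codes listed in this article; the caller and dataset field names are assumptions:

```python
def check_unfreeze(caller, dataset, reason):
    """Gate an unfreeze attempt; return a deterministic error code,
    or None when the unfreeze may proceed (the attempt is audited
    either way)."""
    if not dataset.get("is_frozen_test_set"):
        return "DATASET_NOT_FROZEN"
    if not caller.get("is_developer"):
        return "UNFREEZE_PERMISSION_REQUIRED"
    if not reason.strip():
        return "UNFREEZE_REASON_REQUIRED"
    return None
```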

Operator-visible outcomes

Common deterministic error codes:

  • FROZEN_DATASET_TRAINING_FORBIDDEN
  • FROZEN_SOURCE_LEAKAGE_FORBIDDEN
  • DATASET_NOT_FROZEN
  • DATASET_ALREADY_FROZEN
  • UNFREEZE_PERMISSION_REQUIRED
  • UNFREEZE_REASON_REQUIRED
  • DATASET_IN_ACTIVE_USE
  • FREEZE_STATE_CONFLICT

Inspect frozen state and history

Dataset detail exposes:

  • current is_frozen_test_set
  • frozen_at
  • last freeze audit metadata
  • last unfreeze audit metadata

Full frozen-state audit history is available through the dataset freeze-history API for internal Ops workflows.

Didn’t find what you need? Email support@soccer-insights.com or mention us in Slack if your club has a shared channel.