Orchestrator: Continuous Learning CI/CD is a practical guide so any team can copy this repo and set up their own automated dataset→retrain pipeline.
This document explains what the orchestrator does, why each piece is needed, and how to set it up step-by-step (env, secrets, GH Actions, run/testing, troubleshooting).
- Orchestrator: Continuous Learning CI/CD
- Quick overview
- Why this structure?
- Architecture (in depth)
- Components & responsibilities
- Repo layout (what each file/folder is for)
- Getting started (step-by-step)
- How observability and logs look (what to expect)
- Testing & validation checklist
- Troubleshooting (common failure modes)
- Operational playbook
- Recommended best practices
The orchestrator is a small, tested Python service run by GitHub Actions that:
- reads misclassification counts from your DB,
- triggers dataset preparation on HF Space only when counts exceed thresholds,
- polls HF Hub metadata until a new version is available, and
- restarts the retrain HF Spaces for only the updated labels. Every step is logged clearly and supports dry-run.
- Cost-efficient: avoid expensive dataset prep / training unless there’s enough data.
- Safe: retraining only on freshly uploaded datasets (version-aware).
- Auditable: step-by-step logs in GitHub Actions + optional W&B entries.
- Modular: supports multiple labels (department / urgency) independently.
graph TD
Start["START<br/>Orchestrator triggered (manual or scheduled)"] --> DBConnect["Connect to DB<br/>Fetch misclassified counts"]
DBConnect --> ComputeLen["Compute dataset_len per label"]
ComputeLen --> CheckThreshold{"dataset_len >= threshold?"}
CheckThreshold -->|Yes| TriggerPrep["Restart Dataset Prep Space<br/>for labels above threshold"]
CheckThreshold -->|No| SkipLabel["Skip label<br/>Log info"] --> CheckThreshold
TriggerPrep --> PollDataset["Poll HF Hub metadata<br/>Wait for new dataset version"]
PollDataset -->|Success| RestartRetrain["Restart retrain HF Space<br/>for updated labels"]
PollDataset -->|Error / Timeout| PollError["Polling error / timeout<br/>Log warning / retry / abort"]
RestartRetrain --> End["END<br/>Orchestration complete"]
PollError --> End
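The threshold branch in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than the code in orchestrator.py: the function name labels_above_threshold, the dict layout, and the example counts are assumptions made for this sketch.

```python
import logging

logger = logging.getLogger("orchestrator")


def labels_above_threshold(dataset_lens, thresholds):
    """Return the labels whose dataset_len meets or exceeds its threshold.

    dataset_lens and thresholds are dicts keyed by label name. The function
    name and the dict layout are illustrative assumptions.
    """
    selected = []
    for label, dataset_len in dataset_lens.items():
        threshold = thresholds[label]
        if dataset_len >= threshold:
            selected.append(label)
        else:
            # Mirrors the "Skip label / Log info" node in the diagram.
            logger.info(
                "Skipping %s: dataset_len=%d < threshold=%d",
                label, dataset_len, threshold,
            )
    return selected


if __name__ == "__main__":
    lens = {"department": 52, "urgency": 12}    # made-up counts
    limits = {"department": 40, "urgency": 30}  # example threshold values
    print(labels_above_threshold(lens, limits))
```

Only the labels returned here would go on to the prepare and poll steps; the others are skipped for this run.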
Components & responsibilities
- GitHub Actions: runs the orchestrator on a schedule or by manual trigger (workflow_dispatch).
- orchestrator.py: reads DB counts, decides, restarts HF Spaces, polls HF Hub metadata, triggers retrain spaces. Supports DRY_RUN.
- Prepare Dataset HF Space: prepares the dataset and pushes dataset_metadata.json (with version_tag, num_samples, created_at).
- HF Hub dataset repo: single source of truth for latest dataset metadata.
- Retrain HF Spaces (per label): containerized training that evaluates & deploys (if metric improved).
- PostgreSQL: stores grievances, misclassification reviews (source of counts).
- Weights & Biases (optional): experiment and dataset logging, summary emails.
├── orchestrator/ # All CI/CD orchestration logic
│ ├── orchestrator.py # Main orchestrator script (decision engine)
│ ├── .env_examples # Example environment variables file (DO NOT commit real secrets)
│ ├── requirements.txt # Orchestrator-specific dependencies
│ └── __init__.py # package marker (optional)
├── .github/
│ └── workflows/
│ └── orchestrator.yml # GitHub Actions workflow to run the orchestrator
File purposes (brief)
- orchestrator/orchestrator.py: core program. It reads DATABASE_URL, computes misclassified_count (SQL COUNT), computes dataset_len = mis_count + sampled_correct, compares it with THRESHOLD_*, restarts PREPARE_DATASET_REPO via huggingface_hub.HfApi.restart_space, polls the HF Hub dataset_metadata.json with HEADERS={"Authorization": f"Bearer {HF_TOKEN}"} until num_samples >= dataset_len and version_tag != last_version, then restarts the retrain space for that label. Supports DRY_RUN mode and grouped logging for GitHub Actions.
- orchestrator/.env_examples: sample env vars (HF token names, repo IDs, thresholds, poll interval/timeout, DB URL, DRY_RUN). Copy to .env for local dev (gitignore it).
- orchestrator/requirements.txt: minimal Python libs required (SQLAlchemy, requests, huggingface_hub, python-dotenv, psycopg2-binary). Use this file in GH Actions or a virtualenv.
- .github/workflows/orchestrator.yml: the scheduled/manual workflow. Maps GitHub Secrets to orchestrator env vars and installs dependencies before running the orchestrator.
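The authenticated metadata read described for orchestrator.py can be sketched as follows. The orchestrator itself uses requests; this sketch substitutes the standard library's urllib so it has no third-party dependency. The helper names build_metadata_request and fetch_metadata are hypothetical.

```python
import json
import urllib.request


def build_metadata_request(metadata_url, hf_token):
    """Build a GET request carrying the bearer token (hypothetical helper)."""
    return urllib.request.Request(
        metadata_url,
        headers={"Authorization": f"Bearer {hf_token}"},
    )


def fetch_metadata(metadata_url, hf_token, timeout=30):
    """Download and decode dataset_metadata.json.

    Raises urllib.error.HTTPError / URLError on HTTP or network failures,
    so the caller can decide whether to retry or abort.
    """
    request = build_metadata_request(metadata_url, hf_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting HF Hub.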
Follow these steps to set up your own orchestrator from the repo.
- Create a Hugging Face token with permissions:
  - read for dataset metadata
  - write + space:restart to restart HF Spaces
- Have your DB connection details ready (Postgres recommended).
- Optional: W&B API key for dataset/experiment logs and summary emails.
Secret names the orchestrator expects:
.env_examples
# Hugging Face Token (keep this secret; set via CI secrets)
# Format example: hf_xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
# HF Space repository IDs (owner/name)
# Example: owner/project_prepare_dataset
PREPARE_DATASET_REPO=owner/project_prepare_dataset
RETRAIN_DEPT_REPO=owner/project_retrain_dept
RETRAIN_URGENCY_REPO=owner/project_retrain_urgency
# HF Hub metadata JSON URLs (full URL to dataset metadata)
# Example: https://huggingface.co/datasets/.../resolve/main/dataset_metadata.json
HF_HUB_METADATA_DEPT=https://huggingface.co/datasets/owner/project_department/resolve/main/dataset_metadata.json
HF_HUB_METADATA_URGENCY=https://huggingface.co/datasets/owner/project_urgency/resolve/main/dataset_metadata.json
# Database (SQLAlchemy URL)
# Format example: postgresql+psycopg2://username:password@hostname:port/database_name
DATABASE_URL=postgresql+psycopg2://demo_user:demo_pass@db.example.com:5432/project_db
# Thresholds to trigger dataset prep (integer, compared against dataset_len for each label)
# Example: 40
THRESHOLD_DEPARTMENT=40
THRESHOLD_URGENCY=30
# Polling settings (seconds)
# Examples: POLL_INTERVAL=60, POLL_TIMEOUT=1800
POLL_INTERVAL=60
POLL_TIMEOUT=1800
# Optional flags (true/false)
# Set DRY_RUN=true to test without triggering HF Spaces
DRY_RUN=false
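One way to read these variables in Python is a small typed loader that fails fast on missing values. This is a sketch under assumptions, not the orchestrator's actual parsing: the class name OrchestratorConfig, the helper names, the accepted boolean spellings, and the fallback values for the polling settings (taken from the example above) are all illustrative.

```python
import os
from dataclasses import dataclass


def _parse_bool(value):
    """Treat 'true', '1', and 'yes' (any case) as True; everything else False."""
    return str(value).strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class OrchestratorConfig:
    hf_token: str
    database_url: str
    threshold_department: int
    threshold_urgency: int
    poll_interval: int
    poll_timeout: int
    dry_run: bool


def load_config(env=None):
    """Build a config object from a mapping (defaults to os.environ).

    Required keys raise KeyError when missing, so a misconfigured run fails
    before touching the database or any Space.
    """
    env = os.environ if env is None else env
    return OrchestratorConfig(
        hf_token=env["HF_TOKEN"],
        database_url=env["DATABASE_URL"],
        threshold_department=int(env["THRESHOLD_DEPARTMENT"]),
        threshold_urgency=int(env["THRESHOLD_URGENCY"]),
        # Assumed fallbacks, copied from the example file above.
        poll_interval=int(env.get("POLL_INTERVAL", "60")),
        poll_timeout=int(env.get("POLL_TIMEOUT", "1800")),
        dry_run=_parse_bool(env.get("DRY_RUN", "false")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.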
Tip: add these via the GitHub UI (Settings → Secrets → Actions) or script it with gh secret set.
- Copy .env_examples → .env and fill in the values (do not commit .env).
- Create a virtualenv and install dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r orchestrator/requirements.txt
- Run in dry-run first:
    DRY_RUN=true python orchestrator/orchestrator.py
  You should see the computed counts and the actions it would take, without restarting any spaces.
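The dry-run behaviour can be pictured as a guard around every side-effecting call. The sketch below is an assumption about how such a guard might look, not the orchestrator's actual code; the wrapper name restart_space and its restart_fn parameter are hypothetical. In a real run, restart_fn could wrap huggingface_hub's HfApi.restart_space.

```python
import logging

logger = logging.getLogger("orchestrator")


def restart_space(repo_id, restart_fn, dry_run):
    """Restart a Space unless dry_run is set.

    restart_fn is any callable taking repo_id, for example:
        lambda repo_id: HfApi(token=HF_TOKEN).restart_space(repo_id=repo_id)
    Returns True if a restart was actually issued, False in dry-run mode.
    """
    if dry_run:
        logger.info("[DRY_RUN] would restart %s", repo_id)
        return False
    restart_fn(repo_id)
    logger.info("Restarted %s", repo_id)
    return True
```

Injecting the restart callable keeps the guard testable without a token or network access.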
- Ensure .github/workflows/orchestrator.yml is present (the repo includes one). It uses workflow_dispatch and schedule triggers.
- The workflow maps secrets to env variables and runs:
  - checkout
  - setup-python
  - pip install -r orchestrator/requirements.txt
  - python orchestrator/orchestrator.py
- Run the workflow manually (Actions → Sambodhan Orchestrator → Run workflow) to test with real secrets.
- The Prepare Dataset Space must push a dataset_metadata.json file to the dataset repo root (the path expected by the orchestrator). The file should contain:
    {
      "dataset_name": "owner/misclassified_urgency_dataset",
      "version_tag": "v20251030_115250",
      "num_samples": 1600,
      "splits": {"train": 1280, "eval": 160, "test": 160},
      "created_at": "2025-10-30T11:53:01.110260+00:00"
    }
- The orchestrator polls the HF Hub metadata URL (authenticated) until num_samples >= required_len and version_tag has changed.
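The polling condition above can be expressed as a small predicate plus a loop with a deadline. This is a sketch under assumptions: the names is_dataset_ready and wait_for_dataset, the injected fetch callable, and the choice to return None on timeout are illustrative and may differ from orchestrator.py.

```python
import time


def is_dataset_ready(metadata, required_len, last_version):
    """True once metadata shows enough samples and a new version_tag."""
    version = metadata.get("version_tag")
    return (
        version is not None
        and version != last_version
        and metadata.get("num_samples", 0) >= required_len
    )


def wait_for_dataset(fetch, required_len, last_version,
                     poll_interval=60, poll_timeout=1800,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the dataset is ready.

    fetch is a zero-argument callable returning the parsed metadata dict.
    Returns the metadata on success, or None if the next poll would fall
    after the deadline. sleep and clock are injectable so the loop can be
    tested without real waiting.
    """
    deadline = clock() + poll_timeout
    while True:
        metadata = fetch()
        if is_dataset_ready(metadata, required_len, last_version):
            return metadata
        if clock() + poll_interval > deadline:
            return None
        sleep(poll_interval)
```

Comparing version_tag against the previously seen value is what keeps a stale but sufficiently large dataset from triggering a retrain.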
- Ensure retrain HF Spaces are configured to:
  - Read the dataset version passed via env var, or rely on HF Hub to fetch latest (recommended: accept DATASET_VERSION or detect the latest tag).
  - Train, evaluate, log to W&B, and (if accepted) push model + metadata and restart the inference space (their internal logic is unchanged).
- The orchestrator restarts retrain spaces only after metadata confirms the dataset upload.
- GitHub Actions logs (grouped per label using ::group::):
  - Dataset counts (misclassified, sampled correct, total)
  - Threshold decision (skipped or prepare triggered)
  - Prepare space restart confirmation
  - Polling updates: current_len=X required_len=Y version=...
  - Retrain restart confirmation
  - Final orchestration summary (JSON-like)
- W&B (optional):
  - Prepare Space logs dataset stats to a dataset project
  - Retrain Space logs full training run + deployment decision
- Audit DB table (optional): keep an orchestrator_runs table with run metadata for historical analysis.
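Grouped logging relies on the GitHub Actions workflow commands ::group:: and ::endgroup::, which collapse everything printed between them under one title. A minimal sketch of a helper follows; the name log_group and its emit parameter are assumptions for illustration, and the counts in the usage example are made up.

```python
import contextlib


@contextlib.contextmanager
def log_group(title, emit=print):
    """Wrap output in a collapsible GitHub Actions log group."""
    emit(f"::group::{title}")
    try:
        yield
    finally:
        # Always close the group, even if the body raises.
        emit("::endgroup::")


if __name__ == "__main__":
    with log_group("urgency"):
        print("misclassified=18 sampled_correct=22 total=40")
```

Closing the group in a finally block keeps later labels from being nested inside a group left open by a failure.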
Before enabling the scheduled workflow for production:
- Run the orchestrator locally with DRY_RUN=true.
- Create a test HF dataset repo (private/staging) and ensure Prepare Space writes dataset_metadata.json.
- Run the orchestrator in staging with DRY_RUN=false and verify it restarts Prepare Space (logs confirm).
- Simulate a metadata update on HF Hub (push new metadata) and verify the orchestrator detects it and restarts the retrain space.
- Confirm the retrain space runs, logs to W&B, and (if accepted) deploys the model.
- Validate the inference Space after a successful deploy (sanity test requests).
- Add alerts for polling timeouts or failed restarts (email/Slack).
- Prepare Space completes but dataset_metadata.json is not visible
  - Check HF token permissions in Prepare Space.
  - Confirm metadata file path and naming: resolve/main/dataset_metadata.json.
  - Inspect Prepare Space logs for push errors.
- Poll timeout
  - Increase POLL_TIMEOUT if dataset generation is slow.
  - Inspect network errors between the GH Actions runner and HF Hub (rare).
- Retrain space fails to start
  - Inspect retrain HF Space logs (Docker build errors, missing deps).
  - Confirm retrain repo ID and HF token.
- Duplicate retrain triggers / race
  - Ensure the orchestrator stores last_version before triggering Prepare Space, and use version_tag comparison to prevent retraining on the same version.
  - Optionally serialize runs with a simple lock table.
- To run immediately: GitHub Actions → Sambodhan Orchestrator → Run workflow (manual).
- To test changes: set DRY_RUN=true in repo secrets temporarily.
- If the dataset didn’t publish: open Prepare Space logs → fix dataset push → re-run orchestrator.
- If retrain failed: open Retrain Space logs and W&B run.
- Use separate HF tokens for spaces (least privilege): Prepare Space token only needs dataset repo write; orchestrator token needs space:restart.
- Per-label thresholds (department / urgency): tune individually.
- Persist orchestrator run history for auditing and debugging.
- Use short-lived tokens and rotate regularly.
- Test end-to-end in staging before enabling production cron.
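Persisting run history can be as simple as one row per label per run. The sketch below uses the standard library's sqlite3 as a stand-in for PostgreSQL so it runs anywhere. The orchestrator_runs column layout and the helper names record_run and last_version are hypothetical; the guide only names the table.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: the guide names the table but not its columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS orchestrator_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL,
    label TEXT NOT NULL,
    dataset_len INTEGER NOT NULL,
    version_tag TEXT,
    outcome TEXT NOT NULL,
    details TEXT
)
"""


def record_run(conn, label, dataset_len, version_tag, outcome, details=None):
    """Insert one row for a label processed in this run and return its id."""
    conn.execute(SCHEMA)
    cursor = conn.execute(
        "INSERT INTO orchestrator_runs "
        "(started_at, label, dataset_len, version_tag, outcome, details) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), label, dataset_len,
         version_tag, outcome, json.dumps(details or {})),
    )
    conn.commit()
    return cursor.lastrowid


def last_version(conn, label):
    """Return the most recent version_tag recorded for a label, or None."""
    conn.execute(SCHEMA)
    row = conn.execute(
        "SELECT version_tag FROM orchestrator_runs "
        "WHERE label = ? AND version_tag IS NOT NULL "
        "ORDER BY id DESC LIMIT 1",
        (label,),
    ).fetchone()
    return row[0] if row else None
```

A table like this also gives the orchestrator a durable place to read last_version from, which supports the version_tag comparison used to avoid duplicate retrains.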