Fishtest is a distributed chess engine testing infrastructure. The server:
- Accepts test submissions from developers (new Stockfish patches).
- Assigns work units (tasks) to volunteer worker machines.
- Collects game results and computes statistical tests (SPRT, ELO).
- Publishes results through a web dashboard and a JSON API.
A single MongoDB instance is the system of record. All run state, user accounts, action logs, and neural network metadata are stored there.
server/
|-- pyproject.toml -- Package metadata, dependencies
|-- fishtest/
| |-- app.py -- ASGI application factory, lifespan, middleware, routers
| |-- api.py -- Worker API router (20 endpoints)
| |-- views.py -- UI router (33 routes, data-driven dispatch,
| | routing hub)
| |-- views_helpers.py -- Pure stateless helpers extracted from views.py
| |-- views_actions.py -- Actions-page helpers (row building, sorting, query strings)
| |-- views_finished.py -- Finished-runs page helpers (pagination, filtering)
| |-- views_machines.py -- Machines-page helpers (normalization, filter state)
| |-- views_run.py -- Run creation/modification helpers (validation, lifecycle)
| |-- rundb.py -- RunDb: run lifecycle, task distribution, caching
| |-- userdb.py -- UserDb: authentication, groups, registration
| |-- actiondb.py -- ActionDb: audit log
| |-- workerdb.py -- WorkerDb: worker blocking
| |-- kvstore.py -- KVStore: key-value metadata (legacy usernames, flags)
| |-- scheduler.py -- Periodic task scheduler (primary instance only)
| |-- schemas.py -- vtjson validation schemas
| |-- run_cache.py -- In-memory run cache with dirty-page flush
| |-- lru_cache.py -- Generic LRU cache
| |-- spsa_workflow.py -- Pure classic SPSA lifecycle helpers
| |-- spsa_handler.py -- SPSA worker orchestration, request/update flow, history buffering
| |-- github_api.py -- GitHub integration (commit metadata, branch resolution)
| |-- util.py -- Shared utilities (formatting, validation helpers)
| |-- __init__.py -- Minimal package init
| |-- http/ -- HTTP support modules
| |-- templates/ -- Jinja2 templates (53 files, .html.j2)
| |-- static/ -- Static assets (JS, CSS, images)
| `-- stats/ -- Statistical computation modules
`-- tests/ -- Focused unit and HTTP contract tests
views.py remains the stable UI routing hub. The extracted views_*.py modules
hold domain logic, but route registration, _dispatch_view, and the request
shim stay centralized there. Each extracted views_*.py module keeps a matching
dedicated test file under server/tests/. User-facing route-family UI tests
reuse ui_user_test_case.py and stay grouped by route family or one focused
UI motif.
Classic SPSA server ownership is split deliberately. spsa_workflow.py holds
the pure classic SPSA helpers reused by the run form, the detail page, and the
worker lifecycle. spsa_handler.py stays attached to RunDb and owns the
stateful worker request/update path, flip packing, buffering, and history
timing.
http/
|-- __init__.py -- Package init
|-- boundary.py -- API request adapter (ApiRequestShim), session commit,
| template context builder (build_template_context)
|-- cookie_session.py -- CookieSession class, secret key management, session helpers
|-- csrf.py -- CSRF token generation and validation
|-- dependencies.py -- FastAPI dependency functions (get_rundb, get_userdb, etc.)
|-- errors.py -- Centralized error handler installation (API/UI routing)
|-- jinja.py -- Jinja2 Environment, Jinja2Templates instance, static_url
|-- middleware.py -- Pure ASGI middleware (5 middleware classes)
|-- session_middleware.py -- FishtestSessionMiddleware (itsdangerous cookie signing)
|-- settings.py -- AppSettings (environment variable parsing)
|-- template_helpers.py -- Jinja2 filters and global functions
|-- template_renderer.py -- Template rendering helper (render_template_to_response)
|-- ui_errors.py -- HTML error page rendering (404, 403)
`-- ui_pipeline.py -- HTTP cache header application
stats/
|-- __init__.py
|-- LLRcalc.py -- Log-likelihood ratio computation
|-- brownian.py -- Brownian motion model for SPRT
|-- sprt.py -- Sequential probability ratio test
`-- stat_util.py -- ELO calculation, SPRT_elo, get_elo
The entrypoint is uvicorn fishtest.app:app. The create_app() function in
app.py builds the FastAPI instance with a lifespan context manager that
handles startup and shutdown. OpenAPI docs (/docs, /redoc) are disabled
in production (openapi_url defaults to None). Set
OPENAPI_URL=/openapi.json to enable the full interactive API documentation
during development.
The server uses Uvicorn (ASGI) with an async event loop. The event loop
accepts and dispatches HTTP connections; all blocking work runs in the
Starlette threadpool via run_in_threadpool().
| Aspect | Description |
|---|---|
| Concurrency model | Async event loop + threadpool (200 tokens, configurable) |
| Connection capacity | 9,400+ concurrent workers proven in production |
| Blocking I/O | Occupies a threadpool slot only during the blocking call |
| Memory per connection | Coroutine frame (~KBs), not a full thread stack |
| Overload behavior | Connections queue in the kernel backlog (--backlog 16384) |
All blocking work (MongoDB queries, file I/O, CPU-bound stats, GitHub API calls) is offloaded to the threadpool. The event loop stays free to accept new connections and dispatch lightweight work (session cookie signing, JSON parsing, CSRF checks).
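The offloading pattern described above can be illustrated with the standard library alone. This is a minimal sketch, not the project's code: asyncio.to_thread stands in for Starlette's run_in_threadpool(), and blocking_query, handle_request, and serve_many are hypothetical names.

```python
import asyncio
import time
from typing import Dict, List


def blocking_query(run_id: str) -> Dict[str, object]:
    """Hypothetical stand-in for a blocking MongoDB call."""
    time.sleep(0.05)  # simulates database latency
    return {"_id": run_id, "finished": False}


async def handle_request(run_id: str) -> Dict[str, object]:
    # asyncio.to_thread is a stdlib analogue of run_in_threadpool():
    # the event loop keeps running while a worker thread blocks.
    return await asyncio.to_thread(blocking_query, run_id)


async def serve_many(n: int) -> List[Dict[str, object]]:
    # The handlers overlap; each one holds a worker thread only
    # for the duration of its blocking call.
    return await asyncio.gather(*(handle_request(f"run-{i}") for i in range(n)))
```

Calling asyncio.run(serve_many(8)) returns eight results in submission order, because asyncio.gather preserves the order of its awaitables.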
Application-level throttling (task_semaphore(TASK_SEMAPHORE_SIZE) +
request_task_lock in rundb.py) governs the scheduling critical path.
Both THREADPOOL_TOKENS and TASK_SEMAPHORE_SIZE are defined in
http/settings.py; see 2-threading-model.md for
the full analysis. Do not use Uvicorn's
--limit-concurrency flag -- it rejects excess connections with HTTP 503
instead of queuing them, which triggers exponential backoff in workers
(see 8-deployment.md for details).
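A semaphore-plus-lock throttle of the kind named above might look like the following sketch. It is an assumption about the general shape, not a copy of rundb.py: the semaphore size, the function body, and the response fields are illustrative.

```python
import threading
from typing import Dict

TASK_SEMAPHORE_SIZE = 4  # illustrative value, not the project's setting

task_semaphore = threading.Semaphore(TASK_SEMAPHORE_SIZE)
request_task_lock = threading.Lock()


def request_task(worker_name: str) -> Dict[str, object]:
    """Hypothetical scheduling entry point guarded by both primitives."""
    # Fail fast instead of queuing when too many requests are in flight.
    if not task_semaphore.acquire(blocking=False):
        return {"task_waiting": False, "info": "request_task too busy"}
    try:
        # Only one thread at a time runs the scheduling critical section.
        with request_task_lock:
            return {"task_waiting": True, "worker": worker_name}
    finally:
        task_semaphore.release()
```

The semaphore bounds how many threads may wait on the lock, so a burst of task requests cannot tie up the whole threadpool behind one critical section.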
Startup sequence:
- AppSettings.from_env() reads environment variables (FISHTEST_PORT, FISHTEST_PRIMARY_PORT).
- On the primary instance, _require_single_worker_on_primary() enforces that UVICORN_WORKERS is 1 (prevents duplicated scheduler side effects).
- RunDb(port, is_primary_instance) is constructed in the threadpool. This connects to MongoDB and initializes all domain adapters (UserDb, ActionDb, WorkerDb, KVStore).
- Domain adapters are stored on app.state for request-scoped access: app.state.rundb, app.state.userdb, app.state.actiondb, app.state.workerdb.
- schemas.legacy_usernames is populated from KVStore.
- On the primary instance only:
  - gh.init() initializes the GitHub API client.
  - rundb.update_aggregated_data() refreshes cached statistics.
  - rundb.schedule_tasks() starts the periodic scheduler.
Shutdown sequence:
- rundb._shutdown = True -- signals middleware to reject new requests.
- asyncio.sleep(0.5) -- brief drain period.
- Scheduler is stopped (rundb.scheduler.stop()).
- On primary: run cache is flushed, persistent data is saved.
- A system_event action is logged.
- MongoDB connection is closed.
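The ordering of the startup and shutdown steps can be sketched with an async context manager, which is the shape a lifespan handler takes. This is a simplified illustration under stated assumptions: FakeRunDb is a hypothetical stand-in that records calls instead of touching MongoDB, the primary check (port equals primary port) is assumed, and the GitHub, aggregation, and action-log steps are omitted.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List


class FakeRunDb:
    """Hypothetical stand-in for RunDb that records the calls it receives."""

    def __init__(self, port: int, is_primary_instance: bool) -> None:
        self.port = port
        self.is_primary_instance = is_primary_instance
        self._shutdown = False
        self.events: List[str] = ["connect"]

    def schedule_tasks(self) -> None:
        self.events.append("schedule_tasks")

    def stop_scheduler(self) -> None:
        self.events.append("stop_scheduler")

    def flush(self) -> None:
        self.events.append("flush")

    def close(self) -> None:
        self.events.append("close")


@asynccontextmanager
async def lifespan(state: dict, port: int, primary_port: int, drain_seconds: float = 0.5):
    is_primary = port == primary_port  # assumed rule for this sketch
    # Construction blocks on the database, so it runs off the event loop.
    rundb = await asyncio.to_thread(FakeRunDb, port, is_primary)
    state["rundb"] = rundb
    if is_primary:
        rundb.schedule_tasks()
    try:
        yield state
    finally:
        rundb._shutdown = True              # middleware starts rejecting requests
        await asyncio.sleep(drain_seconds)  # brief drain period
        rundb.stop_scheduler()
        if is_primary:
            rundb.flush()                   # primary only: persist cached state
        rundb.close()
```

Because cleanup lives in the finally block, the shutdown steps run in the same order whether the application exits normally or through an exception raised inside the context.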
Middleware is installed in create_app() and executes in reverse installation
order (outermost first in the request path):
| Order | Middleware | Responsibility |
|---|---|---|
| 1 | FishtestSessionMiddleware | Reads/writes signed session cookie (itsdangerous) |
| 2 | RedirectBlockedUiUsersMiddleware | Redirects blocked users to /tests (302) |
| 3 | RejectNonPrimaryWorkerApiMiddleware | Returns 503 for worker API on non-primary instances |
| 4 | AttachRequestStateMiddleware | Copies app.state handles to request.state; stamps request_started_at |
| 5 | ShutdownGuardMiddleware | Returns 503 for all requests during shutdown |
| 6 | HeadMethodMiddleware | Converts HEAD to GET and strips response body (RFC 9110 Section 9.3.2) |
All middleware classes are pure ASGI (__call__(self, scope, receive, send)).
None use Starlette's BaseHTTPMiddleware.
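For readers unfamiliar with the pure ASGI style, the sketch below shows a shutdown guard written that way, using only the standard library. It is an illustration of the pattern, not the project's ShutdownGuardMiddleware: the is_shutting_down callable, ok_app, and the call helper are hypothetical.

```python
import asyncio


class ShutdownGuard:
    """Pure ASGI middleware sketch: answers 503 once shutdown has begun."""

    def __init__(self, app, is_shutting_down):
        self.app = app
        self.is_shutting_down = is_shutting_down  # hypothetical zero-arg callable

    async def __call__(self, scope, receive, send):
        # Pass through non-HTTP scopes (lifespan, websocket) and normal traffic.
        if scope["type"] != "http" or not self.is_shutting_down():
            await self.app(scope, receive, send)
            return
        body = b"Service Unavailable"
        headers = [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ]
        await send({"type": "http.response.start", "status": 503, "headers": headers})
        await send({"type": "http.response.body", "body": body})


async def ok_app(scope, receive, send):
    """Trivial downstream ASGI app used to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call(app, method="GET", path="/"):
    """Drive an ASGI app once and return (status, body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": method, "path": path, "headers": []}
    await app(scope, receive, send)
    return sent[0]["status"], b"".join(m.get("body", b"") for m in sent[1:])
```

The middleware is a plain callable that wraps the next app, so it never buffers the request or response the way a BaseHTTPMiddleware subclass would.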
High-level request path:
flowchart LR
client[Client] --> nginx[nginx]
nginx --> uvicorn[Uvicorn]
uvicorn --> middleware[ASGI middleware stack]
middleware --> router[FastAPI router]
router -->|HTML pages and fragments| ui[views_router]
router -->|Worker and user API| api[api_router]
router -->|/static| static[StaticFiles]
- Worker API: api_router handles all /api/* endpoints. Worker endpoints require authentication via username/password in the POST body.
- UI: views_router handles all HTML-rendering endpoints. Routes are registered from the _VIEW_ROUTES table via _register_view_routes().
- Static assets: StaticFiles mount serves /static/*.
UI templates load htmx 2.0.10 from CDN in base.html.j2. The server remains
fully server-rendered (Jinja2 + HTML responses). The test detail page also
loads a page-scoped diff renderer for the inline Diff panel. htmx adds three
capabilities without client-side rendering or a JavaScript build step:
| Capability | Mechanism |
|---|---|
| Fragment polling | hx-get + hx-trigger="every Ns" fetches a fragment endpoint; server returns partial HTML |
| In-place content swap | hx-get + hx-target + hx-swap="innerHTML" replaces a page section (filters, pagination) |
| Out-of-band updates | hx-swap-oob="innerHTML" attributes in the response update multiple DOM elements in one response |
Dual-mode endpoints. Several UI routes serve either a full page or an HTML
fragment from the same URL. The view handler calls _is_hx_request(request) to
detect the HX-Request: true header (with a Sec-Fetch-Mode guard against
full-page navigations), then returns the appropriate template via the
_render_hx_fragment() helper. _dispatch_view() appends Vary: HX-Request
to every GET response so that HTTP caches distinguish the two representations.
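The header check behind dual-mode endpoints can be sketched as a pure function. This version is an assumption about the logic, not the project's _is_hx_request(): it takes a plain header mapping instead of a request object, and it treats Sec-Fetch-Mode: navigate as the marker of a full-page navigation.

```python
from typing import Mapping


def is_hx_request(headers: Mapping[str, str]) -> bool:
    """Return True only for htmx-initiated fetches, not full-page navigations."""
    normalized = {key.lower(): value for key, value in headers.items()}
    if normalized.get("hx-request", "").lower() != "true":
        return False
    # Guard: a browser navigation is never treated as a fragment request,
    # even if a stale HX-Request header is replayed.
    return normalized.get("sec-fetch-mode", "").lower() != "navigate"


def choose_template(headers: Mapping[str, str], page: str, fragment: str) -> str:
    """Pick the fragment template for htmx requests, the full page otherwise."""
    return fragment if is_hx_request(headers) else page
```

Since the same URL yields two representations, the Vary: HX-Request response header is what keeps a cache from serving a fragment to a full-page request.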
UI GET responses also emit Cache-Control. The default value is
no-cache, private; auth-sensitive pages use no-store instead; and explicit
route-level overrides such as /tests/machines can still set a short
max-age. This keeps dynamic cache policy server-authoritative and prevents
shared caches such as nginx proxy_cache from storing personalized UI
responses.
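The three-tier cache policy can be expressed as a small lookup. The sketch below is illustrative only: the function name, the overrides mapping, the max-age value, and the precedence of overrides over the auth-sensitive rule are all assumptions.

```python
from typing import Mapping, Optional

DEFAULT_POLICY = "no-cache, private"
AUTH_SENSITIVE_POLICY = "no-store"


def cache_control_for(
    path: str,
    auth_sensitive: bool = False,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the Cache-Control value for a UI GET response."""
    # Explicit route-level overrides win (assumed precedence).
    if overrides and path in overrides:
        return overrides[path]
    if auth_sensitive:
        return AUTH_SENSITIVE_POLICY
    return DEFAULT_POLICY
```

For example, cache_control_for("/tests/machines", overrides={"/tests/machines": "max-age=10"}) yields the override, while any other path falls back to the default.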
Reverse-proxy cache boundary. nginx should respect those application
headers for dynamic content and should not add proxy_cache in front of UI
routes. Aggressive proxy/browser caching remains appropriate for immutable
static assets only.
Server-authoritative table state. The htmx list pages (/nns,
/contributors, /user_management, /workers/show, /tests/machines) keep
sort, search, page, and view state in the URL and render the active control
state server-side on every response. Where the htmx target contains stateful
controls, the swapped boundary is the full content fragment, not just table
rows. The shared active-search debounce is projected into templates as
htmx.input_changed_delay_ms.
Request coordination. Poll-driven fragments that live inside a larger filter
form may use hx-sync, hx-disinherit, and hx-params to keep inherited form
state from corrupting sort/pagination links and to ensure explicit user actions
win over timer-driven refreshes.
Fragment templates. Fragment responses use standalone .html.j2 files
(named *_fragment.html.j2) that do not extend base.html.j2. This avoids
the need for block-level partial rendering and keeps fragments self-contained.
See 5-templates.md for the full catalog.
OOB table rows. HTML spec restrictions prevent <tbody> elements from
appearing inside <div>. Fragment templates that update table bodies wrap
<tbody> elements in <template> tags with hx-swap-oob attributes.
htmx processes the <template> content and discards the wrapper.
Polling lifecycle. Polled endpoints use HTTP status codes to control the polling lifecycle:
- 200 -- swap the response content.
- 204 -- no content; htmx skips the swap but continues polling.
- 286 -- swap the response and stop polling (terminal state).
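The status-code contract above reduces to a small decision function. This is a sketch with hypothetical inputs (changed, finished); how the real handlers detect either condition is not shown in this document.

```python
HTTP_OK = 200
HTTP_NO_CONTENT = 204
HTMX_STOP_POLLING = 286  # htmx-specific: swap, then cancel the polling trigger


def poll_status(changed: bool, finished: bool) -> int:
    """Choose the response status for a polled fragment endpoint."""
    if finished:
        # Terminal state: deliver the final content and end polling.
        return HTMX_STOP_POLLING
    if not changed:
        # Nothing new: htmx skips the swap and keeps polling.
        return HTTP_NO_CONTENT
    return HTTP_OK
```

Returning 204 when nothing changed saves both the template render and the DOM swap on the majority of polls.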
The test detail page uses one visibility-aware OOB poller for live summary and detail data:
- /tests/view/{id}/detail refreshes the ELO block, run status, active-worker totals, detail table, time block, compact chi-square block, and the embedded SPSA chart payload.
The merged detail poller uses hx-swap="none", so the poller element stays
stable while htmx still applies the response's out-of-band section updates.
The tasks table keeps its own conditional /tests/tasks/{id} poller because it
has a separate shell/body + OOB-controls contract.
Run-list and detail polling intentionally use different data shapes. The
/tests and /tests/user/{username} run-table path rebuilds from
aggregate_unfinished_runs() and the lightweight unfinished-run query, which
omits tasks, bad_tasks, and args.spsa.param_history. Detail routes use
full run data via get_run() and the dedicated tasks poller.
Visibility-aware polling policy. Every periodic htmx poller follows a three-part trigger policy:
- A periodic trigger gated on document.visibilityState === 'visible'.
- An immediate focus-return trigger using visibilitychange[document.visibilityState === 'visible'] from:document.
- Section-scoped pollers (machines, tasks) additionally gate on the section's expanded state (classList.contains('show')).
This ensures background tabs do not generate server load and that returning to the tab produces an immediate refresh.
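One way to keep the trigger policy consistent across templates is to build the hx-trigger value in a single helper. The sketch below is hypothetical: poll_trigger and its section_selector parameter are not project names, and applying the expanded-state gate to both triggers is an assumption.

```python
from typing import Optional

VISIBLE = "document.visibilityState === 'visible'"


def poll_trigger(interval_s: int, section_selector: Optional[str] = None) -> str:
    """Build an hx-trigger value for a visibility-aware poller."""
    condition = VISIBLE
    if section_selector is not None:
        # Section-scoped pollers also require the section to be expanded.
        condition += (
            f" && document.querySelector('{section_selector}')"
            ".classList.contains('show')"
        )
    periodic = f"every {interval_s}s [{condition}]"
    focus_return = f"visibilitychange[{condition}] from:document"
    return f"{periodic}, {focus_return}"
```

The returned string is meant to be placed in a double-quoted hx-trigger attribute, which is why the JavaScript inside it uses single quotes.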
Multiple Uvicorn instances run behind nginx (ports 8000-8003). Exactly one is
designated the primary via the FISHTEST_PRIMARY_PORT environment variable.
Primary instance responsibilities:
- Periodic scheduler (run cleanup, ELO recalculation).
- Aggregated data updates.
- GitHub API integration.
- Run cache flush and persistent data save on shutdown.

Secondary instances:
- Serve UI traffic only.
- Worker API requests return 503 (via RejectNonPrimaryWorkerApiMiddleware).
- nginx routes worker API traffic to the primary; UI traffic is distributed across all instances.
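The primary/secondary split can be sketched as a settings object plus a gate function. Everything here is illustrative: InstanceSettings, gate_status, and the WORKER_API_PATHS set are hypothetical, and deriving the primary role by comparing the two port variables is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Illustrative subset; the real list of worker endpoints lives in api.py.
WORKER_API_PATHS = frozenset({"/api/request_task", "/api/update_task"})


@dataclass(frozen=True)
class InstanceSettings:
    port: int
    primary_port: int

    @property
    def is_primary(self) -> bool:
        return self.port == self.primary_port  # assumed rule

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "InstanceSettings":
        env = os.environ if env is None else env
        return cls(
            port=int(env["FISHTEST_PORT"]),
            primary_port=int(env["FISHTEST_PRIMARY_PORT"]),
        )


def gate_status(settings: InstanceSettings, path: str) -> Optional[int]:
    """Return 503 for worker API calls on a secondary, or None to pass through."""
    if path in WORKER_API_PATHS and not settings.is_primary:
        return 503
    return None
```

With nginx already routing worker traffic to the primary, a gate like this acts as a safety net for misrouted requests.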
| Signal | Behavior |
|---|---|
| SIGINT / SIGTERM | Uvicorn initiates graceful shutdown -> lifespan cleanup runs |
| SIGUSR1 | Dumps all thread stacks to stderr via faulthandler.register() |
To trigger a thread dump on a systemd-managed instance, run sudo systemctl kill -s SIGUSR1 fishtest@8000.
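The SIGUSR1 behavior relies on the standard library's faulthandler module. A minimal registration looks like the sketch below; the wrapper name install_thread_dump_handler is hypothetical, and the call is available on Unix only.

```python
import faulthandler
import signal
import sys


def install_thread_dump_handler(file=sys.stderr) -> None:
    """Dump the tracebacks of all threads to `file` whenever SIGUSR1 arrives."""
    # The file object must stay open for as long as the handler is registered,
    # because faulthandler writes to its file descriptor directly.
    faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```

The dump is written by a low-level signal handler, so it still works when the process is deadlocked or the event loop is blocked.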
During shutdown, ShutdownGuardMiddleware rejects new requests with HTTP 503.
The domain adapters are not HTTP modules; they encapsulate business logic and MongoDB access.
| Adapter | Module | Responsibility |
|---|---|---|
| RunDb | rundb.py | Run lifecycle, task assignment, result aggregation, run cache |
| UserDb | userdb.py | User CRUD, password hashing (zxcvbn strength), group membership |
| ActionDb | actiondb.py | Audit trail for user and system actions |
| WorkerDb | workerdb.py | Worker ban list management |
| KVStore | kvstore.py | Lightweight key-value pairs in MongoDB |
| Scheduler | scheduler.py | Periodic background tasks on primary instance |
A single RunDb instance is created per process at startup and stored on
app.state.rundb. It owns all other adapters (rundb.userdb, rundb.actiondb,
rundb.workerdb, rundb.kvstore).
vtjson is the sole validation layer. The schemas.py module defines the
repository's vtjson schemas for plain Python dict validation. Schemas are used
in:
- API endpoints (request body validation).
- Domain adapters (run, user, action document validation before MongoDB writes).
- Form input validation (username format, worker name format).
When raw form input and persisted document data intentionally have different contracts, fishtest uses different vtjson schemas for those boundaries. Raw-input schemas may be broader than the persisted-data schema, while the persisted schema describes the canonical stored form validated before MongoDB writes.
For contributor-facing vtjson rules and schema-change guidance, see 7-development.md.
No Pydantic models are used anywhere in the codebase.
This project uses FastAPI as a thin routing convenience layer on top of Starlette. The three FastAPI-exclusive features in use are:
- FastAPI() -- the application class (inherits starlette.Starlette).
- APIRouter -- decorator-style route registration and data-driven add_api_route().
- Exception handlers -- two fallback handlers from fastapi.exception_handlers.
The following FastAPI features are not used:
- Pydantic request/response models (BaseModel, response_model).
- Dependency injection (Depends()) in route signatures.
- Parameter declarations (Body, Query, Path, Header, Cookie).
- Security schemes (OAuth2, HTTPBasic, APIKey).
All middleware is pure ASGI (Starlette pattern). Session handling, CSRF protection, authentication, and request validation use custom implementations -- not FastAPI's built-in machinery.
Contributors should not expect Pydantic, DI, or security scheme patterns
in this codebase. When importing classes that FastAPI re-exports from
Starlette (Request, Response, JSONResponse, StaticFiles, etc.),
prefer importing from starlette directly.
Error handlers are installed via install_error_handlers(app) in app.py.
They route errors differently based on the request path:
| Path prefix | HTTP 404 | HTTP 401/403 | Validation error | Unhandled exception |
|---|---|---|---|---|
| /api/* (worker) | JSON {"error": "...", "duration": N} | JSON | JSON {"error": "...", "duration": N} | JSON {"error": "...", "duration": N} |
| /api/* (other) | JSON {"detail": "Not Found"} | JSON | JSON | JSON |
| UI routes | HTML 404 page (Jinja2) | HTML 403 page (Jinja2) | Default | Plain text 500 |
Worker API errors always include a duration field to maintain protocol
compatibility.
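The difference between the two API error shapes can be sketched as follows. This is an illustration, not the code in errors.py: api_error_body, the worker_paths argument, and the use of a monotonic clock for the duration are assumptions.

```python
import time
from typing import Callable, Collection, Dict


def api_error_body(
    path: str,
    message: str,
    started_at: float,
    worker_paths: Collection[str],
    now: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Build the JSON body for an API error response."""
    if path in worker_paths:
        # Worker protocol: every reply, including errors, carries a duration.
        return {"error": message, "duration": now() - started_at}
    # Other API routes use the framework-style body.
    return {"detail": message}
```

The started_at value corresponds to the request_started_at stamp that AttachRequestStateMiddleware records, which is how an error handler can report a duration without help from the failed route.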