Commit 8ea70f4

committed
Document per-user rate limiting and concurrency limiting in production guide
Add comprehensive admin documentation covering both Celery task throttling mechanisms:

- Per-user rate limiting: configuration, two-phase slot reservation design (reserve once, retry until timeslot), DB backend differences (Postgres atomic upsert vs standard SELECT FOR UPDATE), and limitations (clock precision, no priority ordering, slot consumption on failure).
- Per-user concurrency limiting: configuration, admission control flow, after_return cleanup, periodic stale-row recovery via worker inspection, and limitations (retry polling interval, crash recovery window, DB overhead at scale).
- Combined usage: explains that rate limiting runs first (timeslot scheduling) then concurrency limiting (execution gating).
- Administrative operations: SQL recipes for clearing leaked slots and Celery CLI commands for purging/revoking tasks.
1 parent 0db1d13 commit 8ea70f4

1 file changed: doc/source/admin/production.md

Lines changed: 163 additions & 0 deletions
@@ -262,3 +262,166 @@ This configuration ensures that:

2. Exported files remain available for 7 days after generation

You can monitor Celery task status using [Flower](https://flower.readthedocs.io/en/latest/), a real-time web-based monitoring tool for Celery.
#### Per-user task rate limiting

Galaxy supports limiting the rate at which Celery tasks are executed per user. This prevents a single user from monopolizing worker capacity and ensures fair scheduling across all users.

##### Configuration

Set `celery_user_rate_limit` in the Galaxy configuration to a non-zero float representing the maximum number of tasks per user per second. For example:

```yaml
celery_user_rate_limit: 0.1
```

This allows each user at most one task execution every 10 seconds. The default value of `0.0` disables rate limiting entirely.
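As a quick sanity check, the configured rate maps to a minimum spacing between one user's consecutive tasks as follows (illustrative arithmetic only, not Galaxy code):

```python
# Illustration only: convert a tasks-per-second rate limit into the
# minimum spacing enforced between one user's consecutive tasks.
def min_interval_seconds(rate_limit: float) -> float:
    if rate_limit <= 0:
        raise ValueError("a non-positive value disables rate limiting")
    return 1.0 / rate_limit

print(min_interval_seconds(0.1))  # 10.0 -> one task every 10 seconds
print(min_interval_seconds(2.0))  # 0.5  -> up to two tasks per second
```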
##### How it works

Rate limiting is implemented in Celery's `before_start` hook via the `GalaxyTaskBeforeStart` class hierarchy. The mechanism works in two phases:

1. **Slot reservation (first attempt):** When a task is about to execute for the first time, Galaxy atomically reserves the next available timeslot for that user in the `celery_user_rate_limit` database table. The reserved time is calculated as `max(last_scheduled_time + interval, now)`. If the reserved timeslot is in the future, the task is deferred using `task.retry(countdown=...)` and the reserved time is stored in a Celery message header.

2. **Execution gating (retries):** On each subsequent retry, the task reads its reserved timeslot from the message header and checks whether the current time has reached it. If not, it retries again with the remaining countdown. Once the timeslot is reached, the task proceeds to execute.

This two-phase design ensures that:

- Each task reserves a slot in the DB exactly once, avoiding cascading delays from re-reservation.
- Tasks from different users are scheduled independently — user A's tasks do not delay user B's tasks.
- Rate-limit retries use `max_retries=None`, so tasks are never lost to Celery's default retry limit.

The flow for a single task looks like this:

```
Task arrives (retries=0, no header)
  → Atomically reserve timeslot in DB
  → Timeslot is now? Execute immediately.
  → Timeslot is in the future?
      → Store timeslot in message header
      → task.retry(countdown=timeslot - now)

Task wakes up (retry, header present)
  → Read reserved timeslot from header
  → now >= timeslot? Execute.
  → now < timeslot? retry(countdown=remaining)
```
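The two phases can be condensed into simplified Python. The function names here are hypothetical; Galaxy's actual logic lives in the `GalaxyTaskBeforeStart` hierarchy and performs the reservation in the database:

```python
from datetime import datetime, timedelta

def reserve_timeslot(last_scheduled: datetime, interval: timedelta,
                     now: datetime) -> datetime:
    # Phase 1: the next slot is either immediately after the previous slot
    # or right now, whichever is later -- mirrors the documented
    # max(last_scheduled_time + interval, now).
    return max(last_scheduled + interval, now)

def seconds_until_execution(reserved: datetime, now: datetime) -> float:
    # Phase 2: 0.0 means "execute now"; a positive value is the retry countdown.
    return max((reserved - now).total_seconds(), 0.0)

now = datetime(2024, 1, 1, 12, 0, 0)
slot = reserve_timeslot(now - timedelta(seconds=3), timedelta(seconds=10), now)
print(seconds_until_execution(slot, now))  # 7.0 -> task.retry(countdown=7.0)
```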
##### Database backends

Galaxy provides two implementations of the timeslot reservation logic:

- **PostgreSQL** (`GalaxyTaskBeforeStartUserRateLimitPostgres`): Uses `UPDATE ... RETURNING` with `greatest()` for an atomic, single-statement slot reservation. Falls back to `INSERT ... ON CONFLICT DO UPDATE` (upsert) for the first task by a given user. This is the most efficient implementation.

- **Standard SQL** (`GalaxyTaskBeforeStartUserRateLimitStandard`): Uses `SELECT ... FOR UPDATE` followed by a separate `UPDATE` (or `INSERT` for new users). This works with SQLite and other databases but requires two statements and explicit locking.

The correct implementation is selected automatically based on the configured `database_connection`.
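The standard-SQL path can be sketched with SQLite. This is a sketch only: the column names are simplified assumptions, and the row lock that the real backend takes with `SELECT ... FOR UPDATE` is not available in SQLite:

```python
import sqlite3
from datetime import datetime, timedelta

# Sketch of the standard-SQL reservation path, using SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE celery_user_rate_limit "
    "(user_id INTEGER PRIMARY KEY, last_scheduled_time TEXT)"
)

def reserve(conn, user_id, interval, now):
    row = conn.execute(
        "SELECT last_scheduled_time FROM celery_user_rate_limit WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        # First task for this user: it may run immediately.
        slot = now
        conn.execute(
            "INSERT INTO celery_user_rate_limit VALUES (?, ?)",
            (user_id, slot.isoformat()),
        )
    else:
        # Subsequent tasks: one interval after the previous slot, or now.
        slot = max(datetime.fromisoformat(row[0]) + interval, now)
        conn.execute(
            "UPDATE celery_user_rate_limit SET last_scheduled_time = ? "
            "WHERE user_id = ?",
            (slot.isoformat(), user_id),
        )
    conn.commit()
    return slot

now = datetime(2024, 1, 1, 12, 0, 0)
ten = timedelta(seconds=10)
print(reserve(conn, 42, ten, now))  # first task runs now
print(reserve(conn, 42, ten, now))  # next slot, 10 seconds later
```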
##### Limitations

- **Rate limiting, not concurrency limiting.** The mechanism controls the *rate* at which tasks are scheduled (tasks per second per user), not how many tasks run concurrently. If a user submits 100 tasks, they will all eventually execute — just spaced apart by the configured interval.
- **Tasks without `task_user_id` are not rate-limited.** Only tasks that receive a `task_user_id` keyword argument participate in rate limiting. System tasks and tasks without a user context bypass the check entirely.
- **Timeslots are not released on failure.** If a task fails after its timeslot was reserved, that slot is consumed. The next task for the same user will be scheduled after it. This means task failures still "use up" rate-limit capacity.
- **Clock precision.** The mechanism relies on `datetime.datetime.now()` on the worker. Clock skew between workers could cause minor scheduling inaccuracies, though this is unlikely to matter in practice.
- **No priority or reordering.** Tasks are scheduled in the order they reserve slots (first-come, first-served within a user). There is no mechanism to prioritize certain task types over others for the same user.
- **Worker restarts.** If a worker is terminated while holding deferred tasks, Celery's broker (e.g., Redis, RabbitMQ) will redeliver them. The tasks will re-enter the `before_start` hook, read their reserved timeslot from the message header, and continue waiting or execute as appropriate — no slots are lost or duplicated.
- **Database overhead.** Each task execution requires one or two queries to the `celery_user_rate_limit` table to reserve a timeslot (one for PostgreSQL's atomic upsert, two for the standard `SELECT FOR UPDATE` + `UPDATE` path). On PostgreSQL at 100 tasks/second this adds ~100 small writes/second; the standard backend doubles that. For most Galaxy deployments (typically fewer than 10 tasks/second) this is negligible. Additionally, tasks that are deferred via `task.retry` re-enter the broker and are redelivered to a worker, adding a small amount of broker traffic proportional to the deferral rate.
#### Per-user task concurrency limiting

In addition to rate limiting, Galaxy supports limiting the number of tasks that can execute **concurrently** for a single user. This prevents one user from consuming all available worker capacity.

##### Configuration

Set `celery_user_concurrency_limit` in the Galaxy configuration to the maximum number of tasks that can run simultaneously per user:

```yaml
celery_user_concurrency_limit: 5
```

The default value of `0` disables concurrency limiting. This setting can be used independently of or in combination with `celery_user_rate_limit`.
##### How it works

Concurrency limiting uses a tracking table (`celery_user_active_task`) that records which tasks are currently executing for each user. The mechanism has three components:

1. **Admission control (`before_start`):** Before a task executes, the system counts the user's currently active tasks in the tracking table. If the count is at or above the limit, the task is deferred via `task.retry(countdown=5)` with unlimited retries. Otherwise, a tracking row is inserted for this task.

2. **Cleanup on completion (`after_return`):** When a task finishes (success or failure), its tracking row is deleted from the table. This runs via Celery's `after_return` hook, which fires regardless of whether the task succeeded or failed. Retries do not trigger cleanup — only final completion does.

3. **Stale row recovery (periodic beat task):** A periodic task (`cleanup_stale_concurrency_slots`) runs every 5 minutes to handle the case where a worker crashes without calling `after_return`. It queries all workers via `celery_app.control.inspect().active()` to get the set of actually-running task IDs, then removes any tracking rows older than 30 minutes whose task ID is not found on any worker.
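The core of the stale-row check is a set difference; a minimal sketch (the data shapes are illustrative assumptions, not Galaxy's actual query):

```python
# Sketch: a tracking row is stale if it is older than the 30-minute
# threshold AND its task is no longer running on any worker.
def stale_task_ids(tracked_age_minutes: dict, running_task_ids: set) -> set:
    return {task_id for task_id, age in tracked_age_minutes.items()
            if age > 30 and task_id not in running_task_ids}

tracked = {"a": 45, "b": 45, "c": 10}   # task_id -> minutes since started_at
running = {"b"}                          # from celery_app.control.inspect().active()
print(stale_task_ids(tracked, running))  # {'a'}: old and not running anywhere
```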
The flow for a single task:

```
Task arrives
  → Count active tasks for this user in DB
  → Count >= limit? → task.retry(countdown=5)
  → Count < limit?
      → INSERT tracking row (task_id, user_id, started_at)
      → Execute task
  → Task finishes (success or failure)
      → DELETE tracking row

Periodic cleanup (every 5 min)
  → SELECT rows where started_at is more than 30 min ago
  → inspect().active() → get all running task IDs from workers
  → DELETE rows where task_id NOT in active set
```
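The admission-control and cleanup steps above can be sketched with SQLite standing in for Galaxy's database. Column names are assumptions, and the real logic runs inside Celery's `before_start` and `after_return` hooks:

```python
import sqlite3

# LIMIT mirrors celery_user_concurrency_limit.
LIMIT = 5

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE celery_user_active_task "
    "(task_id TEXT PRIMARY KEY, user_id INTEGER, started_at TEXT)"
)

def try_admit(conn, task_id, user_id):
    # before_start: count active tasks, then defer or insert a tracking row.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM celery_user_active_task WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if count >= LIMIT:
        return False  # the real hook would call task.retry(countdown=5)
    conn.execute(
        "INSERT INTO celery_user_active_task VALUES (?, ?, datetime('now'))",
        (task_id, user_id),
    )
    return True

def release(conn, task_id):
    # after_return: drop the tracking row on success or failure alike.
    conn.execute("DELETE FROM celery_user_active_task WHERE task_id = ?",
                 (task_id,))

print([try_admit(conn, f"task-{i}", 42) for i in range(6)])
# [True, True, True, True, True, False]
release(conn, "task-0")
print(try_admit(conn, "task-5", 42))  # True: a slot freed up
```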
##### Combining rate limiting and concurrency limiting

When both `celery_user_rate_limit` and `celery_user_concurrency_limit` are set, rate limiting runs first (to schedule the timeslot) and concurrency limiting runs second (to gate execution based on active task count). This means a task must both reach its scheduled timeslot *and* have a concurrency slot available before it can execute.
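Condensed into one hypothetical predicate, the combined gate looks like:

```python
from datetime import datetime

# Hypothetical condensation of the combined gate: a task executes only when
# it has reached its rate-limit timeslot AND a concurrency slot is free.
def may_execute(now: datetime, reserved_slot: datetime,
                active_tasks: int, concurrency_limit: int) -> bool:
    return now >= reserved_slot and active_tasks < concurrency_limit

now = datetime(2024, 1, 1, 12, 0, 0)
print(may_execute(now, now, active_tasks=4, concurrency_limit=5))  # True
print(may_execute(now, now, active_tasks=5, concurrency_limit=5))  # False
```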
##### Limitations

- **Tasks without `task_user_id` are not limited.** Only tasks that receive a `task_user_id` keyword argument participate in concurrency limiting.
- **Worker crash recovery is not instant.** If a worker is killed (SIGKILL, OOM), its tracking rows remain until the periodic cleanup task runs (every 5 minutes by default). During this window, those slots are "leaked" and reduce the user's effective concurrency limit.
- **Retry polling interval is fixed.** Deferred tasks retry every 5 seconds. Under heavy load with many deferred tasks, this creates periodic bursts of retry attempts.
- **No queue ordering guarantees.** When multiple tasks are waiting for a concurrency slot, the order in which they acquire slots depends on Celery's delivery order and retry timing — not submission order.
- **Database overhead.** Each task execution requires an INSERT (on start) and a DELETE (on completion) in the tracking table, plus a COUNT query for admission. At 100 tasks/second this adds ~300 small queries/second to the database. For most Galaxy deployments (which typically sustain fewer than 10 tasks/second) this is negligible. Deployments processing hundreds of tasks per second should monitor database connection pool utilization and query latency on the `celery_user_active_task` table.
##### Administrative operations

Admins can directly manage the concurrency tracking table and the Celery queue to recover from stuck states or clear backlogs.

**Clearing leaked concurrency slots manually:**

If a worker crashes and the periodic cleanup hasn't run yet (or Celery beat is not running), admins can free slots directly:

```sql
-- View all currently tracked active tasks
SELECT * FROM celery_user_active_task ORDER BY started_at;

-- Remove all slots for a specific user (e.g., user_id 42)
DELETE FROM celery_user_active_task WHERE user_id = 42;

-- Remove all stale slots older than 1 hour
DELETE FROM celery_user_active_task
WHERE started_at < NOW() - INTERVAL '1 hour';

-- Nuclear option: clear ALL tracking rows (resets all concurrency counters)
DELETE FROM celery_user_active_task;
```

After clearing rows, deferred tasks waiting for slots will acquire them on their next retry (within 5 seconds).
**Purging tasks from the Celery queue:**

To remove pending (not yet started) tasks from the broker queue:

```bash
# Purge all pending tasks from the default Galaxy queue
celery -A galaxy.celery purge -Q galaxy.internal

# Purge all pending tasks from all queues
celery -A galaxy.celery purge

# Revoke a specific task by ID (prevents it from executing even if already delivered)
celery -A galaxy.celery control revoke <task-id>

# List tasks reserved (prefetched) by workers, e.g. to find task IDs to revoke
celery -A galaxy.celery inspect reserved
```

Note: `purge` only removes tasks that have not yet been delivered to a worker. Tasks already being executed or waiting in a worker's prefetch buffer require `revoke`. Revoking a task that is mid-execution requires the `--terminate` flag, which sends SIGTERM to the worker process — use with caution.
