Skip to content

Commit 46bfae9

Browse files
committed
Tooling: CPU-weight-aware check scheduling + --graph
- Add `CpuWeight` to checks; the runner admits by summed weight (<= NumCPU) instead of a flat count, so CPU-heavy checks (svelte-tests, cold clippy, integration-tests) stop piling up and oversubscribing the machine - the contention behind the slow-lane E2E/Rust timeout flakes. - Weights calibrated from an isolation sweep (`docs/notes/check-cpu-contention.md`). Key finding: the longest checks (eslint-typecheck, e2e-linux, rust-tests-linux) are light-CPU and now backbone the schedule. - Add `./scripts/check.sh --graph` (tree/mermaid/dot) to visualize the dependency forest with weights and size lanes; standalone checks list compactly, connected ones as trees.
1 parent a190f19 commit 46bfae9

8 files changed

Lines changed: 568 additions & 49 deletions

File tree

docs/notes/check-cpu-contention.md

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
# Check CPU contention sweep + weights
2+
3+
Why some `--include-slow` runs flaked (E2E modal/popover timeouts, an 8s-cap Rust test) traced back to the check runner
4+
firing up to `NumCPU` checks at once, each itself multi-threaded, so the short CPU-heavy checks piled on top of each
5+
other and oversubscribed the machine 2-3×. This note records the per-check CPU measurement that calibrated
6+
`CheckDefinition.CpuWeight` and the weighted admission gate in `runner.go`.
7+
8+
## Methodology
9+
10+
Each non-fast check run in isolation (`./scripts/check.sh --check <id>`), 16-core macOS, two metrics:
11+
12+
- **`proc_cores`** = `(user+sys CPU seconds) / wall seconds` from `/usr/bin/time -l` on the check process tree. Native
13+
host CPU the check burned.
14+
- **`sys_cores`** (avg + peak) = system-wide cores busy from `iostat` (`(100-idle)/100 × ncpu`), sampled each second.
15+
Catches CPU burned **inside the Docker/OrbStack VM** that the host process never shows.
16+
17+
The gap between them is the tell: a check with `proc_cores ≈ 0` but high `sys_cores` does all its work in a container
18+
(`rust-tests-linux`, `e2e-linux`). Background baseline was ~2 cores (this session + iostat itself).
19+
20+
## Results (cold/working profile where it matters)
21+
22+
| check | wall | proc_cores | sys avg | sys peak | note |
23+
| --------------------------------------------------------------------------------------------------------- | ----: | ---------: | ------: | -------: | ------------------------------------------------- |
24+
| svelte-tests | 40s | 10.7 | 14.2 | 16 | vitest workers — heaviest per-second |
25+
| integration-tests | 109s | 7.4 | 9.9 | 16 | native compile + SMB Docker |
26+
| nilaway | 11s | 6.5 | 8.5 | 14 | heavy but short |
27+
| website-e2e | 10s | 6.0 | 9.7 | 16 | chromium, short |
28+
| cargo-udeps (CI) | 96s | 4.6 | 7.9 | 15 | nightly full compile |
29+
| rust-tests | 31s | 3.5 | 6.4 | 13 | warm-incremental; cold heavier |
30+
| rust-tests-linux | 191s | 0.01 | 6.2 | 16 | **all in Docker-VM** |
31+
| deadcode | 4s | 4.4 | 5.6 | 6 | short |
32+
| e2e-playwright | 183s | 3.1 | 5.0 | 16 | builds app + N shards |
33+
| govulncheck | 5s | 3.4 | 4.4 | 5 | short |
34+
| jscpd | 8s | 1.2 | 5.2 | 11 | bursty |
35+
| e2e-linux | 285s | 0.01 | 4.7 | 16 | Docker; mostly wait + build spikes |
36+
| eslint-typecheck | 871s | 1.1 | 5.0 | 16 | **longest, but ~1 core avg** (single-threaded TS) |
37+
| clippy | 24s | 0.9 | 3.9 | 11 | **warm cache** — cold ≈ all cores |
38+
| bindings-fresh | 4s | 1.0 | 7.1 | 8 | **cache HIT** — regen compiles the crate |
39+
| cargo-audit/deny, svelte-check, svelte-eslint, website-{build,typecheck,eslint}, api-eslint, docker-build | 2-10s | 0.1-1.8 | 1.6-3.6 || light / single-threaded |
40+
41+
### Caveats (don't read three rows literally)
42+
43+
- `clippy` was measured warm (incremental); **cold it is a fully-parallel compile pinning ~all cores** → treated as
44+
weight 8.
45+
- `bindings-fresh` was a cache hit; when it actually regenerates it compiles the crate in test mode → weight 8.
46+
- `rust-tests` warm-incremental; cold is heavier → weight 6.
47+
48+
## Headline finding: longest ≠ heaviest
49+
50+
The wall-clock-dominating checks barely use CPU on average:
51+
52+
- `eslint-typecheck` (14.5 min) runs on ~1 core (TypeScript project service is single-threaded).
53+
- `e2e-linux` / `rust-tests-linux` burn ~0 host CPU — work is in the Docker VM, mostly wait + short build spikes.
54+
55+
So they make ideal **backbone fillers**: occupy wall-clock cheaply while the short CPU-heavy checks (`svelte-tests`,
56+
`integration-tests`, `clippy`-cold, `cargo-udeps`, `nilaway`) run packed underneath them.
57+
58+
## Weights assigned
59+
60+
`CpuWeight` ≈ the check's working-profile average busy cores, rounded, Docker-VM-aware:
61+
62+
| weight | checks |
63+
| -----: | -------------------------------------------------------------------------------------------------------- |
64+
| 11 | svelte-tests |
65+
| 8 | clippy, cargo-udeps, bindings-fresh, integration-tests |
66+
| 7 | nilaway |
67+
| 6 | rust-tests, rust-tests-linux, website-e2e |
68+
| 4 | deadcode, e2e-linux, e2e-playwright |
69+
| 3 | govulncheck |
70+
| 2 | jscpd, svelte-eslint, eslint-typecheck, svelte-check, website-{typecheck,build,docker-build}, api-eslint |
71+
| 1 | cargo-audit, cargo-deny, website-eslint, everything unset (fast formatters/scanners) |
72+
73+
The runner admits a check only when `sum(running weights) + weight ≤ NumCPU` (a weight-0/unset check counts as 1; an
74+
over-budget check runs alone). Net effect: wall-clock stays bounded by the critical path (`eslint-typecheck` under
75+
`--include-slow`, `clippy`-cold for the default suite) while peak oversubscription drops from ~2-3× to ~1×.
76+
Fast/unmeasured checks default to 1 and can be recalibrated later if the fast lane ever shows contention.
77+
78+
Render the graph with weights + lanes: `./scripts/check.sh --graph` (also `--graph-format mermaid|dot`).

scripts/check/CLAUDE.md

Lines changed: 47 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -42,22 +42,29 @@ go run ./scripts/check --only-freestyle
4242

4343
## Command-line options
4444

45-
| Option | Description |
46-
| --------------------------- | ------------------------------------------------------- |
47-
| `--app NAME` | Run checks for a specific app |
48-
| `--rust`, `--rust-only` | Run only Rust checks (desktop) |
49-
| `--svelte`, `--svelte-only` | Run only Svelte checks (desktop) |
50-
| `--check ID` | Run specific checks by ID or nickname (repeatable) |
51-
| `--ci` | Disable auto-fixing (for CI) |
52-
| `--verbose` | Show detailed output |
53-
| `--include-slow` | Include slow checks (excluded by default) |
54-
| `--only-slow` | Run only slow checks |
55-
| `--fast` | Run only the curated fast pre-commit check set |
56-
| `--only-freestyle` | Run freestyle-compatible checks on a VM (skip the rest) |
57-
| `--prefer-freestyle` | Run compat checks on VM + the rest locally in parallel |
58-
| `--fail-fast` | Stop on first failure |
59-
| `--no-log` | Disable CSV stats logging |
60-
| `-h`, `--help` | Show help message |
45+
| Option | Description |
46+
| --------------------------- | ------------------------------------------------------------------ |
47+
| `--app NAME` | Run checks for a specific app |
48+
| `--rust`, `--rust-only` | Run only Rust checks (desktop) |
49+
| `--svelte`, `--svelte-only` | Run only Svelte checks (desktop) |
50+
| `--check ID` | Run specific checks by ID or nickname (repeatable) |
51+
| `--ci` | Disable auto-fixing (for CI) |
52+
| `--verbose` | Show detailed output |
53+
| `--include-slow` | Include slow checks (excluded by default) |
54+
| `--only-slow` | Run only slow checks |
55+
| `--fast` | Run only the curated fast pre-commit check set |
56+
| `--only-freestyle` | Run freestyle-compatible checks on a VM (skip the rest) |
57+
| `--prefer-freestyle` | Run compat checks on VM + the rest locally in parallel |
58+
| `--fail-fast` | Stop on first failure |
59+
| `--no-log` | Disable CSV stats logging |
60+
| `--graph` | Render the check dependency graph (weights + lanes) and exit |
61+
| `--graph-format` | Graph output: `tree` (default, colored terminal), `mermaid`, `dot` |
62+
| `-h`, `--help` | Show help message |
63+
64+
`--graph` honors the same selection flags (`--rust`, `--svelte`, `--app`, `--check`), so `--graph --rust` graphs only
65+
the Rust checks. It renders before the slow/fast/CI filters, so every lane shows with its size badge. `mermaid` output
66+
pastes into a Markdown ```mermaid block or https://mermaid.live; `dot` pipes to Graphviz (`./scripts/check.sh --graph
67+
--graph-format dot | dot -Tpng -o checks.png`).
6168

6269
## Architecture
6370

@@ -94,7 +101,8 @@ go run ./scripts/check --only-freestyle
94101
| File | Purpose |
95102
| --------------------- | ---------------------------------------------------------------------------------------------------- |
96103
| `main.go` | Entry point: flag parsing, root dir discovery, check selection, pnpm gating, runner delegation |
97-
| `runner.go` | Parallel executor: goroutine pool, dependency graph, fail-fast, live TTY status line |
104+
| `runner.go` | Parallel executor: CPU-weighted admission gate, dependency graph, fail-fast, live TTY status line |
105+
| `graph.go` | `--graph` renderer: dependency forest with CPU weights + size lanes (tree / mermaid / dot) |
98106
| `stats.go` | CSV stats logging (`logCheckStats`): appends one row per check to `~/cmdr-check-log.csv` |
99107
| `colors.go` | ANSI color constants |
100108
| `utils.go` | `findRootDir()` (walks up until `apps/desktop/src-tauri/Cargo.toml` is found) |
@@ -105,7 +113,17 @@ go run ./scripts/check --only-freestyle
105113
## Runner-level patterns
106114

107115
**Dependency graph:** Flat `DependsOn` slice per check. Blocked checks get `StatusBlocked` on dep failure and are
108-
counted as failed. Dependencies not in the selected run set are treated as satisfied.
116+
counted as failed. Dependencies not in the selected run set are treated as satisfied. Visualize it with
117+
`./scripts/check.sh --graph` (every check currently has ≤1 dependency, so it renders as a clean forest rooted at `oxfmt`
118+
/ `rustfmt` / `gofmt`).
119+
120+
**CPU-weighted admission:** Instead of a count semaphore, `tryStartPending` admits a check only when
121+
`sum(running CpuWeight) + weight ≤ NumCPU` (`runner.go`). A check first clears its dependencies (`canStart`), then the
122+
weight gate; if deps are ready but the budget is full it stays `Pending` and retries once a running check frees its
123+
weight. The `usedWeight == 0` clause lets an over-budget check run alone rather than deadlock. This keeps two CPU-heavy
124+
checks (e.g. `svelte-tests` w11 + `clippy`-cold w8) from piling up and oversubscribing the machine, while light/long
125+
checks (`eslint-typecheck` w2, the Docker checks) overlap freely. See the Key decision below and
126+
`docs/notes/check-cpu-contention.md`.
109127

110128
**Slow checks:** `IsSlow: true` marks checks excluded by default (currently: `rust-tests-linux`, `desktop-e2e-linux`,
111129
`desktop-e2e-playwright`). Named `--check` invocations implicitly include slow checks
@@ -169,6 +187,17 @@ Use `CommandExists()` to check if a tool is installed, and auto-install if possi
169187

170188
## Key decisions
171189

190+
**Decision**: CPU-weight-aware admission instead of a count semaphore. **Why**: The old gate allowed up to `NumCPU`
191+
concurrent checks, but a single check (vitest, a cold cargo compile) can itself saturate every core. So the short
192+
CPU-heavy checks all launched at once and oversubscribed the machine 2-3×, which starved timing-sensitive checks — the
193+
E2E modal/popover timeouts and the 8s-cap `file_viewer` test flaked under `--include-slow` for exactly this reason. Each
194+
check now carries a `CpuWeight` (avg busy cores, Docker-VM-aware) and the runner only starts a check when the running
195+
weights fit the core budget. Wall-clock stays bounded by the critical path (`eslint-typecheck`, which is long but ~1
196+
core, under `--include-slow`; cold `clippy` for the default suite) while peak oversubscription drops to ~1×. Weights
197+
were measured by an isolation sweep (`docs/notes/check-cpu-contention.md`); unmeasured/fast checks default to 1. The key
198+
insight from the sweep: the longest checks (`eslint-typecheck`, `e2e-linux`, `rust-tests-linux`) are NOT the heaviest —
199+
they idle ~1 core or run entirely in the Docker VM, so they make ideal backbone fillers for the CPU-heavy short checks.
200+
172201
**Decision**: Go instead of Bash for the check script. **Why**: Cross-platform support (especially Windows), type-safe,
173202
better error handling, and ability to build complex logic (parallel checks, dependency graph, colored output). Go is
174203
already in the toolchain via mise.

scripts/check/checks/CLAUDE.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ CheckDefinition{
3131
IsFast: false, // true = included in --fast (curated pre-commit lane)
3232
CIOnly: false, // true = run only in --ci mode (or explicit --check)
3333
FreestyleIncompat: true, // can NOT run on freestyle.sh VMs (Rust, Docker)
34+
CpuWeight: 2, // avg cores busy; 0/unset = 1. Governs concurrent admission.
3435
DependsOn: []string{"desktop-svelte-prettier"},
3536
Run: RunDesktopESLint,
3637
}
@@ -52,6 +53,12 @@ CheckDefinition{
5253
that needs Docker. Negative-sense default (`false` = compatible) keeps the field absent in the common case.
5354
- **`DependsOn`** is a flat slice of IDs. Formatters before linters, linters before tests, type checkers before tests.
5455
Blocked checks (dep failed) get `StatusBlocked` automatically.
56+
- **`CpuWeight`** is the average number of CPU cores the check keeps busy while running (cold/working profile, rounded).
57+
The runner admits checks so the sum of concurrent weights stays within `NumCPU`, so two CPU-heavy checks don't
58+
oversubscribe the machine. `0` (unset) counts as `1` (light). Weights are Docker-VM-aware (`rust-tests-linux` /
59+
`e2e-linux` burn cores in the VM the host process never shows). Calibrate from the isolation sweep in
60+
`docs/notes/check-cpu-contention.md`; visualize with `./scripts/check.sh --graph`. Only the measured non-fast checks
61+
carry explicit weights today; fast/formatters default to 1.
5562

5663
## Adding a new check
5764

scripts/check/checks/common.go

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -98,11 +98,34 @@ type CheckDefinition struct {
9898
// of selected checks with this flag (see scripts/check/smb_orchestrator.go).
9999
// Without it, each such check tried to own the lifecycle itself and parallel
100100
// runs trampled each other via stop.sh.
101-
NeedsSmb SmbMode // "" = no SMB needed; "core" = integration tests; "e2e" = e2e tests
101+
NeedsSmb SmbMode // "" = no SMB needed; "core" = integration tests; "e2e" = e2e tests
102+
// CpuWeight is the average number of CPU cores this check keeps busy while it
103+
// runs (cold/working profile, rounded). The runner admits checks so the sum
104+
// of concurrent weights stays within the core budget, so two CPU-heavy checks
105+
// don't pile on top of each other and oversubscribe the machine. 0 means
106+
// unmeasured and is treated as 1 (light). Calibrated from the contention
107+
// sweep in `docs/notes/check-cpu-contention.md`; weights account for Docker-VM
108+
// CPU too (`rust-tests-linux` / `e2e-linux` burn cores in the VM that the host
109+
// process never shows).
110+
CpuWeight int
102111
DependsOn []string
103112
Run CheckFunc
104113
}
105114

115+
// EffectiveCpuWeight returns the scheduling weight clamped to [1, capacity], so
116+
// an unset weight counts as light (1) and an over-budget weight can still run
117+
// once nothing else holds the budget (it never deadlocks the admission gate).
118+
func (c *CheckDefinition) EffectiveCpuWeight(capacity int) int {
119+
w := c.CpuWeight
120+
if w < 1 {
121+
w = 1
122+
}
123+
if capacity > 0 && w > capacity {
124+
w = capacity
125+
}
126+
return w
127+
}
128+
106129
// SmbMode names the SMB consumer container set a check needs. Mirrors the
107130
// modes accepted by apps/desktop/test/smb-servers/start.sh.
108131
type SmbMode string

0 commit comments

Comments
 (0)