Skip to content

Commit 1c8363b

Browse files
committed
AI: stop logging an ERROR (and auto-reporting) when local AI is toggled off mid-startup
Switching the AI provider off — or rapidly off/on — while the local llama-server was still booting logged `AI server: process died during startup` at ERROR and tripped the auto error reporter, because the detached health-check waiter couldn't tell our deliberate kill from a crash. Toggling local AI on and off is a normal thing a user does; it shouldn't surface an error or send a report. (Diagnosed from auto-report `ERR-BKHYQ`: provider flipped local->cloud ~1.5s into boot, the stop and the "died during startup" ERROR landed 90ms apart.) - Each spawn mints a `CancellationToken` in `ManagerState.start_cancel`. Every intentional stop fires it before killing the process: switching away from local in `configure_ai`, `stop_ai_server`, `shutdown`, and a superseding spawn in `spawn_and_track_server`. - `wait_for_server_health` now `select!`s on the token (`biased`, so cancel wins a same-tick tie with the death check) and returns a three-way `StartupOutcome`: `Ready` (emit `ai-server-ready`), `Cancelled` (debug log, no event), or `Failed` (ERROR — still auto-reports). A genuine crash or 60s timeout still fails loudly; an intentional stop is silent. - `handle_startup_outcome` centralizes the per-outcome logging/eventing for both the `configure_ai` and `start_ai_server` paths, and clears `server_starting` only when the finishing task still owns the slot (`startup_task_owns_slot`), so a superseded task can't reset a newer startup's flag. - TDD: red-verified that a cancelled startup of a dead PID was misreported as `Failed`, then green. Tests cover the cancelled-vs-failed split and the slot-ownership rule. Note: the captured "last log lines" on a real failure are only llama-server's benign Metal init lines, not the fatal reason — capturing the exit code + a longer stderr tail is a separate follow-up.
1 parent 9eee539 commit 1c8363b

2 files changed

Lines changed: 194 additions & 63 deletions

File tree

apps/desktop/src-tauri/src/ai/DETAILS.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,24 @@ Frontend loads
5959
-> emit 'ai-server-ready' when healthy
6060
```
6161

62+
### Startup cancellation (quiet stop vs real failure)
63+
64+
`wait_for_server_health` runs detached while the user keeps interacting, so it must tell an
65+
*intentional* stop apart from a *crash*. Each spawn mints a `CancellationToken` stored in
66+
`ManagerState.start_cancel`; every intentional stop fires it before killing the process:
67+
switching the provider away from local (`configure_ai`), `stop_ai_server`, `shutdown`, and a
68+
superseding spawn (`spawn_and_track_server` cancels the prior token). The waiter's poll loop
69+
`select!`s on the token with `biased` (cancel wins a same-tick tie with the death check) and
70+
returns a three-way `StartupOutcome`: `Ready` (emit `ai-server-ready`), `Cancelled` (log at
71+
debug, emit nothing), or `Failed` (log ERROR — which auto-reports). So toggling local AI on
72+
and off, even rapidly, is silent; only a genuine startup failure surfaces an error.
73+
74+
Guardrail: don't collapse `StartupOutcome` back to a `Result` or drop the token wiring — a
75+
deliberate stop mid-startup would then be logged as `process died during startup` and
76+
auto-send an error report (the exact false alarm this replaced). `handle_startup_outcome`
77+
clears `server_starting` only via `startup_task_owns_slot`, so a superseded task can't reset a
78+
newer startup's flag.
79+
6280
## Provider routing
6381

6482
Centralized in `manager::resolve_backend() -> BackendResolution`:

0 commit comments

Comments
 (0)