vdavid
diff --git a/apps/desktop/src-tauri/src/ai/CLAUDE.md b/apps/desktop/src-tauri/src/ai/CLAUDE.md
Lines changed: 4 additions & 3 deletions
diff --git a/apps/desktop/src-tauri/src/ai/client.rs b/apps/desktop/src-tauri/src/ai/client.rs
Lines changed: 40 additions & 10 deletions
diff --git a/apps/desktop/src-tauri/src/ai/mod.rs b/apps/desktop/src-tauri/src/ai/mod.rs
Lines changed: 3 additions & 0 deletions
diff --git a/apps/desktop/src-tauri/src/ai/translate_error.rs b/apps/desktop/src-tauri/src/ai/translate_error.rs
Lines changed: 120 additions & 0 deletions
@@ -19,7 +19,8 @@ Three provider modes:
 | `download.rs` | HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). |
 | `extract.rs` | Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. |
 | `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. Instant SIGKILL to stop (llama-server is stateless; macOS reclaims all GPU/mmap resources). `kill_process` for fire-and-forget (quit, orphans), `kill_and_reap_in_background` for normal operation (reaps zombie in bg thread). `kill_stale_llama_servers` for belt-and-suspenders orphan cleanup by process name. Port discovery via `bind(:0)`. |
-| `client.rs` | `genai`-backed chat client. `AiBackend` is a struct bundling a long-lived `genai::Client` with a model name; built via `AiBackend::local(port)` or `AiBackend::remote(api_key, base_url, model)`. The model name picks the adapter (`claude-*` → Anthropic native, `gemini-*` → Gemini native, `gpt-5*`/`*-pro`/`*-codex` → OpenAI Responses API, etc.). Auto-omits `temperature`/`top_p` for OpenAI Responses adapter and for chat-completions reasoning models (`o1*`, `o3*`, `o4*`, `chatgpt-*`, `gpt-5*` defense-in-depth) and substitutes `ReasoningEffort::Low`. Local backend forces the OpenAI adapter via a `ServiceTargetResolver` pinning endpoint to `http://127.0.0.1:<port>/v1/`. Exposes both `chat_completion` (full response) and `chat_completion_stream` (returns a `BoxStream<Result<String, AiError>>` of content chunks; reasoning/thought-signature/tool-call chunks filtered out). |
+| `client.rs` | `genai`-backed chat client. `AiBackend` is a struct bundling a long-lived `genai::Client` with a model name; built via `AiBackend::local(port)` or `AiBackend::remote(api_key, base_url, model)`. The model name picks the adapter (`claude-*` → Anthropic native, `gemini-*` → Gemini native, `gpt-5*`/`*-pro`/`*-codex` → OpenAI Responses API, etc.). Auto-omits `temperature`/`top_p` for OpenAI Responses adapter and for chat-completions reasoning models (`o1*`, `o3*`, `o4*`, `chatgpt-*`, `gpt-5*` defense-in-depth) and substitutes `ReasoningEffort::Low`. Local backend forces the OpenAI adapter via a `ServiceTargetResolver` pinning endpoint to `http://127.0.0.1:<port>/v1/`. Exposes both `chat_completion` (full response) and `chat_completion_stream` (returns a `BoxStream<Result<String, AiError>>` of content chunks; reasoning/thought-signature/tool-call chunks filtered out). `AiError` is typed by HTTP status via the pure `ai_error_for_status` (401/403 → `AuthFailed`, 429 → `RateLimited`, else `ServerError`); a `None` `first_text()` → `EmptyResponse`. |
+| `translate_error.rs` | `AiTranslateError { kind, message }` + `AiTranslateErrorKind` enum, the typed error the two translate IPC commands return so the frontend branches on `kind` (not the message string). `From<AiError>` maps transport variants; the commands map `BackendResolution` non-ready cases. Mirror enum: `lib/ai/translate-error-toast.ts`. |
 | `client_integration_test.rs` | `wiremock`-based tests covering request shape per adapter (chat completions vs Responses API), parsing, error mapping. Always run in CI. |
 | `client_streaming_test.rs` | `axum`-based SSE mock server tests for `chat_completion_stream`: chunks arrive in order, empty streams end cleanly, drop-mid-stream closes the connection, HTTP 5xx maps to `ServerError`. Always run in CI. (Wiremock can't chunk-deliver SSE bodies. See Gotchas.) |
 | `client_real_openai_test.rs` | `#[ignore]`-gated smoke tests against `api.openai.com`, including streaming variants for `gpt-4o-mini`, `gpt-5-mini`, `o3-mini`. Run with `OPENAI_API_KEY=$(security find-generic-password -a "$USER" -s "OPENAI_API_KEY" -w) cargo nextest run --lib --run-ignored only ai::client_real_openai_test`. Costs ~$0.001 per full run. |
@@ -60,7 +61,7 @@ Centralized in `manager::resolve_backend() -> BackendResolution`:
 - `Ready(AiBackend)`: backend ready to call `chat_completion` on.
 - `UnknownProvider(name)`: provider value isn't recognized.
 
-Callers decide what to do per case. `suggestions.rs` returns empty on any non-Ready (folder suggestions are nice-to-have). `commands/search.rs::translate_search_query` returns the human-readable reason as an error so the UI can toast it.
+Callers decide what to do per case. `suggestions.rs` returns empty on any non-Ready (folder suggestions are nice-to-have). The two translate commands (`commands/search.rs::translate_search_query`, `commands/selection.rs::translate_selection_query`) return a typed `AiTranslateError { kind, message }` (in `translate_error.rs`) so the frontend can branch on `kind` and show a SPECIFIC toast (key rejected vs. out of quota vs. timed out vs. empty answer) without string-matching the message. The `kind` set maps both the `BackendResolution` non-ready cases (`off` / `notConfigured` / `unknownProvider`) and the `AiError` transport variants (`authFailed` / `rateLimited` / `timeout` / `unavailable` / `emptyResponse` / `serverError` / `parseError`). Frontend counterpart: `lib/ai/translate-error-toast.ts`; keep the two enums in lockstep.
 
 ## Download/install event sequence
 
@@ -149,7 +150,7 @@ privacy-focused users. The architecture doesn't fight this switch: it's just a d
 
 **Gotcha**: `genai 0.6` auto-routes `gpt-5*`, `*-codex`, `*-pro` to the Responses API, but `o1*`/`o3*`/`o4*`/`chatgpt-*` stay on Chat Completions even though they also reject custom `temperature`. We layer `is_openai_chat_reasoning_model()` on top to strip `temperature`/`top_p` and substitute `ReasoningEffort::Low` for those. The heuristic also matches `gpt-5*` as defense-in-depth in case `genai`'s routing rule changes.
 
-**Gotcha**: For reasoning models, `max_tokens` (`max_output_tokens` on Responses API) covers reasoning + visible answer combined. Real-world finding: at `ReasoningEffort::Low`, `gpt-5-mini` consumed all 40 tokens thinking and emitted no `output_text`, so `first_text()` returned `None`. `suggestions.rs` (`max_tokens=150`) and `commands/search.rs` (`max_tokens=200`) may occasionally produce empty results when the user picks a reasoning model. Bump to `max_tokens >= 300` if empty-result rate becomes a problem; the empty-result graceful degradation already covers it functionally.
+**Gotcha**: For reasoning models, `max_tokens` (`max_output_tokens` on Responses API) covers reasoning + visible answer combined. Real-world finding: at `ReasoningEffort::Low`, `gpt-5-mini` consumed all 40 tokens thinking and emitted no `output_text`, so `first_text()` returned `None`. Both translate commands now request `max_tokens=300` (search bumped from 200; selection already 300) to give reasoning room before the visible answer. When `first_text()` is still `None`, `chat_completion` returns the typed `AiError::EmptyResponse` (not a generic parse error), which surfaces as a specific "the AI came back empty, try a faster model" toast. `suggestions.rs` (`max_tokens=150`) stays graceful-empty since folder suggestions are nice-to-have. Picking a non-reasoning model (the default `gpt-4.1-mini`) sidesteps this entirely.
 
 **Gotcha**: `tauri::async_runtime::spawn` is used in `configure_ai` and `start_ai_server` instead of `tokio::spawn`.
 **Why**: These may run during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point.
 
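The `max_tokens` gotcha in the CLAUDE.md hunk above comes down to simple arithmetic: the cap is shared between reasoning and the visible answer. A minimal sketch of that budget, assuming a simplified model where reasoning is spent first; `visible_budget` and `would_come_back_empty` are hypothetical helpers for illustration, not functions in `client.rs`, and real reasoning usage varies per request:

```rust
/// Tokens left for the visible answer once reasoning has taken its share.
/// Simplified model: the cap covers reasoning + visible output combined.
fn visible_budget(max_tokens: u32, reasoning_tokens: u32) -> u32 {
    // saturating_sub clamps at zero if reasoning alone would exceed the cap.
    max_tokens.saturating_sub(reasoning_tokens)
}

/// True when reasoning alone exhausts the cap, which is the situation where
/// `first_text()` has nothing to return.
fn would_come_back_empty(max_tokens: u32, reasoning_tokens: u32) -> bool {
    visible_budget(max_tokens, reasoning_tokens) == 0
}
```

With the observed case (a 40-token cap fully spent on reasoning) nothing is left for the answer. The same 40 reasoning tokens under the 300-token cap used by the translate commands would leave 260 for visible text.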
@@ -67,7 +67,14 @@ pub enum AiError {
     Unavailable,
     /// Request timed out (server too slow, or local server unhealthy).
     Timeout,
-    /// Server returned an HTTP error or otherwise misbehaved.
+    /// The provider rejected the API key (HTTP 401 / 403).
+    AuthFailed(String),
+    /// The provider is rate-limiting requests or the account is out of quota (HTTP 429).
+    RateLimited(String),
+    /// The call succeeded but the model produced no visible text. Common on reasoning models
+    /// when `max_tokens` is fully consumed by reasoning before any answer is emitted.
+    EmptyResponse,
+    /// Server returned some other HTTP error or otherwise misbehaved.
     ServerError(String),
     /// Couldn't parse the response body.
     ParseError(String),
@@ -78,6 +85,9 @@ impl std::fmt::Display for AiError {
         match self {
             Self::Unavailable => write!(f, "AI server unavailable"),
             Self::Timeout => write!(f, "AI request timed out"),
+            Self::AuthFailed(msg) => write!(f, "AI provider rejected the API key: {msg}"),
+            Self::RateLimited(msg) => write!(f, "AI provider is rate-limiting or out of quota: {msg}"),
+            Self::EmptyResponse => write!(f, "AI returned no text"),
             Self::ServerError(msg) => write!(f, "AI server error: {msg}"),
             Self::ParseError(msg) => write!(f, "AI response parse error: {msg}"),
         }
@@ -122,14 +132,11 @@ pub async fn chat_completion(
 
     let text = res
         .first_text()
-        .ok_or_else(|| {
-            // Common on reasoning models (`gpt-5*`, `o3*`, `*-pro`) when `max_tokens`
-            // gets fully consumed by reasoning before any `output_text` is emitted.
-            // The HTTP call succeeded; there's just no visible answer to return.
-            AiError::ParseError(String::from(
-                "AI returned no text. Likely max_tokens fully consumed by reasoning. Increase max_tokens.",
-            ))
-        })?
+        // Common on reasoning models (`gpt-5*`, `o3*`, `*-pro`) when `max_tokens` gets
+        // fully consumed by reasoning before any `output_text` is emitted. The HTTP call
+        // succeeded; there's just no visible answer to return. Typed so callers can tell
+        // the user to pick a simpler model or raise the token budget.
+        .ok_or(AiError::EmptyResponse)?
         .to_owned();
 
     log::trace!("AI chat_completion: extracted content: {text}");
@@ -254,6 +261,18 @@ fn make_resolver(endpoint: String, auth: AuthData, force_adapter: ForceAdapter)
     })
 }
 
+/// Classifies a provider HTTP error status into the right [`AiError`] so the frontend can
+/// show a specific toast (key rejected vs. out of quota vs. generic server error). Branches
+/// on the numeric status, never the message body. 429 covers both rate-limiting and
+/// OpenAI's `insufficient_quota`; 401/403 is a rejected key.
+fn ai_error_for_status(status: u16, detail: String) -> AiError {
+    match status {
+        401 | 403 => AiError::AuthFailed(detail),
+        429 => AiError::RateLimited(detail),
+        _ => AiError::ServerError(detail),
+    }
+}
+
 /// Maps `genai`'s rich error tree to our flat [`AiError`]. Pattern-matches on the
 /// known transport variants instead of grepping the `Display` output.
 fn map_genai_error(e: genai::Error) -> AiError {
@@ -271,7 +290,7 @@ fn map_genai_error(e: genai::Error) -> AiError {
             W::Reqwest(req) if req.is_connect() => return AiError::Unavailable,
             W::Reqwest(req) => return AiError::ServerError(format!("network error: {req}")),
             W::ResponseFailedStatus { status, body, .. } => {
-                return AiError::ServerError(format!("HTTP {status}: {body}"));
+                return ai_error_for_status(status.as_u16(), format!("HTTP {status}: {body}"));
             }
             W::ResponseFailedNotJson { content_type, body } => {
                 return AiError::ParseError(format!(
@@ -329,6 +348,7 @@ mod tests {
     fn test_ai_error_display() {
         assert_eq!(AiError::Unavailable.to_string(), "AI server unavailable");
         assert_eq!(AiError::Timeout.to_string(), "AI request timed out");
+        assert_eq!(AiError::EmptyResponse.to_string(), "AI returned no text");
         assert_eq!(
             AiError::ServerError(String::from("bad")).to_string(),
             "AI server error: bad"
@@ -339,6 +359,16 @@ mod tests {
         );
     }
 
+    #[test]
+    fn ai_error_for_status_classifies_by_code() {
+        assert!(matches!(ai_error_for_status(401, "x".into()), AiError::AuthFailed(_)));
+        assert!(matches!(ai_error_for_status(403, "x".into()), AiError::AuthFailed(_)));
+        // 429 is both rate-limiting and OpenAI's `insufficient_quota`.
+        assert!(matches!(ai_error_for_status(429, "x".into()), AiError::RateLimited(_)));
+        assert!(matches!(ai_error_for_status(500, "x".into()), AiError::ServerError(_)));
+        assert!(matches!(ai_error_for_status(404, "x".into()), AiError::ServerError(_)));
+    }
+
     #[test]
     fn test_is_openai_chat_reasoning_model() {
         assert!(is_openai_chat_reasoning_model("o1"));
 
@@ -35,6 +35,9 @@ mod process;
 pub mod suggestions;
 #[cfg(test)]
 mod suggestions_streaming_test;
+pub mod translate_error;
+
+pub use translate_error::{AiTranslateError, AiTranslateErrorKind};
 
 use serde::{Deserialize, Serialize};
 
 
@@ -0,0 +1,120 @@
+//! Typed error for the AI natural-language translation commands.
+//!
+//! Search and Selection both translate a prompt into a structured query via
+//! [`crate::ai::client::chat_completion`]. When that fails (provider off, key rejected,
+//! quota / rate limit, timeout, empty answer), the dialogs need to show a SPECIFIC toast,
+//! not a generic "something went wrong". A bare `String` error would force the frontend to
+//! string-match the message (banned by the `no-string-matching` rule), so we cross the IPC
+//! boundary with a typed `kind` plus a human-readable `message`. The frontend branches on
+//! `kind`; `message` is detail for logs, never for control flow.
+
+use serde::Serialize;
+
+use super::client::AiError;
+
+/// Coarse, frontend-branchable classification of an AI translation failure.
+///
+/// Keep in lockstep with the `AiErrorKind` switch in
+/// `apps/desktop/src/lib/ai/translate-error-toast.ts`.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, specta::Type)]
+#[serde(rename_all = "camelCase")]
+pub enum AiTranslateErrorKind {
+    /// AI is turned off (`provider = "off"`).
+    Off,
+    /// Provider is selected but not usable yet (no key, local server down, wrong provider).
+    NotConfigured,
+    /// The provider rejected the API key (HTTP 401 / 403).
+    AuthFailed,
+    /// The provider is rate-limiting requests or the account is out of quota (HTTP 429).
+    RateLimited,
+    /// The request timed out.
+    Timeout,
+    /// Couldn't reach the provider (DNS / connection refused).
+    Unavailable,
+    /// The model returned no usable text (often a reasoning model burning the token budget).
+    EmptyResponse,
+    /// The provider returned some other HTTP error or otherwise misbehaved.
+    ServerError,
+    /// Couldn't parse the provider's response.
+    ParseError,
+    /// The configured provider value isn't recognized.
+    UnknownProvider,
+}
+
+/// Typed error returned by `translate_search_query` / `translate_selection_query`.
+///
+/// `message` is a human-readable detail string for logs and the toast's secondary line; the
+/// frontend chooses the headline + tone from `kind`, never by parsing `message`.
+#[derive(Debug, Clone, Serialize, specta::Type)]
+#[serde(rename_all = "camelCase")]
+pub struct AiTranslateError {
+    pub kind: AiTranslateErrorKind,
+    pub message: String,
+}
+
+impl AiTranslateError {
+    pub fn new(kind: AiTranslateErrorKind, message: impl Into<String>) -> Self {
+        Self {
+            kind,
+            message: message.into(),
+        }
+    }
+}
+
+impl std::fmt::Display for AiTranslateError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{:?}: {}", self.kind, self.message)
+    }
+}
+
+impl std::error::Error for AiTranslateError {}
+
+impl From<AiError> for AiTranslateError {
+    fn from(e: AiError) -> Self {
+        use AiTranslateErrorKind as K;
+        let kind = match e {
+            AiError::Unavailable => K::Unavailable,
+            AiError::Timeout => K::Timeout,
+            AiError::AuthFailed(_) => K::AuthFailed,
+            AiError::RateLimited(_) => K::RateLimited,
+            AiError::EmptyResponse => K::EmptyResponse,
+            AiError::ServerError(_) => K::ServerError,
+            AiError::ParseError(_) => K::ParseError,
+        };
+        Self::new(kind, e.to_string())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn maps_each_ai_error_to_its_kind() {
+        use AiTranslateErrorKind as K;
+        let cases = [
+            (AiError::Unavailable, K::Unavailable),
+            (AiError::Timeout, K::Timeout),
+            (AiError::AuthFailed("x".into()), K::AuthFailed),
+            (AiError::RateLimited("x".into()), K::RateLimited),
+            (AiError::EmptyResponse, K::EmptyResponse),
+            (AiError::ServerError("x".into()), K::ServerError),
+            (AiError::ParseError("x".into()), K::ParseError),
+        ];
+        for (err, expected) in cases {
+            assert_eq!(AiTranslateError::from(err).kind, expected);
+        }
+    }
+
+    #[test]
+    fn carries_the_detail_message() {
+        // The detail flows through verbatim (the source error's Display), so logs / the
+        // toast's secondary line keep the provider's wording. We compare against Display
+        // rather than substring-matching the message (the no-string-matching rule).
+        let src = AiError::RateLimited("HTTP 429: out of quota".into());
+        let expected = src.to_string();
+        let err = AiTranslateError::from(src);
+        assert_eq!(err.kind, AiTranslateErrorKind::RateLimited);
+        assert_eq!(err.message, expected);
+    }
+}
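The `From<AiError>` impl above covers only the transport half of the `kind` set. The `BackendResolution` half is mapped inside the commands and isn't part of this diff. A minimal sketch of how that mapping might look, using stand-in types so it compiles on its own. The `Off` and `NotConfigured(String)` variant names, the `require_ready` helper, and the message strings are assumptions for illustration; only `Ready(AiBackend)` and `UnknownProvider(name)` are named in CLAUDE.md.

```rust
// Stand-ins so the sketch is self-contained; the real types live in
// `translate_error.rs`, `client.rs`, and the manager module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AiTranslateErrorKind {
    Off,
    NotConfigured,
    UnknownProvider,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct AiTranslateError {
    kind: AiTranslateErrorKind,
    message: String,
}

#[derive(Debug)]
struct AiBackend;

enum BackendResolution {
    Off,                   // hypothetical variant name
    NotConfigured(String), // hypothetical variant name and payload
    Ready(AiBackend),
    UnknownProvider(String),
}

/// Hypothetical helper: unwraps a ready backend, or turns each non-ready
/// case into a typed error the frontend can branch on via `kind`.
fn require_ready(res: BackendResolution) -> Result<AiBackend, AiTranslateError> {
    use AiTranslateErrorKind as K;
    let (kind, message) = match res {
        BackendResolution::Ready(backend) => return Ok(backend),
        BackendResolution::Off => (K::Off, String::from("AI is turned off")),
        BackendResolution::NotConfigured(reason) => (K::NotConfigured, reason),
        BackendResolution::UnknownProvider(name) => {
            (K::UnknownProvider, format!("unknown AI provider: {name}"))
        }
    };
    Err(AiTranslateError { kind, message })
}
```

A command could then call `require_ready(resolve_backend())?` before `chat_completion`, so both failure sources end up as the same `AiTranslateError` shape and the frontend never inspects `message` for control flow.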