Skip to content

Commit 0c45a46

Browse files
committed
AI: Use genai crate to handle multi-provider quirks
Replace the hand-rolled OpenAI-compatible chat-completions client with `genai 0.6.0-beta.19`. Fixes two production failures observed in BYOK mode: (1) GPT-5 / o-series chat models 400ing on `temperature`, and (2) `gpt-*-pro` / `*-codex` models 404ing on `/v1/chat/completions` because they only respond on `/v1/responses`. `genai` auto-routes Responses-only models, normalizes the wire format across OpenAI / Anthropic / Gemini / xAI / Groq / DeepSeek / OpenRouter / Ollama, and exposes a unified `ReasoningEffort` knob. We layer one small heuristic (`is_openai_chat_reasoning_model`) on top to also strip `temperature` for `o1*`/`o3*`/`o4*`/`chatgpt-*` chat-completion models, where `genai` doesn't auto-route. - `client.rs` rewritten around `genai::Client`. `AiBackend` is now a struct bundling a long-lived `Client` with the model name, built via `AiBackend::local(port)` or `AiBackend::remote(key, url, model)`. - `manager.rs`: rename internal `openai_*` config fields to `cloud_*`, rename `get_openai_config` to `get_cloud_config`, add `BackendResolution` + `resolve_backend` so callers don't reimplement provider routing. - `suggestions.rs` and `commands/search.rs` use `resolve_backend()`; identical inline match logic deleted. - Provider value `openai-compatible` → `cloud`. Persisted settings `ai.openaiApiKey`/`BaseUrl`/`Model` and their legacy migration removed (cloud config has lived in `ai.cloudProviderConfigs` since the per-provider redesign; the legacy keys were dead weight). - Tauri command `configure_ai` parameter renames mirror the above. - UI label "Cloud / API" → "Cloud AI"; tooltip updated to mention Anthropic, Gemini, xAI, Groq, DeepSeek, OpenRouter natively. Fix Anthropic preset description (was wrongly claiming OpenAI-compat). - Map `genai::Error` to `AiError` by pattern-matching the `WebAdapterCall`/`WebModelCall` + `webc::Error` tree, not by string scraping the `Display` output. - Better error message when reasoning models eat `max_tokens` for thinking and emit no output text. - Fix latent bug discovered while writing tests: `genai`'s `Url::join("chat/completions")` strips the last path segment unless `base_url` ends with `/`. Without the fix, `https://api.openai.com/v1` silently becomes `https://api.openai.com/chat/completions` (404). Tests: - 8 wiremock integration tests covering request shape per adapter (chat completions vs Responses), parsing, error mapping. - 3 `#[ignore]`-gated real-OpenAI smoke tests (`gpt-4o-mini`, `gpt-5-mini`, `o3-mini`). All three pass against `api.openai.com`. - Existing `parse_suggestions` unit tests untouched. Updates `apps/desktop/src-tauri/src/ai/CLAUDE.md` with new architecture, decisions, and gotchas (trailing-slash quirk; reasoning models consuming `max_tokens` budget; per-adapter routing rules; defense-in-depth reasoning-model heuristic).
1 parent c9fad17 commit 0c45a46

18 files changed

Lines changed: 1032 additions & 376 deletions

Cargo.lock

Lines changed: 295 additions & 29 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

apps/desktop/src-tauri/Cargo.toml

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -100,6 +100,20 @@ rand = "0.10"
100100
# project rule on avoiding 0-day vulns. Default features include `status`,
101101
# `basic`, and `extras` which cover discover + ahead/behind + status walk.
102102
gix = "0.81"
103+
# Multi-provider LLM client. Normalizes OpenAI / OpenAI Responses / Anthropic / Gemini /
104+
# xAI / Groq / DeepSeek / Ollama / OpenRouter / etc. Auto-routes `gpt-5*` / `*-codex` /
105+
# `*-pro` to the Responses API; uses native protocols for Anthropic + Gemini. Replaces
106+
# hand-rolled chat-completions JSON in `ai/client.rs`, which broke on (1) reasoning models
107+
# that reject custom `temperature` (HTTP 400) and (2) Responses-only models that 404 on
108+
# `/v1/chat/completions`.
109+
#
110+
# Pinned at the EXACT 0.6.0-beta.19 (published 2026-05-06). 0.6 is in active beta with
111+
# weekly releases, but the API surface we use is stable across the 0.6 line, and this is
112+
# the first version that auto-routes plain `gpt-5*` to the Responses API (PR #167) — 0.5.3
113+
# only routed `*-pro` and `*-codex`. Stable 0.6.0 isn't out yet; bumping to a future
114+
# `0.6.0-beta.20` should be a no-op for us, but pin exact to avoid surprise breakage on
115+
# `cargo update`. Solo maintainer; MIT/Apache-2.0; ~25k LOC if we ever need to fork.
116+
genai = "=0.6.0-beta.19"
103117

104118
[target.'cfg(unix)'.dependencies]
105119
libc = "0.2"
@@ -183,6 +197,10 @@ env_logger = "0.11.10"
183197
# For test-only cryptographic key generation (compatible with ed25519-dalek's rand_core 0.6)
184198
rand_core = { version = "0.6", features = ["getrandom"] }
185199
tempfile = "3"
200+
# HTTP mock server for AI client integration tests. Verifies request shape (temperature
201+
# present on chat/completions, omitted on /v1/responses) and parsing without burning
202+
# real API quota. Pinned at 0.6.5 (latest stable, published 2025-08-24).
203+
wiremock = "0.6.5"
186204

187205
[[example]]
188206
name = "smb_compose"

apps/desktop/src-tauri/src/ai/CLAUDE.md

Lines changed: 25 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,24 @@
11
# AI subsystem
22

3-
AI features powered by local LLM (llama-server) or OpenAI-compatible APIs. Currently used for folder name suggestions.
3+
AI features powered by local LLM (llama-server) or remote LLM providers. Currently used for folder name suggestions and natural-language search.
44

55
Three provider modes:
66
- **Off**: No AI features.
7-
- **OpenAI-compatible** (BYOK): Any OpenAI-compatible API. Works on any hardware.
7+
- **Cloud AI** (BYOK): OpenAI / Anthropic / Gemini / xAI / Groq / DeepSeek / OpenRouter / any OpenAI-compatible endpoint. Adapter is picked from the model name. Works on any hardware. Persisted as `ai.provider = "cloud"`; the Rust constructor is `AiBackend::remote(...)` because the same code path handles native Anthropic/Gemini protocols too.
88
- **Local LLM**: On-device llama-server. Requires Apple Silicon (aarch64).
99

1010
## Key files
1111

1212
| File | Purpose |
1313
|---|---|
1414
| `mod.rs` | Types (`AiStatus`, `AiState`, `DownloadProgress`, `ModelInfo`), model registry (`AVAILABLE_MODELS`, `DEFAULT_MODEL_ID`), `is_local_ai_supported()` gate |
15-
| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + OpenAI config in `ManagerState`. |
15+
| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + cloud-AI config (`cloud_api_key`/`cloud_base_url`/`cloud_model`). Exposes `resolve_backend() -> BackendResolution` so callers don't reinvent provider routing. |
1616
| `download.rs` | HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). |
1717
| `extract.rs` | Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. |
1818
| `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. Instant SIGKILL to stop (llama-server is stateless; macOS reclaims all GPU/mmap resources). `kill_process` for fire-and-forget (quit, orphans), `kill_and_reap_in_background` for normal operation (reaps zombie in bg thread). `kill_stale_llama_servers` for belt-and-suspenders orphan cleanup by process name. Port discovery via `bind(:0)`. |
19-
| `client.rs` | reqwest client with `AiBackend` enum: `Local { port }` or `OpenAi { api_key, base_url, model }`. Routes requests accordingly. |
19+
| `client.rs` | `genai`-backed chat client. `AiBackend` is a struct bundling a long-lived `genai::Client` with a model name; built via `AiBackend::local(port)` or `AiBackend::remote(api_key, base_url, model)`. The model name picks the adapter (`claude-*` → Anthropic native, `gemini-*` → Gemini native, `gpt-5*`/`*-pro`/`*-codex` → OpenAI Responses API, etc.). Auto-omits `temperature`/`top_p` for OpenAI Responses adapter and for chat-completions reasoning models (`o1*`, `o3*`, `o4*`, `chatgpt-*`, `gpt-5*` defense-in-depth) and substitutes `ReasoningEffort::Low`. Local backend forces the OpenAI adapter via a `ServiceTargetResolver` pinning endpoint to `http://127.0.0.1:<port>/v1/`. |
20+
| `client_integration_test.rs` | `wiremock`-based tests covering request shape per adapter (chat completions vs Responses API), parsing, error mapping. Always run in CI. |
21+
| `client_real_openai_test.rs` | `#[ignore]`-gated smoke tests against `api.openai.com`. Run with `OPENAI_API_KEY=$(security find-generic-password -a "$USER" -s "OPENAI_API_KEY" -w) cargo nextest run --lib --run-ignored only ai::client_real_openai_test`. Costs ~$0.001 per full run. Use after refactors that touch `client.rs`. |
2022
| `suggestions.rs` | Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. |
2123

2224
### Tauri commands
@@ -42,12 +44,15 @@ Frontend loads
4244
-> emit 'ai-server-ready' when healthy
4345
```
4446

45-
## Provider routing in suggestions
47+
## Provider routing
4648

47-
`get_folder_suggestions` reads `provider` from `ManagerState`:
48-
- `off` -> returns empty
49-
- `local` -> uses local llama-server (if running)
50-
- `openai-compatible` -> builds `AiBackend::OpenAi` from stored config, calls `chat_completion`
49+
Centralized in `manager::resolve_backend() -> BackendResolution`:
50+
- `Off`: provider is `"off"`.
51+
- `NotConfigured(reason)`: provider is set but missing config (local server not running, cloud key blank).
52+
- `Ready(AiBackend)`: backend ready to call `chat_completion` on.
53+
- `UnknownProvider(name)`: provider value isn't recognized.
54+
55+
Callers decide what to do per case. `suggestions.rs` returns empty on any non-Ready (folder suggestions are nice-to-have). `commands/search.rs::translate_search_query` returns the human-readable reason as an error so the UI can toast it.
5156

5257
## Download/install event sequence
5358

@@ -108,8 +113,17 @@ The frontend (`AiSection.svelte`) tracks `installStep` state and displays "Step
108113
direction is to make OpenAI-compatible the primary recommended path, with local LLM as the secondary option for
109114
privacy-focused users. The architecture doesn't fight this switch — it's just a default value change.
110115

116+
**Decision**: Use `genai` crate as the chat client instead of hand-rolled `reqwest` JSON.
117+
**Why**: We hit two production bugs that were per-provider quirks: (1) GPT-5/o-series chat models reject any non-default `temperature` (HTTP 400), and (2) `gpt-*-pro` / `*-codex` models only respond on `/v1/responses`, not `/v1/chat/completions` (HTTP 404). Each new model adds another quirk. `genai` normalizes ~20 providers, auto-routes Responses-API models, and gives us Anthropic / Gemini / xAI / OpenRouter for free with the same code path. Tradeoff: pinned at `0.5.3` (stable, ~3 months old) with a solo maintainer; mitigated by it being MIT/Apache-2.0 + small enough to fork if needed.
118+
111119
## Gotchas
112120

121+
**Gotcha**: `genai` requires `base_url` to end with `/`. Without the trailing slash, `Url::join("chat/completions")` strips the last segment and you'd hit `https://api.openai.com/chat/completions` (404) instead of `/v1/chat/completions`. `client.rs::build_client` normalizes by appending `/` if missing.
122+
123+
**Gotcha**: `genai 0.6` auto-routes `gpt-5*`, `*-codex`, `*-pro` to the Responses API, but `o1*`/`o3*`/`o4*`/`chatgpt-*` stay on Chat Completions even though they also reject custom `temperature`. We layer `is_openai_chat_reasoning_model()` on top to strip `temperature`/`top_p` and substitute `ReasoningEffort::Low` for those. The heuristic also matches `gpt-5*` as defense-in-depth in case `genai`'s routing rule changes.
124+
125+
**Gotcha**: For reasoning models, `max_tokens` (`max_output_tokens` on Responses API) covers reasoning + visible answer combined. Real-world finding: at `ReasoningEffort::Low`, `gpt-5-mini` consumed all 40 tokens thinking and emitted no `output_text``first_text()` returned `None`. `suggestions.rs` (`max_tokens=150`) and `commands/search.rs` (`max_tokens=200`) may occasionally produce empty results when the user picks a reasoning model. Bump to `max_tokens >= 300` if empty-result rate becomes a problem; the empty-result graceful degradation already covers it functionally.
126+
113127
**Gotcha**: `tauri::async_runtime::spawn` is used in `configure_ai` and `start_ai_server` instead of `tokio::spawn`.
114128
**Why**: These may run during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point.
115129

@@ -127,5 +141,6 @@ privacy-focused users. The architecture doesn't fight this switch — it's just
127141

128142
## Dependencies
129143

130-
External: `reqwest`, `tokio`, `libc`, `futures_util`
144+
External: `genai` (chat normalization), `reqwest` (download streaming + `health_check`), `tokio`, `libc`, `futures_util`
145+
Dev: `wiremock` (HTTP mock for `client_integration_test.rs`)
131146
Internal: `crate::ignore_poison::IgnorePoison`, `crate::file_system::get_file_at`

0 commit comments

Comments
 (0)