cnshsliu
diff --git a/.gitignore b/.gitignore
Lines changed: 12 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 19 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 101 additions & 2 deletions
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
Lines changed: 20 additions & 2 deletions
@@ -20,6 +20,12 @@ vendors/*/xcuserdata/
 *.xcworkspace/
 DerivedData/
 xcuserdata/
+*.d
+*.dia
+*.o
+*.swiftdeps
+*.emit-module.d
+*.emit-module.dia
 
 # ==================== Python ====================
 __pycache__/
@@ -46,11 +52,17 @@ MEMORY.md
 
 # ==================== External repositories (local reference, not committed) ====================
 mlx-swift-lm/
+vendors/
+
+# ==================== Grok tool cache ====================
+.grok/
 
 # ==================== Other common ignores ====================
 .env
 .env.local
 .vscode/
 .idea/
 *.log
+*.tar.gz
 .aider*
+opencode.json
@@ -42,13 +42,31 @@ swift test --filter <TestName>             # single test or pattern
 
 Tests live under `Tests/NovaMLX*Tests/`.
 
+### E2E Model Tests
+
+```bash
+# Test all downloaded LLM models (load → 4 API tests → unload)
+Scripts/test-all-models.sh
+```
+
 ## Logs & Config
 
 - Runtime log: `~/.nova/novamlx.log`
 - Config: `~/.nova/config.json`
 
+### Runtime Log Level
+
+```bash
+# View current level
+curl http://127.0.0.1:6591/admin/api/log-level -H "Authorization: Bearer $KEY"
+
+# Enable debug logging
+curl -X PUT http://127.0.0.1:6591/admin/api/log-level \
+  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"level": 0}'
+```
+Levels: 0=debug, 1=info, 2=warning, 3=error
+
 ## Project conventions
 
 - Source: `Sources/NovaMLX{Core,Engine,Inference,API,Utils,MenuBar,ModelManager,...}/`
 - Vendored deps: `vendors/mlx-swift/`, `vendors/mlx-swift-lm/` (treat as read-only — modifications go via `Scripts/patch-*.py`)
-- Active diagnostic todos: `todo.markdown`
 
@@ -2,10 +2,109 @@
 
 All notable changes to NovaMLX will be documented in this file.
 
+## [1.0.8] - 2026-05-08
+
+### Added
+- Pre-emptive memory feasibility check in model list endpoint
+- Memory feasibility data per model in admin API
+
+### Fixed
+- Prevent double-finish race in SSE keep-alive continuation
+- Scheduler concurrency regression tests
+
+## [1.0.7] - 2026-05-07
+
+### Fixed
+- Eliminate 4 concurrency races in FusedBatchScheduler
+- Add FinishGuard for atomic continuation lifecycle
+
+## [1.0.6] - 2026-05-06
+
+### Fixed
+- Preserve `tool_calls` and `tool_call_id` in OpenAI incoming message mapping
+- Preserve `tool_use`/`tool_result` blocks in Anthropic message mapping
+
+## [1.0.5] - 2026-05-05
+
+### Added
+- Prefix cache: async write + async eviction in SSDCacheStore
+- Prefix cache: safetensors header-only reader (replaces full-file scan)
+- Prefix cache: skip fetch/store for VLM paths
+- Prefix cache: pre-flight RotatingKVCache probe before SSD fetch
+- Prefix cache: repeated-prefix TTFT benchmark
+
+### Fixed
+- E2E test: skip VLMs in core API suite, accept reasoning-only Harmony output
+
+## [1.0.4] - 2026-05-04
+
+### Added
+- Audio transcription (`/v1/audio/transcriptions`) — Qwen3-ASR (Swift/MLX)
+- Image generation (`/v1/images/generations`) — SDXL-Turbo
+- Modelfile system — user-authored model recipes with system prompt and sampling overrides
+- Per-request `keep_alive` — override model TTL per request
+- Harmony streaming protocol — GPT-OSS channel-aware format
+- ThinkingBudgetProcessor — per-request thinking token budget control
+- Strict-FSM JSON logit processor — structured output with JSON schema
+- Chat template library — three-level template resolution (user > registry > downloaded)
+- `isImplicitThinkingModel` rewrite — auto-detect implicit thinking models at load time
+- TokenMaskBuilder cache — pre-decoded vocabulary for fast masking
+- VLM LogitProcessor chain — thinking detection for vision-language models
+- DeepSeek-V4 lite regression test suite
+
+### Fixed
+- ThinkingParser regression tests
+- Build script sync for worker binary
+
+## [1.0.3] - 2026-05-02
+
+### Added
+- Cloud auth gate with subscription validation
+- WebUI dashboard (SPA with status, models, chat pages)
+- CLI login/logout/account commands
+- GUI settings auth integration
+
+### Changed
+- Tagline updated from "fastest" to "blazing fast"
+
+## [1.0.2] - 2026-05-01
+
+### Added
+- Homebrew tap distribution (`brew install novamlx`)
+- Full OpenAI and Anthropic tools/function calling support
+- Dynamic suggested searches from GitHub config
+- Generic control token filtering for streaming output
+- Agent-aware context scaling (ClientDetector)
+
+### Fixed
+- Buffer partial control tokens in streaming to prevent leaked fragments
+- CI: patch mlx-swift-lm StrictConcurrency error
+- CI: recurse submodules when cloning mlx-swift
+
+## [1.0.1] - 2026-04-30
+
+### Added
+- Worker subprocess isolation for crash recovery
+- TurnStopProcessor for Qwen3.6 turn separator handling
+- ProcessMemoryEnforcer with soft/hard limits
+- OCROptimizer for OCR model parameter tuning
+- N-gram speculative decoding in FusedBatchScheduler
+- Draft-model speculative decoding (SpeculativeTokenIterator)
+- Full i18n system — 9 languages
+- Web chat with input history and parameter controls
+- Settings page with collapsible config.json editor
+- Cloud model support — remote inference proxy with streaming
+
+### Changed
+- UI overhaul across all views
+
+### Fixed
+- Streaming deadlock
+
 ## [1.0.0] - 2025-04-09
 
 ### Added
-- Pure Swift SPM project with 9 modular targets
+- Pure Swift SPM project with modular targets
 - Paged in-memory KV Cache with LRU eviction
 - SSD-backed KV Cache for persistent caching
 - Continuous batching for concurrent request processing
@@ -19,7 +118,7 @@ All notable changes to NovaMLX will be documented in this file.
 - Native macOS menu bar app with SwiftUI
 - System monitoring (CPU, memory, GPU)
 - Health check and stats endpoints
-- Comprehensive test suite (Core, KVCache, Engine, Inference, ModelManager, API)
+- Comprehensive test suite
 - MIT License
 
 ### Performance
 
@@ -110,7 +110,7 @@ swift test --filter NovaMLXCoreTests
 
 ## API Reference
 
-### Inference API (Port 8080)
+### Inference API (Port 6590)
 
 | Endpoint | Method | Description |
 |----------|--------|-------------|
@@ -133,7 +133,7 @@ swift test --filter NovaMLXCoreTests
 | `/chat` | GET | Built-in web chat UI |
 | `/health` | GET | Health check |
 
-### Admin API (Port 8081, Bearer auth when apiKeys configured)
+### Admin API (Port 6591, Bearer auth when apiKeys configured)
 
 | Group | Endpoints |
 |-------|-----------|
@@ -146,6 +146,7 @@ swift test --filter NovaMLXCoreTests
 | Benchmarking | `POST/GET /admin/api/bench/*` |
 | Perplexity | `POST/GET /admin/api/ppl/*` |
 | Device Info | `GET /admin/api/device-info` |
+| Log Level | `GET/PUT /admin/api/log-level` |
 | Grammar | `POST /admin/api/grammar/validate` |
 | Dashboard | `GET /admin/dashboard` |
 
@@ -245,6 +246,23 @@ NovaMLX automatically detects thinking/reasoning models at load time by inspecti
 
 **Web UI:** The built-in chat page shows a Neural Pulse animation during thinking (real-time token count, speed, ghost preview of latest tokens), collapsing to a "Thought for Xs · N words" badge after completion.
 
+### Logging
+
+All modules use `NovaMLXLog` (in NovaMLXUtils), which writes to both swift-log and a rotating log file at `~/.nova/novamlx.log`. The file keeps up to 5 rotated copies (`novamlx.log.1` through `.5`).
+
+**Runtime log level control** via admin API:
+```bash
+# View current level
+curl http://127.0.0.1:6591/admin/api/log-level -H "Authorization: Bearer $KEY"
+
+# Enable debug logging
+curl -X PUT http://127.0.0.1:6591/admin/api/log-level \
+  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"level": 0}'
+```
+Levels: 0=debug, 1=info (default), 2=warning, 3=error.
+
+> Note: NovaMLXCore uses `os.Logger` directly (circular dependency prevents importing NovaMLXUtils).
+
 ### Security
 
 - API key authentication (Bearer token) on both ports