Commit 79cf833

Committed by lucasliu

feat: logging overhaul, tokenhub, Python SDK, docs update

Logging:
- NovaMLXLog rewrite: LogLevel enum, file rotation (5 files), runtime level filter via admin API endpoint
- Spam reduction: demote SSE/RunLoop noise to debug
- Normalize all modules with [Module] prefix convention
- AuthClient: replace per-call file I/O with os.Logger

Features:
- Tokenhub integration (types + menu bar page)
- Cloud backend refactor, Gemma4 template whitespace fix
- Chat page enhancements, model settings updates

New:
- Python SDK (sdk/python/) with examples and tests
- E2E test scripts (test-all-models.sh, test-gemma4-e2e.sh)
- Architecture doc, EXO_DNet technical report

Docs:
- CHANGELOG.md: v1.0.0 through v1.0.8
- DEVELOPMENT.md: fix ports 6590/6591, add logging section
- features.md/zh-CN.md: fix ports, add audio/image/modelfile/keep_alive
- AGENTS.md: add E2E test scripts, remove stale todo.markdown ref
- .gitignore: add build artifacts, vendors, .grok patterns
Parent commit: 128de6f

58 files changed

Lines changed: 6880 additions & 465 deletions

Some content is hidden: large commits have part of their diff hidden by default, so only some of the 58 changed files appear below.

.gitignore

Lines changed: 12 additions & 0 deletions
```diff
@@ -20,6 +20,12 @@ vendors/*/xcuserdata/
 *.xcworkspace/
 DerivedData/
 xcuserdata/
+*.d
+*.dia
+*.o
+*.swiftdeps
+*.emit-module.d
+*.emit-module.dia
 
 # ==================== Python ====================
 __pycache__/
@@ -46,11 +52,17 @@ MEMORY.md
 
 # ==================== External repos (local reference, not committed) ====================
 mlx-swift-lm/
+vendors/
+
+# ==================== Grok tool cache ====================
+.grok/
 
 # ==================== Other common ignores ====================
 .env
 .env.local
 .vscode/
 .idea/
 *.log
+*.tar.gz
 .aider*
+opencode.json
```
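The new patterns cover compiler build artifacts (`*.o`, `*.d`, `*.dia`, `*.swiftdeps`, and the `*.emit-module.*` variants) plus local-only directories. Git's `git check-ignore -v` reports which rule, if any, matches a path, which is a quick way to confirm the patterns behave as intended. The `explain_ignores` wrapper below is a hypothetical convenience, not part of the repository; file patterns are matched by name, so the paths you pass need not exist yet.

```bash
# Hypothetical wrapper: for each path, print the matching .gitignore rule
# as "source:line:pattern<TAB>path", or "not ignored: <path>".
# Run inside the repository.
explain_ignores() {
  local p
  for p in "$@"; do
    # -q only sets the exit status (0 = ignored); valid for a single path.
    if git check-ignore -q -- "$p"; then
      git check-ignore -v -- "$p"
    else
      echo "not ignored: $p"
    fi
  done
}
```

By default `git check-ignore` treats tracked files as not ignored, so a file committed before the pattern was added shows up as "not ignored"; adding `--no-index` checks the patterns alone.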

AGENTS.md

Lines changed: 19 additions & 1 deletion
````diff
@@ -42,13 +42,31 @@ swift test --filter <TestName> # single test or pattern
 
 Tests live under `Tests/NovaMLX*Tests/`.
 
+### E2E Model Tests
+
+```bash
+# Test all downloaded LLM models (load → 4 API tests → unload)
+Scripts/test-all-models.sh
+```
+
 ## Logs & Config
 
 - Runtime log: `~/.nova/novamlx.log`
 - Config: `~/.nova/config.json`
 
+### Runtime Log Level
+
+```bash
+# View current level
+curl http://127.0.0.1:6591/admin/api/log-level -H "Authorization: Bearer $KEY"
+
+# Enable debug logging
+curl -X PUT http://127.0.0.1:6591/admin/api/log-level \
+  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"level": 0}'
+```
+Levels: 0=debug, 1=info, 2=warning, 3=error
+
 ## Project conventions
 
 - Source: `Sources/NovaMLX{Core,Engine,Inference,API,Utils,MenuBar,ModelManager,...}/`
 - Vendored deps: `vendors/mlx-swift/`, `vendors/mlx-swift-lm/` (treat as read-only — modifications go via `Scripts/patch-*.py`)
-- Active diagnostic todos: `todo.markdown`
````
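The log-level endpoint takes a numeric `level`, so a small shell wrapper can accept level names instead. This is a minimal sketch, not part of the commit: `level_number`, `set_log_level`, and the `NOVA_ADMIN_URL` override are hypothetical names, and it assumes the admin server is on the default port 6591 and that `$KEY` holds an admin API key, as in the curl examples above.

```bash
# Hypothetical helpers around the runtime log-level endpoint.
# Numbers follow the documented mapping: 0=debug, 1=info, 2=warning, 3=error.

# Print the numeric level for a name; return 1 for unknown names.
level_number() {
  case "$1" in
    debug)   echo 0 ;;
    info)    echo 1 ;;
    warning) echo 2 ;;
    error)   echo 3 ;;
    *)       return 1 ;;
  esac
}

# Validate the name locally, then PUT the new level to the admin API.
set_log_level() {
  local n
  n=$(level_number "$1") || { echo "unknown level: $1" >&2; return 1; }
  curl -sS -X PUT "${NOVA_ADMIN_URL:-http://127.0.0.1:6591}/admin/api/log-level" \
    -H "Authorization: Bearer ${KEY:?set KEY to an admin API key}" \
    -H "Content-Type: application/json" \
    -d "{\"level\": $n}"
}
```

For example, `set_log_level debug` raises verbosity while reproducing an issue, and `set_log_level info` restores the default. Because the name is validated first, a typo fails before any request is sent.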

CHANGELOG.md

Lines changed: 101 additions & 2 deletions
```diff
@@ -2,10 +2,109 @@
 
 All notable changes to NovaMLX will be documented in this file.
 
+## [1.0.8] - 2026-05-08
+
+### Added
+- Pre-emptive memory feasibility check in model list endpoint
+- Memory feasibility data per model in admin API
+
+### Fixed
+- Prevent double-finish race in SSE keep-alive continuation
+- Scheduler concurrency regression tests
+
+## [1.0.7] - 2026-05-07
+
+### Fixed
+- Eliminate 4 concurrency races in FusedBatchScheduler
+- Add FinishGuard for atomic continuation lifecycle
+
+## [1.0.6] - 2026-05-06
+
+### Fixed
+- Preserve `tool_calls` and `tool_call_id` in OpenAI incoming message mapping
+- Preserve `tool_use`/`tool_result` blocks in Anthropic message mapping
+
+## [1.0.5] - 2026-05-05
+
+### Added
+- Prefix cache: async write + async eviction in SSDCacheStore
+- Prefix cache: safetensors header-only reader (replaces full-file scan)
+- Prefix cache: skip fetch/store for VLM paths
+- Prefix cache: pre-flight RotatingKVCache probe before SSD fetch
+- Prefix cache: repeated-prefix TTFT benchmark
+
+### Fixed
+- E2E test: skip VLMs in core API suite, accept reasoning-only Harmony output
+
+## [1.0.4] - 2026-05-04
+
+### Added
+- Audio transcription (`/v1/audio/transcriptions`) — Qwen3-ASR (Swift/MLX)
+- Image generation (`/v1/images/generations`) — SDXL-Turbo
+- Modelfile system — user-authored model recipes with system prompt and sampling overrides
+- Per-request `keep_alive` — override model TTL per request
+- Harmony streaming protocol — GPT-OSS channel-aware format
+- ThinkingBudgetProcessor — per-request thinking token budget control
+- Strict-FSM JSON logit processor — structured output with JSON schema
+- Chat template library — three-level template resolution (user > registry > downloaded)
+- `isImplicitThinkingModel` rewrite — auto-detect implicit thinking models at load time
+- TokenMaskBuilder cache — pre-decoded vocabulary for fast masking
+- VLM LogitProcessor chain — thinking detection for vision-language models
+- DeepSeek-V4 lite regression test suite
+
+### Fixed
+- ThinkingParser regression tests
+- Build script sync for worker binary
+
+## [1.0.3] - 2026-05-02
+
+### Added
+- Cloud auth gate with subscription validation
+- WebUI dashboard (SPA with status, models, chat pages)
+- CLI login/logout/account commands
+- GUI settings auth integration
+
+### Changed
+- Tagline updated from "fastest" to "blazing fast"
+
+## [1.0.2] - 2026-05-01
+
+### Added
+- Homebrew tap distribution (`brew install novamlx`)
+- Full OpenAI and Anthropic tools/function calling support
+- Dynamic suggested searches from GitHub config
+- Generic control token filtering for streaming output
+- Agent-aware context scaling (ClientDetector)
+
+### Fixed
+- Buffer partial control tokens in streaming to prevent leaked fragments
+- CI: patch mlx-swift-lm StrictConcurrency error
+- CI: recurse submodules when cloning mlx-swift
+
+## [1.0.1] - 2026-04-30
+
+### Added
+- Worker subprocess isolation for crash recovery
+- TurnStopProcessor for Qwen3.6 turn separator handling
+- ProcessMemoryEnforcer with soft/hard limits
+- OCROptimizer for OCR model parameter tuning
+- N-gram speculative decoding in FusedBatchScheduler
+- Draft-model speculative decoding (SpeculativeTokenIterator)
+- Full i18n system — 9 languages
+- Web chat with input history and parameter controls
+- Settings page with collapsible config.json editor
+- Cloud model support — remote inference proxy with streaming
+
+### Changed
+- UI overhaul across all views
+
+### Fixed
+- Streaming deadlock fix
+
 ## [1.0.0] - 2025-04-09
 
 ### Added
-- Pure Swift SPM project with 9 modular targets
+- Pure Swift SPM project with modular targets
 - Paged in-memory KV Cache with LRU eviction
 - SSD-backed KV Cache for persistent caching
 - Continuous batching for concurrent request processing
@@ -19,7 +118,7 @@ All notable changes to NovaMLX will be documented in this file.
 - Native macOS menu bar app with SwiftUI
 - System monitoring (CPU, memory, GPU)
 - Health check and stats endpoints
-- Comprehensive test suite (Core, KVCache, Engine, Inference, ModelManager, API)
+- Comprehensive test suite
 - MIT License
 
 ### Performance
```

DEVELOPMENT.md

Lines changed: 20 additions & 2 deletions
````diff
@@ -110,7 +110,7 @@ swift test --filter NovaMLXCoreTests
 
 ## API Reference
 
-### Inference API (Port 8080)
+### Inference API (Port 6590)
 
 | Endpoint | Method | Description |
 |----------|--------|-------------|
@@ -133,7 +133,7 @@ swift test --filter NovaMLXCoreTests
 | `/chat` | GET | Built-in web chat UI |
 | `/health` | GET | Health check |
 
-### Admin API (Port 8081, Bearer auth when apiKeys configured)
+### Admin API (Port 6591, Bearer auth when apiKeys configured)
 
 | Group | Endpoints |
 |-------|-----------|
@@ -146,6 +146,7 @@ swift test --filter NovaMLXCoreTests
 | Benchmarking | `POST/GET /admin/api/bench/*` |
 | Perplexity | `POST/GET /admin/api/ppl/*` |
 | Device Info | `GET /admin/api/device-info` |
+| Log Level | `GET/PUT /admin/api/log-level` |
 | Grammar | `POST /admin/api/grammar/validate` |
 | Dashboard | `GET /admin/dashboard` |
 
@@ -245,6 +246,23 @@ NovaMLX automatically detects thinking/reasoning models at load time by inspecti
 
 **Web UI:** The built-in chat page shows a Neural Pulse animation during thinking (real-time token count, speed, ghost preview of latest tokens), collapsing to a "Thought for Xs · N words" badge after completion.
 
+### Logging
+
+All modules use `NovaMLXLog` (in NovaMLXUtils), which writes to both swift-log and a rotating log file at `~/.nova/novamlx.log`. The file keeps up to 5 rotated copies (`novamlx.log.1` through `.5`).
+
+**Runtime log level control** via admin API:
+```bash
+# View current level
+curl http://127.0.0.1:6591/admin/api/log-level -H "Authorization: Bearer $KEY"
+
+# Enable debug logging
+curl -X PUT http://127.0.0.1:6591/admin/api/log-level \
+  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"level": 0}'
+```
+Levels: 0=debug, 1=info (default), 2=warning, 3=error.
+
+> Note: NovaMLXCore uses `os.Logger` directly (circular dependency prevents importing NovaMLXUtils).
+
 ### Security
 
 - API key authentication (Bearer token) on both ports
````
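The Logging section says the log file keeps up to five rotated copies, `novamlx.log.1` through `.5`. The commit does not show how `NovaMLXLog` rotates; the sketch below assumes the conventional numbered scheme, in which each rotation drops the oldest copy, shifts every `.N` to `.N+1`, and renames the live file to `.1`. The `rotate_log` function is hypothetical and only makes the file naming concrete.

```bash
# Hypothetical sketch of numbered log rotation (not NovaMLXLog's actual code).
# Usage: rotate_log <path> [keep]   e.g. rotate_log ~/.nova/novamlx.log 5
rotate_log() {
  local log="$1" keep="${2:-5}" i
  [ -f "$log" ] || return 0          # nothing to rotate yet
  rm -f "$log.$keep"                 # the oldest copy is discarded
  for (( i = keep - 1; i >= 1; i-- )); do
    # Shift each existing copy one slot older: .4 -> .5, ..., .1 -> .2
    if [ -f "$log.$i" ]; then
      mv "$log.$i" "$log.$((i + 1))"
    fi
  done
  mv "$log" "$log.1"                 # the live file becomes the newest copy
}
```

Shifting from the highest slot downward means no copy is overwritten before it has moved; after a rotation, the writer would start a fresh `novamlx.log`.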
