
Commit 83fb668

lucasliu committed
release: v1.0.5
Prefix cache hardening + E2E test fixes.

Prefix cache (6 commits):
- async write + async eviction in SSDCacheStore (no more 100-500ms tail-latency stalls after generation)
- safetensors header-only reader replaces the full-file scan at startup (eliminates multi-GB I/O at model init)
- VLM streaming/non-streaming paths skip prefix cache fetch+store (this was wasted SSD I/O; VLMs never used the result)
- pre-flight RotatingKVCache probe avoids loading SSD blocks for sliding-window models (Gemma family) that can't use them
- ServerConfig.prefixCacheEnabled kill switch wired through to the worker
- TTFT benchmark gated on NOVAMLX_BENCH=1

E2E tests:
- skip VLMs in the text-only core API suite
- accept reasoning-only output from Harmony (gpt-oss) models
- bump smoke test max_tokens 50 -> 150 for thinking-channel budget
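Background on the header-only reader: a `.safetensors` file begins with an 8-byte little-endian unsigned integer N, followed by N bytes of UTF-8 JSON that record each tensor's dtype, shape, and byte offsets. Tensor metadata can therefore be read without touching the multi-GB tensor payload. The Swift snippet below is a minimal sketch of that idea, not NovaMLX's actual reader; `readSafetensorsHeader`, `SafetensorsHeaderError`, and the 100 MB header cap are hypothetical names and an illustrative sanity limit chosen for this example.

```swift
import Foundation

/// Hypothetical error type used only by this sketch.
enum SafetensorsHeaderError: Error {
    case truncatedFile
    case headerTooLarge(UInt64)
    case invalidJSON
}

/// Sketch: decode only the JSON header of a .safetensors file.
/// Layout assumed: [8-byte little-endian UInt64 N][N bytes of JSON][tensor data...].
/// The tensor data after the header is never requested.
func readSafetensorsHeader(at url: URL,
                           maxHeaderBytes: UInt64 = 100_000_000) throws -> [String: Any] {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }

    // Read the 8-byte length prefix.
    guard let prefix = try handle.read(upToCount: 8), prefix.count == 8 else {
        throw SafetensorsHeaderError.truncatedFile
    }

    // Assemble the little-endian UInt64 byte by byte (independent of host endianness).
    var headerLength: UInt64 = 0
    for (i, byte) in prefix.enumerated() {
        headerLength |= UInt64(byte) << UInt64(8 * i)
    }

    // Illustrative cap so a corrupt prefix cannot trigger a huge allocation.
    guard headerLength <= maxHeaderBytes else {
        throw SafetensorsHeaderError.headerTooLarge(headerLength)
    }

    // Read exactly N bytes of JSON; nothing beyond the header is read.
    guard let headerData = try handle.read(upToCount: Int(headerLength)),
          headerData.count == Int(headerLength) else {
        throw SafetensorsHeaderError.truncatedFile
    }

    guard let header = try JSONSerialization.jsonObject(with: headerData) as? [String: Any] else {
        throw SafetensorsHeaderError.invalidJSON
    }
    return header
}
```

A call such as `try readSafetensorsHeader(at: url)` returns the decoded header dictionary: keys are tensor names, plus an optional `__metadata__` entry. The code requests only 8 + N bytes from disk, regardless of how large the file is.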
1 parent 0dd90b4 commit 83fb668

1 file changed

Lines changed: 1 addition & 1 deletion


Sources/NovaMLXCore/Types.swift

@@ -3,7 +3,7 @@ import Logging
 
 public enum NovaMLX {}
 
-public let version = "1.0.0"
+public let version = "1.0.5"
 
 public var buildTimestamp: String {
     guard let execURL = Bundle.main.executableURL,
