v1.1.0: Python SDK, audio/image generation, logging overhaul, Modelfiles #3

cnshsliu (Maintainer) · 2026-05-10T14:07:33Z

NovaMLX v1.1.0 is out. This release adds a Python SDK, local audio transcription and image generation, user-authored Modelfiles, and reworks the logging system. Download (DMG or tar.gz).

Python SDK — sdk/python/ ships a full client with type hints, streaming support, admin API, tool calling, and thinking model examples. Install via pip install -e sdk/python/.

Audio transcription & image generation — Two new endpoints running locally through MLX:

POST /v1/audio/transcriptions — Qwen3-ASR, supports WAV/MP3/M4A/FLAC
POST /v1/images/generations — SDXL-Turbo, ~2s per image on M-series

Modelfile system — Define model recipes with system prompts and sampling overrides in a single file. Modelfiles appear as models in /v1/models:

# Create a modelfile
curl -X POST http://localhost:6591/admin/modelfiles/create \
  -H "Authorization: Bearer $KEY" \
  -d '{"name":"math-tutor","from":"mlx-community/Qwen3-27B-4bit","system":"You are a math tutor.","parameters":{"temperature":0.3}}'

Logging overhaul — Log files now rotate (keeps last 5) instead of truncating. Runtime log level is adjustable without restart via admin API:

curl -X PUT http://localhost:6591/admin/api/log-level \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"level": 0}'

Levels: 0=debug, 1=info (default), 2=warning, 3=error.
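
For scripted use, the same call can be built with Python's standard library. This is a minimal sketch and not part of the NovaMLX SDK: the endpoint, payload shape, and level numbers are taken from the notes above, while the names `LOG_LEVELS` and `build_log_level_request` are hypothetical.

```python
import json
import urllib.request

# Numeric levels as listed in the release notes.
LOG_LEVELS = {"debug": 0, "info": 1, "warning": 2, "error": 3}


def build_log_level_request(base_url, api_key, level_name):
    """Build (but do not send) the PUT request that sets the runtime log level."""
    level = LOG_LEVELS[level_name.lower()]  # raises KeyError on an unknown name
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/admin/api/log-level",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending is left to the caller, for example:
#   req = build_log_level_request("http://localhost:6591", key, "debug")
#   urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes the helper easy to check without a running server.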

Other additions:

Per-request keep_alive — override model TTL per request
reasoning_effort parameter — OpenAI-standard thinking budget control
logprobs and top_logprobs support
Auto-load coordinator with SSE keep-alive for cold model loads
nova.capabilities exposed on /v1/models
Tokenhub integration (types + menu bar page)
E2E model test suite (Scripts/test-all-models.sh)
Comprehensive architecture doc (architecture.md)
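
As a rough illustration of how the new request parameters could be combined in one chat request, here is a sketch of a request body. The parameter names come from the list above. The values are assumptions: `reasoning_effort` and the `logprobs`/`top_logprobs` pairing follow the OpenAI chat completions convention, and the `keep_alive` value format is not stated in these notes. The helper `build_chat_payload` is hypothetical.

```python
import json


def build_chat_payload(model, prompt, *, keep_alive=None,
                       reasoning_effort=None, top_logprobs=None):
    """Assemble a /v1/chat/completions body, adding only the options that are set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if keep_alive is not None:
        # Assumption: the accepted value format (seconds vs. "10m") is not
        # stated in the release notes.
        payload["keep_alive"] = keep_alive
    if reasoning_effort is not None:
        payload["reasoning_effort"] = reasoning_effort
    if top_logprobs is not None:
        # OpenAI convention: top_logprobs is only valid with logprobs=true.
        payload["logprobs"] = True
        payload["top_logprobs"] = top_logprobs
    return payload


payload = build_chat_payload(
    "mlx-community/Qwen3-27B-4bit",
    "Hello",
    keep_alive="10m",
    reasoning_effort="low",
    top_logprobs=3,
)
print(json.dumps(payload, indent=2))
```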

Bug fixes:

Tool message mapping preserves tool_calls and tool_call_id (OpenAI + Anthropic)
Streaming prompt_tokens plumbed through to usage stats

ZayanKhan-12 · 2026-05-13T19:41:19Z

NovaMLX v1.1.0 Discussion Reply

Great release — the Python SDK + local audio/image endpoints make this a much more complete local OpenAI-compatible stack.

A few engineering notes that might help tighten the v1.1.0 rollout/docs:

### 1. Clarify the default server port

The release examples use:

```bash
http://localhost:6591
```

for admin endpoints, but the README quick start says the server runs on:

localhost:8080

If both are valid depending on install mode/config, it may be worth adding one line like:

Default server URL: http://localhost:8080
Admin examples below assume NOVA_MLX_URL=http://localhost:6591 or a custom configured port.

That would prevent users from thinking the admin API or Modelfile API is broken when they are only hitting the wrong port.
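
Until the docs settle this, a user can check which port is actually answering. The sketch below probes the two ports mentioned in this thread using only the standard library. The function names are hypothetical, and a positive result only shows that some HTTP server is listening on that port, not that it is NovaMLX.

```python
import http.client
import urllib.error
import urllib.request


def http_probe(base_url, timeout=2.0):
    """Return True if anything answers HTTP at base_url + /v1/models."""
    try:
        urllib.request.urlopen(base_url + "/v1/models", timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # A 401/403/404 still proves a server is listening on this port.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-HTTP service.
        return False


def reachable(candidates, probe=http_probe):
    """Return the candidate base URLs that respond, in the order given."""
    return [url for url in candidates if probe(url)]


# Usage (needs a running server):
#   print(reachable(["http://localhost:8080", "http://localhost:6591"]))
```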

### 2. Python SDK versioning may be confusing

The release is v1.1.0, but the Python SDK pyproject.toml appears to expose:

name = "novamlx"
version = "0.1.0"

If the SDK is intentionally versioned separately from the app/server, I would document that explicitly. Otherwise, aligning the SDK package version with the release tag would make debugging and issue reports easier:

python -c "import importlib.metadata as m; print(m.version('novamlx'))"

That way, a user can quickly report:

NovaMLX app: v1.1.0
Python SDK: v1.1.0
Server API: /v1

instead of mixing app and SDK versions.
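
A slightly more defensive version of that one-liner could produce the whole report, including the case where the SDK is not installed. This is a sketch: the distribution name `novamlx` is the one that appears in pyproject.toml as quoted above, the helper names are hypothetical, and the app version is passed in by hand because this thread does not say how to query it.

```python
from importlib import metadata


def sdk_version(dist_name="novamlx"):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(app_version, dist_name="novamlx"):
    """Format the three lines suggested above for pasting into an issue."""
    sdk = sdk_version(dist_name)
    return "\n".join([
        f"NovaMLX app: {app_version}",
        f"Python SDK: {'v' + sdk if sdk else 'not installed'}",
        "Server API: /v1",
    ])


print(version_report("v1.1.0"))
```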

### 3. Suggested SDK smoke-test section

Since sdk/python/ now ships the typed client, streaming, admin API, tool calling, and thinking examples, I would add a tiny smoke-test block to the release notes or SDK README:

cd sdk/python
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

python - <<'PY'
import novamlx
from novamlx import Client, AdminClient

print("novamlx import OK")
print("Client:", Client)
print("AdminClient:", AdminClient)
PY

Then a minimal chat smoke test:

python examples/basic_chat.py

and a streaming smoke test:

python examples/streaming.py

This gives users a fast way to separate:

SDK install problem
server not running
model not loaded
auth/config problem

### 4. Audio/image endpoint examples would be useful

The new local endpoints are one of the strongest parts of this release:

POST /v1/audio/transcriptions
POST /v1/images/generations

It would be helpful to include copy-paste curl examples for both.

For audio transcription:

curl http://localhost:8080/v1/audio/transcriptions \
  -H "Authorization: Bearer $KEY" \
  -F "file=@sample.wav" \
  -F "model=qwen3-asr"

For image generation:

curl http://localhost:8080/v1/images/generations \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sdxl-turbo",
    "prompt": "A clean product shot of a futuristic local AI server running on Apple Silicon",
    "size": "1024x1024",
    "n": 1
  }'

Even if the exact model names differ internally, having the expected request shape documented would make the endpoints much easier to test.
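
For users who want to script the audio upload without curl, here is a standard-library sketch of the same multipart request. It mirrors the curl example above, so it inherits the same guesses: the `file` and `model` field names and the `qwen3-asr` model name are assumptions, and `build_transcription_request` is a hypothetical helper rather than an SDK function.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, filename, audio_bytes,
                                model="qwen3-asr"):
    """Build (but do not send) a multipart/form-data POST for a transcription.

    filename is assumed to be plain ASCII without double quotes.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    crlf = b"\r\n"
    parts = [
        # Plain form field: model
        delimiter, crlf,
        b'Content-Disposition: form-data; name="model"', crlf, crlf,
        model.encode(), crlf,
        # File field: file
        delimiter, crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(), crlf,
        b"Content-Type: application/octet-stream", crlf, crlf,
        audio_bytes, crlf,
        # Closing boundary
        delimiter + b"--", crlf,
    ]
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Sending is left to the caller, for example:
#   with open("sample.wav", "rb") as f:
#       req = build_transcription_request("http://localhost:8080", key, "sample.wav", f.read())
#   print(urllib.request.urlopen(req, timeout=60).read().decode())
```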

### 5. Modelfiles: document the resulting model name

The Modelfile system looks very useful. One thing I would make explicit is how the created Modelfile appears in /v1/models.

For example, after:

curl -X POST http://localhost:6591/admin/modelfiles/create \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "math-tutor",
    "from": "mlx-community/Qwen3-27B-4bit",
    "system": "You are a math tutor.",
    "parameters": {
      "temperature": 0.3
    }
  }'

can users immediately call it like this?

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "math-tutor",
    "messages": [
      {
        "role": "user",
        "content": "Explain eigenvalues in simple terms."
      }
    ]
  }'

If yes, showing that second request would make the feature much clearer.
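
Before sending the chat request, a user could confirm that the new entry is actually listed. The sketch below checks a `/v1/models` response for the Modelfile name. It rests on two assumptions that the release notes do not confirm: that the response uses the OpenAI-style list shape, and that the model id equals the Modelfile `name`. The helper names are hypothetical.

```python
import json


def model_ids(models_response):
    """Extract ids from an OpenAI-style model list: {"data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in models_response.get("data", [])]


def modelfile_is_listed(models_response, name):
    """True if a model whose id equals the Modelfile name is listed."""
    return name in model_ids(models_response)


# Illustrative payload only; a real one would come from GET /v1/models.
sample = json.loads('{"object": "list", "data": [{"id": "math-tutor"}]}')
print(modelfile_is_listed(sample, "math-tutor"))
```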

### 6. Runtime log-level API is a strong operations feature

The runtime log-level endpoint is a nice ops improvement:

curl -X PUT http://localhost:6591/admin/api/log-level \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"level": 0}'

One useful addition would be a read-back endpoint or example, something like:

curl http://localhost:6591/admin/api/log-level \
  -H "Authorization: Bearer $KEY"

That would let operators confirm the current runtime level after changing it.

### 7. Security/admin API note

Because this release adds more admin surface area — Modelfiles, runtime log level, auto-load behavior, Tokenhub integration — I would add a short security note in the docs:

Do not expose the admin API directly to the public internet.
Bind locally or put it behind authentication/reverse proxy controls.
Rotate the admin key if it is leaked.

This matters especially because Modelfiles can influence model behavior, prompts, sampling parameters, and what appears in /v1/models.
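
To make the "bind locally" advice concrete, here is a small sketch of a check that a configured host is loopback-only. How NovaMLX's bind address is configured is not covered in this thread, so the function simply takes a host string. The name `is_loopback_host` is hypothetical.

```python
import ipaddress


def is_loopback_host(host):
    """True if host is 'localhost' or a loopback IP literal (127.0.0.0/8, ::1)."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): treat as not provably local.
        return False


for host in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.20"):
    status = "loopback only" if is_loopback_host(host) else "may be reachable from other machines"
    print(host, "->", status)
```

Note that `0.0.0.0` means "all interfaces", so it is reported as not loopback even though it also accepts local connections.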

### Overall

This looks like a major release: SDK, audio, image generation, Modelfiles, log rotation, runtime log levels, reasoning_effort, logprobs, better tool-call mapping, and SSE keep-alive all move NovaMLX closer to a full local model platform rather than just a chat/completions server.

The main things I would polish are:

clarify the default port/config story
clarify Python SDK versioning
add SDK smoke tests
add copy-paste examples for audio/image endpoints
show the full Modelfile lifecycle from creation to /v1/chat/completions
add a small admin API security note

Nice work on the release.
