Skip to content

braininahat/bonsai-claude

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bonsai-claude

PyPI Python License Downloads

Run Claude Code locally on Bonsai 8B 1-bitPrismML's 1-bit quantized Qwen3-8B — via Apple MLX. No Anthropic API key; no tokens leave your Mac.

Install

uv tool install bonsai-claude

Then:

bonsai-claude

(First run auto-downloads the 55 MB PrismML-fork MLX wheel + the Bonsai model weights from HuggingFace.)

Run ephemerally without installing:

uvx bonsai-claude

Requirements

  • Apple Silicon Mac (M1 or newer)
  • macOS 26+ (the prebuilt fork wheel is tagged macosx_26_0_arm64)
  • uv on PATH — install: curl -LsSf https://astral.sh/uv/install.sh | sh
  • claude CLI on PATH

Python 3.12 is managed by uv automatically.

How it works

Claude Code speaks the Anthropic API shape (POST /v1/messages). MLX's server only speaks the OpenAI shape. So ANTHROPIC_BASE_URL can't point directly at it — a translator sits between.

claude CLI ──POST /v1/messages──▶ anthropic_shim :11434 ──POST /v1/chat/completions──▶ mlx_lm.server :8080 ──▶ Bonsai
            (Anthropic shape)       (direct adapter)         (OpenAI shape)

The adapter is ported from ollama/anthropic/anthropic.go (MIT — attribution in NOTICE). It handles request/response translation and the streaming state machine — including the input_json_delta events for tool_calls that LiteLLM's chat→anthropic adapter fails to emit.

Usage

bonsai-claude                         # interactive: pick context + --bare, then launch
bonsai-claude --non-interactive       # skip prompts, use saved prefs or defaults
bonsai-claude --smoke                 # headless HTTP round-trip test, then exit
bonsai-claude --panes                 # also open iTerm2 windows: log tail + macmon
bonsai-claude <claude args passed through>

Per-project preferences (max_kv_size, --bare choice) are saved at ~/.mlx_claude/prefs.json keyed by CWD.

Why Bonsai + 1-bit?

Bonsai is an 8B-parameter model in ~1 GB of weights — a ~8× memory reduction vs fp16. It fits in system RAM on M1 Macs that normally can't serve 8B models. The PrismML fork of mlx adds the 1-bit quant kernels needed to run it; the wheel is pinned and auto-fetched.

Prefill rate: ~100-150 tok/s on M-series chips (1-bit saves memory bandwidth but not FLOPs, so prefill is compute-bound). Generation: faster. --bare strips Claude Code's default context to keep turn-1 fast.

Caveats

  • Tool-call quality: Bonsai scores ~65.7 on the Berkeley Function Calling Leaderboard. Good enough for most Claude Code flows but weaker than frontier models on complex tool orchestration.
  • Large-context slowness: turn-1 with full context can take minutes on 1-bit quant. Use --bare (the TUI's default) to shrink Claude Code's system prompt 10-20×.
  • Prefix KV cache is in-memory only: restart the stack, the cache resets. Turn 2+ within a session reuses automatically.

License

MIT. See LICENSE and NOTICE for attributions.

About

TUI launcher for Claude Code against local MLX models (mlx-lm / mlx-vlm via LiteLLM proxy, all uvx)

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages