Skip to content

Latest commit

 

History

History
101 lines (80 loc) · 4.34 KB

File metadata and controls

101 lines (80 loc) · 4.34 KB

This tutorial helps users deploy and run large language models locally on Apple Silicon Macs using the native MLX-LM framework.

🔧 Environment Setup

# Create Conda virtual environment
conda create -n mlx-lm python=3.11
conda activate mlx-lm
# Install dependencies
pip install -r requirements.txt

📁 Project Structure

models_mlx/
├── run_app_gradio.py          # Gradio interactive app (download + chat)
├── requirements.txt           # Python dependencies
├── configs/                   # Model configs (JSON, hot-reloadable)
│   └── model_info/
│       ├── mlx.json           #   MLX quantized model list
│       └── original.json      #   Original HuggingFace model list
├── modules/                   # Functional modules
│   └── download_model.py      #   Model download module (standalone runnable)
├── models/                    # Downloaded model storage
├── notebooks/                 # Jupyter Notebook tutorials
│   ├── Qwen3_MLX_部署与交互.ipynb
│   └── Qwen3_Transformers_部署与交互.ipynb
└── docs/                      # Documentation
    └── MLX-LM_Intro.md        #   MLX framework introduction

📖 Tutorials

Theory

MLX Framework Introduction

Notebook Tutorials

Notebook Description
Qwen3_MLX_部署与交互 Deploy Qwen3 with MLX (recommended for Apple Silicon)
Qwen3_Transformers_部署与交互 Deploy Qwen3 with Transformers (universal compatibility)

Gradio Interactive App

An all-in-one web interface for model downloading and chatting:

  • Model Download: Three-level selection (Company → Series → Model) with local existence detection
  • Model Chat: Supports both MLX and Transformers backends; MLX supports streaming output
  • Parameter Tuning: Temperature, Top-p, Max Tokens, Thinking mode
  • Hot Reload: Edit JSON configs in configs/, then refresh the page or click the refresh button
python run_app_gradio.py

CLI Model Download

You can also download models via an interactive command-line interface (no Gradio needed):

python -m modules.download_model

🚀 Supported Models

Model lists are configured via JSON files in the configs/ directory.

Company Series Models
Alibaba QwQ QwQ-0.5B-4bit
Alibaba Qwen1.5 Qwen1.5-0.5B-Chat-4bit
Qwen1.5-1.8B-Chat-4bit
Qwen1.5-MoE-A2.7B-4bit
Qwen1.5-MoE-A2.7B-Chat-4bit
Alibaba Qwen2 Qwen2-0.5B-Instruct-4bit
Qwen2-1.5B-4bit
Qwen2-1.5B-Instruct-4bit
Alibaba Qwen2-Math Qwen2-Math-1.5B-Instruct-4bit
Alibaba Qwen2.5 Qwen2.5-0.5B-4bit
Qwen2.5-0.5B-Instruct-4bit
Qwen2.5-1.5B-4bit
Qwen2.5-1.5B-Instruct-4bit
Qwen2.5-3B-4bit
Qwen2.5-3B-Instruct-4bit
Alibaba Qwen2.5-Coder Qwen2.5-Coder-0.5B-4bit
Qwen2.5-Coder-0.5B-Instruct-4bit
Qwen2.5-Coder-1.5B-4bit
Qwen2.5-Coder-1.5B-Instruct-4bit
Qwen2.5-Coder-3B-4bit
Qwen2.5-Coder-3B-Instruct-4bit
Alibaba Qwen2.5-Math Qwen2.5-Math-1.5B-4bit
Qwen2.5-Math-1.5B-Instruct-4bit
Alibaba Qwen3 Qwen3-0.6B-4bit
Qwen3-0.6B-Base-4bit
Qwen3-1.7B-4bit
Alibaba Qwen3.5 Qwen3.5-0.8B-4bit
Qwen3.5-2B-4bit
DeepSeek DeepSeek-R1 DeepSeek-R1-Distill-Qwen-1.5B-4bit
DeepSeek DeepSeek-V3 -
Google Gemma-2 gemma-2-2b-4bit
gemma-2-2b-it-4bit
gemma-2-2b-jpn-it-4bit
gemma-2-baku-2b-it-4bit
Google Gemma-3 gemma-3-1b-it-4bit
gemma-3-1b-pt-4bit
gemma-3-270m-4bit
gemma-3-270m-it-4bit
Meta Llama-3.1 -
Meta Llama-3.2 Llama-3.2-1B-Instruct-4bit
Llama-3.2-3B-Instruct-4bit
Meta Llama-4 -
Microsoft Phi-2 phi-2-super-4bit
Microsoft Phi-4 -
Mistral Mistral Ministral-3-3B-Instruct-2512-4bit
Ministral-3-3B-Reasoning-2512-4bit
Moonshot Kimi -

To add new models, simply edit configs/model_info/mlx.json or configs/model_info/original.json.