OpenMath is a fine-tuned small language model (SLM) that solves math word problems with step-by-step reasoning. It was produced by fine-tuning the Qwen2.5-Math-1.5B base model with QLoRA (Quantized Low-Rank Adaptation).
- Developed by: OpenMath Project Contributors
- Model type: Causal Language Model (Math Reasoning)
- Language: English
- License: Apache License 2.0
- Base Model: Qwen/Qwen2.5-Math-1.5B
- Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
- Parameters: 1.5B (base model) + LoRA adapters
- Repository: OpenMath GitHub Repository
This model is designed for educational and research purposes to:
- Solve grade-school level math word problems
- Generate step-by-step mathematical reasoning
- Demonstrate efficient fine-tuning techniques on limited compute resources
The model can be used as a starting point for:
- Further fine-tuning on additional math datasets
- Integration into educational applications
- Research on small language model capabilities in mathematical reasoning
The model is not suitable for:
- Production systems requiring high accuracy: The model achieves 41% accuracy and should not be used for critical applications
- Advanced mathematics: The model is trained on grade-school level problems only
- Homework/exam solving without verification: Always verify solutions independently
- Professional mathematical advice or calculations
Known limitations:
- Accuracy: 41% on a 100-sample subset of the GSM8K test set, so the model answers incorrectly in the majority of cases
- Training data size: Trained on only 1,000 samples from GSM8K, which limits generalization
- Repetition issues: May generate repetitive text during inference
- Domain specificity: Limited to grade-school math problems similar to GSM8K
- Incomplete reasoning: May produce incomplete or misleading step-by-step solutions
Users should:
- Always verify model outputs independently
- Not rely on this model for educational assessments or real-world decisions
- Understand this is a research/educational project, not a production-ready system
- Use appropriate repetition penalties and decoding strategies to improve output quality
- Training dataset: GSM8K (Grade School Math 8K)
- Training samples: 1,000 samples from the GSM8K training set
- Data format: Math word problems with step-by-step solutions
- Training regime: 4-bit QLoRA fine-tuning
- Epochs: 6
- Max sequence length: 1024 tokens
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, o_proj, k_proj, v_proj
- Quantization: 4-bit NF4 with double quantization
- Compute dtype: float16
- Loss masking: Trained primarily on solution portions to improve reasoning
- GPU: NVIDIA T4 (free Google Colab tier)
- Training time: Reproducible on free Colab resources
- Framework: PyTorch, Transformers, PEFT
- Quantization library: bitsandbytes (4-bit)
- Fine-tuning: LoRA/QLoRA
- Evaluation dataset: GSM8K test split
- Evaluation samples: 100-question subset (for faster evaluation on Colab)
- Primary metric: Accuracy (exact match)
- GSM8K Accuracy: 41.0% (on 100-sample test subset)
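To illustrate the exact-match metric, the sketch below scores generated solutions against GSM8K references. GSM8K reference solutions end with a line of the form `#### <answer>`. Taking the last number in the model output as its prediction is an assumption made for this example, not necessarily the extraction rule used by the project, and all function names are hypothetical.

```python
import re

# Integers or decimals, optionally signed, with optional thousands separators.
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def normalize(number_text: str) -> str:
    """Strip separators and trailing zeros so '1,000' and '1000.0' compare equal."""
    text = number_text.replace(",", "")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def reference_answer(solution: str) -> str:
    """Return the number after the '####' marker of a GSM8K reference solution."""
    final = solution.split("####")[-1]
    return normalize(_NUMBER.search(final).group())


def predicted_answer(generation: str):
    """Assumption: the last number in the generated text is the model's answer."""
    numbers = _NUMBER.findall(generation)
    return normalize(numbers[-1]) if numbers else None


def exact_match_accuracy(generations, references) -> float:
    """Percentage of generations whose extracted answer equals the reference."""
    correct = sum(
        predicted_answer(g) == reference_answer(r)
        for g, r in zip(generations, references)
    )
    return 100.0 * correct / len(references)
```

A generation that contains no number at all is counted as incorrect, because `predicted_answer` returns `None` in that case.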
| Model | Parameters | GSM8K Accuracy (%) |
|---|---|---|
| LLaMA 2 | 13B | 28.7 |
| Gemma 2 (PT) | 2B | 23.9 |
| Mistral (Base) | 7B | 36.5 |
| LLaMA 3.2 Instruct (CoT) | 1B | 39.04 |
| OpenMath (Qwen2.5-Math-1.5B + LoRA) | 1.5B | 41.0 |
| Gemma 3 IT | 1B | 42.15 |
| Zephyr-7b-gemma-v0.1 | 7B | 45.56 |
| Gemma | 7B | 46.4 |
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4CAF50', 'primaryTextColor':'#000', 'primaryBorderColor':'#2E7D32', 'lineColor':'#1976D2', 'secondaryColor':'#FFC107', 'tertiaryColor':'#fff'}}}%%
graph LR
subgraph "GSM8K Accuracy Comparison (%)"
A["Gemma 2 PT<br/>2B: 23.9%"]
B["ERNIE 4.5<br/>21B: 25.2%"]
C["Baichuan<br/>13B: 26.6%"]
D["LLaMA 2<br/>13B: 28.7%"]
E["Qwen 3 IT<br/>1.7B: 33.66%"]
F["Mistral<br/>7B: 36.5%"]
G["LLaMA 3.2 IT<br/>1B: 39.04%"]
H["OpenMath<br/>1.5B: 41.0%"]
I["Gemma 3 IT<br/>1B: 42.15%"]
J["Zephyr-7b<br/>7B: 45.56%"]
K["Gemma<br/>7B: 46.4%"]
end
style H fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
Performance Visualization:
Gemma 2 PT (2B) ████████████ 23.9%
ERNIE 4.5 (21B) █████████████ 25.2%
Baichuan (13B) ██████████████ 26.6%
LLaMA 2 (13B) ███████████████ 28.7%
Qwen 3 IT (1.7B) █████████████████ 33.66%
Mistral (7B) ███████████████████ 36.5%
LLaMA 3.2 IT (1B) ████████████████████ 39.04%
OpenMath (1.5B) █████████████████████ 41.0% ⭐
Gemma 3 IT (1B) █████████████████████ 42.15%
Zephyr-7b (7B) ███████████████████████ 45.56%
Gemma (7B) ████████████████████████ 46.4%
|----|----|----|----|----|----|
0 10 20 30 40 50
With 41.0% on the 100-sample subset, OpenMath lands between LLaMA 3.2 Instruct 1B (39.04%) and Gemma 3 IT 1B (42.15%) in the comparison above, despite being trained on only 1,000 samples using free Colab resources.
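Because the score comes from 100 questions, it helps to see how much sampling uncertainty that implies. The sketch below uses the normal-approximation (Wald) interval, which is one common choice and an assumption of this example. The function name is hypothetical.

```python
import math


def accuracy_interval(correct: int, total: int, z: float = 1.96):
    """Normal-approximation interval for an accuracy, returned in percent.

    z = 1.96 corresponds to a 95% interval.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * (p - half_width), 100 * (p + half_width)


# 41.0% on 100 samples corresponds to 41 correct answers.
low, high = accuracy_interval(correct=41, total=100)
print(f"41.0% on 100 samples: roughly {low:.1f}% to {high:.1f}%")
```

Under this approximation the interval spans about 31% to 51%, so the gaps to the neighbouring models are smaller than the sampling uncertainty of a 100-question subset.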
- Base architecture: Qwen2.5-Math-1.5B (Transformer-based causal LM)
- Adapter type: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit NF4 quantization
- Training: Google Colab (T4 GPU, free tier)
- Inference: Compatible with T4 GPU or similar (requires ~6-8GB VRAM with 4-bit quantization)
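The LoRA and quantization settings listed in this card could be expressed with `peft` and `transformers` roughly as follows. This is a minimal sketch, not the project's training script: the dictionary and function names are hypothetical, and `task_type="CAUSAL_LM"` is an assumption inferred from the model type. The library imports are deferred into the function so the settings can be inspected without a GPU environment.

```python
# Hyperparameters copied from this card; names of the dicts are hypothetical.
QUANTIZATION = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}
LORA = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated explicitly in the card
}


def build_configs():
    """Create the bitsandbytes and PEFT config objects (imports run only when called)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16,  # compute dtype listed in this card
        **QUANTIZATION,
    )
    lora_config = LoraConfig(**LORA)
    return bnb_config, lora_config
```

With these values the LoRA update is scaled by `lora_alpha / r = 2`. The returned `bnb_config` would typically be passed as `quantization_config` when loading the base model.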
The model expects prompts in the following format:
### Instruction:
Solve the math problem step by step and give the final answer.
### Problem:
[Your math problem here]
### Solution:
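The template above can be filled programmatically. The sketch below also shows one way the solution-focused loss mentioned in this card could be set up: prompt tokens receive the label `-100`, which PyTorch's cross-entropy loss ignores by default. The exact whitespace between sections and the masking scheme are assumptions, and the function names are hypothetical.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Solve the math problem step by step and give the final answer.\n"
    "### Problem:\n"
    "{problem}\n"
    "### Solution:\n"
)

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_prompt(problem: str) -> str:
    """Fill the card's prompt template with a single problem statement."""
    return PROMPT_TEMPLATE.format(problem=problem.strip())


def mask_prompt_labels(input_ids, prompt_length: int):
    """Copy input_ids as labels, hiding the first prompt_length tokens from the loss."""
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])
```

For example, `mask_prompt_labels([5, 6, 7, 8], prompt_length=2)` yields `[-100, -100, 7, 8]`, so only the last two tokens contribute to the loss.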
Recommended inference settings:
- max_new_tokens: 200
- do_sample: False (deterministic for math)
- repetition_penalty: 1.1
- no_repeat_ngram_size: 3
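The recommended settings map directly onto keyword arguments of `generate` in `transformers`. The following is a hedged sketch of inference with a LoRA adapter on top of the base model. `adapter_path` is a hypothetical location of the OpenMath adapter weights, the model is loaded in float16 rather than 4-bit for brevity, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
# Decoding settings listed in this card.
GENERATION_KWARGS = {
    "max_new_tokens": 200,
    "do_sample": False,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 3,
}

BASE_MODEL = "Qwen/Qwen2.5-Math-1.5B"


def generate_solution(prompt: str, adapter_path: str) -> str:
    """Load the base model plus a LoRA adapter and decode one solution.

    adapter_path is a hypothetical path or repo id for the adapter weights.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) keeps outputs reproducible, while the repetition penalty and n-gram constraint target the repetition issue noted in this card.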
- Hardware Type: NVIDIA T4 GPU
- Hours used: Minimal (reproducible on free Colab)
- Cloud Provider: Google Colab
- Carbon Emitted: Minimal due to efficient QLoRA training on limited samples
@software{openmath2024,
title={OpenMath: Fine-tuning Small Language Models for Math Reasoning},
author={OpenMath Contributors},
year={2024},
license={Apache-2.0}
}
OpenMath Project Contributors
[Repository Issues Page]