Model Card: OpenMath

Model Details

Model Description

OpenMath is a fine-tuned small language model (SLM) specialized in solving math word problems with step-by-step reasoning. The model uses QLoRA (Quantized Low-Rank Adaptation) fine-tuning on the Qwen2.5-Math-1.5B base model.

Developed by: OpenMath Project Contributors
Model type: Causal Language Model (Math Reasoning)
Language: English
License: Apache License 2.0
Base Model: Qwen/Qwen2.5-Math-1.5B
Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
Parameters: 1.5B (base model) + LoRA adapters

Model Sources

Repository: OpenMath GitHub Repository
Base Model: Qwen/Qwen2.5-Math-1.5B

Uses

Direct Use

This model is designed for educational and research purposes to:

Solve grade-school level math word problems
Generate step-by-step mathematical reasoning
Demonstrate efficient fine-tuning techniques on limited compute resources

Downstream Use

The model can be used as a starting point for:

Further fine-tuning on additional math datasets
Integration into educational applications
Research on small language model capabilities in mathematical reasoning

Out-of-Scope Use

Production systems requiring high accuracy: The model achieves 41% accuracy and should not be used for critical applications
Advanced mathematics: The model is trained on grade-school level problems only
Homework/exam solving without verification: Always verify solutions independently
Professional mathematical advice or calculations

Bias, Risks, and Limitations

Known Limitations

Accuracy: 41% on GSM8K test subset (100 samples) - the model produces incorrect answers in the majority of cases
Training data size: Only trained on 1,000 samples from GSM8K, limiting generalization
Repetition issues: May generate repetitive text during inference
Domain specificity: Limited to grade-school math problems similar to GSM8K
Incomplete reasoning: May produce incomplete or misleading step-by-step solutions

Recommendations

Users should:

Always verify model outputs independently
Not rely on this model for educational assessments or real-world decisions
Understand this is a research/educational project, not a production-ready system
Use appropriate repetition penalties and decoding strategies to improve output quality

Training Details

Training Data

Dataset: GSM8K (Grade School Math 8K)
Training samples: 1,000 samples from the GSM8K training set
Data format: Math word problems with step-by-step solutions

Training Procedure

Training Hyperparameters

Training regime: 4-bit QLoRA fine-tuning
Epochs: 6
Max sequence length: 1024 tokens
LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, o_proj, k_proj, v_proj
Quantization: 4-bit NF4 with double quantization
Compute dtype: float16
Loss masking: Trained primarily on solution portions to improve reasoning

Hardware

GPU: NVIDIA T4 (free Google Colab tier)
Training time: Reproducible on free Colab resources

Software

Framework: PyTorch, Transformers, PEFT
Quantization: BitsAndBytes (4-bit)
Fine-tuning: LoRA/QLoRA

Evaluation

Testing Data & Metrics

Testing Data

Dataset: GSM8K test split
Evaluation samples: 100-question subset (for faster evaluation on Colab)

Metrics

Primary metric: Accuracy (exact match)
GSM8K Accuracy: 41.0% (on 100-sample test subset)

Results

Model	Parameters	GSM8K Accuracy (%)
LLaMA 2	13B	28.7
Gemma 2 (PT)	2B	23.9
Mistral (Base)	7B	36.5
LLaMA 3.2 Instruct (CoT)	1B	39.04
OpenMath (Qwen2.5-Math-1.5B + LoRA)	1.5B	41.0
Gemma 3 IT	1B	42.15
Zephyr-7b-gemma-v0.1	7B	45.56
Gemma	7B	46.4

Benchmark Comparison Graph

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4CAF50', 'primaryTextColor':'#000', 'primaryBorderColor':'#2E7D32', 'lineColor':'#1976D2', 'secondaryColor':'#FFC107', 'tertiaryColor':'#fff'}}}%%
graph LR
    subgraph "GSM8K Accuracy Comparison (%)"
        A["Gemma 2 PT<br/>2B: 23.9%"] 
        B["ERNIE 4.5<br/>21B: 25.2%"]
        C["Baichuan<br/>13B: 26.6%"]
        D["LLaMA 2<br/>13B: 28.7%"]
        E["Qwen 3 IT<br/>1.7B: 33.66%"]
        F["Mistral<br/>7B: 36.5%"]
        G["LLaMA 3.2 IT<br/>1B: 39.04%"]
        H["OpenMath<br/>1.5B: 41.0%"]
        I["Gemma 3 IT<br/>1B: 42.15%"]
        J["Zephyr-7b<br/>7B: 45.56%"]
        K["Gemma<br/>7B: 46.4%"]
    end
    
    style H fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff

Performance Visualization:

Gemma 2 PT (2B)         ████████████ 23.9%
ERNIE 4.5 (21B)         █████████████ 25.2%
Baichuan (13B)          ██████████████ 26.6%
LLaMA 2 (13B)           ███████████████ 28.7%
Qwen 3 IT (1.7B)        █████████████████ 33.66%
Mistral (7B)            ███████████████████ 36.5%
LLaMA 3.2 IT (1B)       ████████████████████ 39.04%
OpenMath (1.5B)         █████████████████████ 41.0% ⭐
Gemma 3 IT (1B)         █████████████████████ 42.15%
Zephyr-7b (7B)          ███████████████████████ 45.56%
Gemma (7B)              ████████████████████████ 46.4%
                        |----|----|----|----|----|----|
                        0   10   20   30   40   50

OpenMath achieves competitive performance compared to other small language models while being trained on only 1,000 samples and reproducible on free Colab resources.

Technical Specifications

Model Architecture

Base architecture: Qwen2.5-Math-1.5B (Transformer-based causal LM)
Adapter type: LoRA (Low-Rank Adaptation)
Quantization: 4-bit NF4 quantization

Compute Infrastructure

Training: Google Colab (T4 GPU, free tier)
Inference: Compatible with T4 GPU or similar (requires ~6-8GB VRAM with 4-bit quantization)

Input Format

The model expects prompts in the following format:

### Instruction:
Solve the math problem step by step and give the final answer.

### Problem:
[Your math problem here]

### Solution:

Generation Parameters

Recommended inference settings:

max_new_tokens: 200
do_sample: False (deterministic for math)
repetition_penalty: 1.1
no_repeat_ngram_size: 3

Environmental Impact

Hardware Type: NVIDIA T4 GPU
Hours used: Minimal (reproducible on free Colab)
Cloud Provider: Google Colab
Carbon Emitted: Minimal due to efficient QLoRA training on limited samples

Citation

@software{openmath2024,
  title={OpenMath: Fine-tuning Small Language Models for Math Reasoning},
  author={OpenMath Contributors},
  year={2024},
  license={Apache-2.0}
}

Model Card Authors

OpenMath Project Contributors

Model Card Contact

[Repository Issues Page]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Card: OpenMath

Model Details

Model Description

Model Sources

Uses

Direct Use

Downstream Use

Out-of-Scope Use

Bias, Risks, and Limitations

Known Limitations

Recommendations

Training Details

Training Data

Training Procedure

Training Hyperparameters

Hardware

Software

Evaluation

Testing Data & Metrics

Testing Data

Metrics

Results

Benchmark Comparison Graph

Technical Specifications

Model Architecture

Compute Infrastructure

Input Format

Generation Parameters

Environmental Impact

Citation

Model Card Authors

Model Card Contact

FilesExpand file tree

MODEL_CARD.md

Latest commit

History

MODEL_CARD.md

File metadata and controls

Model Card: OpenMath

Model Details

Model Description

Model Sources

Uses

Direct Use

Downstream Use

Out-of-Scope Use

Bias, Risks, and Limitations

Known Limitations

Recommendations

Training Details

Training Data

Training Procedure

Training Hyperparameters

Hardware

Software

Evaluation

Testing Data & Metrics

Testing Data

Metrics

Results

Benchmark Comparison Graph

Technical Specifications

Model Architecture

Compute Infrastructure

Input Format

Generation Parameters

Environmental Impact

Citation

Model Card Authors

Model Card Contact