Skip to content

Latest commit

 

History

History
225 lines (163 loc) · 7.34 KB

File metadata and controls

225 lines (163 loc) · 7.34 KB

Model Card: OpenMath

Model Details

Model Description

OpenMath is a fine-tuned small language model (SLM) specialized in solving math word problems with step-by-step reasoning. The model uses QLoRA (Quantized Low-Rank Adaptation) fine-tuning on the Qwen2.5-Math-1.5B base model.

  • Developed by: OpenMath Project Contributors
  • Model type: Causal Language Model (Math Reasoning)
  • Language: English
  • License: Apache License 2.0
  • Base Model: Qwen/Qwen2.5-Math-1.5B
  • Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
  • Parameters: 1.5B (base model) + LoRA adapters

Model Sources

Uses

Direct Use

This model is designed for educational and research purposes to:

  • Solve grade-school level math word problems
  • Generate step-by-step mathematical reasoning
  • Demonstrate efficient fine-tuning techniques on limited compute resources

Downstream Use

The model can be used as a starting point for:

  • Further fine-tuning on additional math datasets
  • Integration into educational applications
  • Research on small language model capabilities in mathematical reasoning

Out-of-Scope Use

  • Production systems requiring high accuracy: The model achieves 41% accuracy and should not be used for critical applications
  • Advanced mathematics: The model is trained on grade-school level problems only
  • Homework/exam solving without verification: Always verify solutions independently
  • Professional mathematical advice or calculations

Bias, Risks, and Limitations

Known Limitations

  • Accuracy: 41% on GSM8K test subset (100 samples) - the model produces incorrect answers in the majority of cases
  • Training data size: Only trained on 1,000 samples from GSM8K, limiting generalization
  • Repetition issues: May generate repetitive text during inference
  • Domain specificity: Limited to grade-school math problems similar to GSM8K
  • Incomplete reasoning: May produce incomplete or misleading step-by-step solutions

Recommendations

Users should:

  • Always verify model outputs independently
  • Not rely on this model for educational assessments or real-world decisions
  • Understand this is a research/educational project, not a production-ready system
  • Use appropriate repetition penalties and decoding strategies to improve output quality

Training Details

Training Data

  • Dataset: GSM8K (Grade School Math 8K)
  • Training samples: 1,000 samples from the GSM8K training set
  • Data format: Math word problems with step-by-step solutions

Training Procedure

Training Hyperparameters

  • Training regime: 4-bit QLoRA fine-tuning
  • Epochs: 6
  • Max sequence length: 1024 tokens
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: q_proj, o_proj, k_proj, v_proj
  • Quantization: 4-bit NF4 with double quantization
  • Compute dtype: float16
  • Loss masking: Trained primarily on solution portions to improve reasoning

Hardware

  • GPU: NVIDIA T4 (free Google Colab tier)
  • Training time: Reproducible on free Colab resources

Software

  • Framework: PyTorch, Transformers, PEFT
  • Quantization: BitsAndBytes (4-bit)
  • Fine-tuning: LoRA/QLoRA

Evaluation

Testing Data & Metrics

Testing Data

  • Dataset: GSM8K test split
  • Evaluation samples: 100-question subset (for faster evaluation on Colab)

Metrics

  • Primary metric: Accuracy (exact match)
  • GSM8K Accuracy: 41.0% (on 100-sample test subset)

Results

Model Parameters GSM8K Accuracy (%)
LLaMA 2 13B 28.7
Gemma 2 (PT) 2B 23.9
Mistral (Base) 7B 36.5
LLaMA 3.2 Instruct (CoT) 1B 39.04
OpenMath (Qwen2.5-Math-1.5B + LoRA) 1.5B 41.0
Gemma 3 IT 1B 42.15
Zephyr-7b-gemma-v0.1 7B 45.56
Gemma 7B 46.4

Benchmark Comparison Graph

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4CAF50', 'primaryTextColor':'#000', 'primaryBorderColor':'#2E7D32', 'lineColor':'#1976D2', 'secondaryColor':'#FFC107', 'tertiaryColor':'#fff'}}}%%
graph LR
    subgraph "GSM8K Accuracy Comparison (%)"
        A["Gemma 2 PT<br/>2B: 23.9%"] 
        B["ERNIE 4.5<br/>21B: 25.2%"]
        C["Baichuan<br/>13B: 26.6%"]
        D["LLaMA 2<br/>13B: 28.7%"]
        E["Qwen 3 IT<br/>1.7B: 33.66%"]
        F["Mistral<br/>7B: 36.5%"]
        G["LLaMA 3.2 IT<br/>1B: 39.04%"]
        H["OpenMath<br/>1.5B: 41.0%"]
        I["Gemma 3 IT<br/>1B: 42.15%"]
        J["Zephyr-7b<br/>7B: 45.56%"]
        K["Gemma<br/>7B: 46.4%"]
    end
    
    style H fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
Loading

Performance Visualization:

Gemma 2 PT (2B)         ████████████ 23.9%
ERNIE 4.5 (21B)         █████████████ 25.2%
Baichuan (13B)          ██████████████ 26.6%
LLaMA 2 (13B)           ███████████████ 28.7%
Qwen 3 IT (1.7B)        █████████████████ 33.66%
Mistral (7B)            ███████████████████ 36.5%
LLaMA 3.2 IT (1B)       ████████████████████ 39.04%
OpenMath (1.5B)         █████████████████████ 41.0% ⭐
Gemma 3 IT (1B)         █████████████████████ 42.15%
Zephyr-7b (7B)          ███████████████████████ 45.56%
Gemma (7B)              ████████████████████████ 46.4%
                        |----|----|----|----|----|----|
                        0   10   20   30   40   50

OpenMath achieves competitive performance compared to other small language models while being trained on only 1,000 samples and reproducible on free Colab resources.

Technical Specifications

Model Architecture

  • Base architecture: Qwen2.5-Math-1.5B (Transformer-based causal LM)
  • Adapter type: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit NF4 quantization

Compute Infrastructure

  • Training: Google Colab (T4 GPU, free tier)
  • Inference: Compatible with T4 GPU or similar (requires ~6-8GB VRAM with 4-bit quantization)

Input Format

The model expects prompts in the following format:

### Instruction:
Solve the math problem step by step and give the final answer.

### Problem:
[Your math problem here]

### Solution:

Generation Parameters

Recommended inference settings:

  • max_new_tokens: 200
  • do_sample: False (deterministic for math)
  • repetition_penalty: 1.1
  • no_repeat_ngram_size: 3

Environmental Impact

  • Hardware Type: NVIDIA T4 GPU
  • Hours used: Minimal (reproducible on free Colab)
  • Cloud Provider: Google Colab
  • Carbon Emitted: Minimal due to efficient QLoRA training on limited samples

Citation

@software{openmath2024,
  title={OpenMath: Fine-tuning Small Language Models for Math Reasoning},
  author={OpenMath Contributors},
  year={2024},
  license={Apache-2.0}
}

Model Card Authors

OpenMath Project Contributors

Model Card Contact

[Repository Issues Page]