Hello Unsloth team!
Please excuse this beginner question. I'm new to the world of fine-tuning, and your library has been a fantastic and accessible starting point for me. While experimenting, I've encountered some model behavior that I don't understand and was hoping to get some clarification on what feels like a fundamental concept.
1. Did you update?
Yes. I ran pip install --upgrade unsloth and the package is up to date.
2. Colab or Kaggle or local / cloud
Local.
3. Number GPUs used
1x NVIDIA GeForce RTX 4090
4. Which notebook? Please link!
I only modified the custom tags in the official Qwen3-4B GRPO example and removed some unnecessary output checks. Below is the link to the online notebook. https://colab.research.google.com/drive/1id4WqGn3yDZ4uOEmQI5HCR8UM1S64H07?usp=sharing
5. Which Unsloth version, TRL version, etc.?
Transformers: 4.53.2. vLLM: 0.9.2.
NVIDIA GeForce RTX 4090. Num GPUs = 2. Max memory: 23.514 GB. Platform: Linux.
Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
6. Which trainer?
GRPOTrainer (but the same issue is observable with SFTTrainer).
Problem Description
I am trying to fine-tune the unsloth/Qwen3-8B-Base model for mathematical reasoning. My goal is to teach the model to first "think" about the problem and then provide a final answer, using a specific format.
I conducted an experiment with two scenarios. The only difference between them was the custom tags I used in my data formatting.
Scenario A: This works perfectly.
I used <reasoning> and <answer> as my custom tags. The model learns the format very well and generates responses that follow the assistant: <reasoning>...</reasoning><answer>...</answer> structure.
reasoning_start = "<reasoning>"
reasoning_end = "</reasoning>"
solution_start = "<answer>"
solution_end = "</answer>"
system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""
Scenario B: This performs very poorly.
I changed the tags from <reasoning> to <think>. So the target format became assistant: <think>...</think><answer>...</answer>. To my surprise, the model completely fails to learn this format. The output is often incoherent, and it doesn't follow the desired structure at all.
reasoning_start = "<think>"
reasoning_end = "</think>"
solution_start = "<answer>"
solution_end = "</answer>"
system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""
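To make "fails to learn this format" measurable, here is a minimal sketch of how the format adherence of each scenario could be compared. It uses only the Python standard library. The helper names build_format_regex and format_rate are hypothetical and are not part of Unsloth or TRL, and the two sample completions are made up for illustration, not real model output.

```python
import re


def build_format_regex(reasoning_start, reasoning_end, solution_start, solution_end):
    """Compile a regex for: <start>reasoning<end><solution_start>answer<solution_end>."""
    return re.compile(
        rf"^\s*{re.escape(reasoning_start)}(?P<reasoning>.+?){re.escape(reasoning_end)}\s*"
        rf"{re.escape(solution_start)}(?P<solution>.+?){re.escape(solution_end)}\s*$",
        flags=re.DOTALL,  # let the reasoning span multiple lines
    )


def format_rate(completions, pattern):
    """Return the fraction of completions that match the expected structure."""
    if not completions:
        return 0.0
    hits = sum(1 for text in completions if pattern.match(text))
    return hits / len(completions)


# Patterns for Scenario A and Scenario B (tags taken from the post above).
reasoning_pattern = build_format_regex("<reasoning>", "</reasoning>", "<answer>", "</answer>")
think_pattern = build_format_regex("<think>", "</think>", "<answer>", "</answer>")

# Made-up completions, for illustration only.
samples = [
    "<think>2 + 2 = 4</think><answer>4</answer>",
    "The answer is 4.",
]

rate = format_rate(samples, think_pattern)
print(f"Scenario B format rate: {rate:.2f}")
```

Running the same check on generations from both scenarios gives a number to compare instead of an impression, which should make the difference easier to report.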
Is there something wrong with my code? How should I fix it? Thank you for your time!