[issue] Surprising Performance Drop When Using <think> Instead of <reasoning> as Custom Tags for Fine-tuning #3029

@l-besiege-l

Hello Unsloth team!

Please excuse this beginner question. I'm new to the world of fine-tuning, and your library has been a fantastic and accessible starting point for me. While experimenting, I've encountered some model behavior that I don't understand and was hoping to get some clarification on what feels like a fundamental concept.

1. Did you update?

Yes. I ran pip install --upgrade unsloth and the installation is up to date.

2. Colab or Kaggle or local / cloud

Local.

3. Number GPUs used

1x NVIDIA GeForce RTX 4090

4. Which notebook? Please link!

I only changed the custom tags in the official qwen3-4b-grpo example and removed some output checks I did not need. Here is the link to the notebook: https://colab.research.google.com/drive/1id4WqGn3yDZ4uOEmQI5HCR8UM1S64H07?usp=sharing

5. Which Unsloth version, TRL version, etc.?

Transformers: 4.53.2. vLLM: 0.9.2.
NVIDIA GeForce RTX 4090. Num GPUs = 2. Max memory: 23.514 GB. Platform: Linux.
Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0

6. Which trainer?

GRPOTrainer (but the same issue is observable with SFTTrainer).

Problem Description

I am trying to fine-tune the unsloth/Qwen3-8B-Base model for mathematical reasoning. My goal is to teach the model to first "think" about the problem and then provide a final answer, using a specific format.

I conducted an experiment with two scenarios. The only difference between them was the custom tags I used in my data formatting.

Scenario A: This works perfectly.
I used <reasoning> and <answer> as my custom tags. The model learns the format very well and generates responses that follow the assistant: <reasoning>...</reasoning><answer>...</answer> structure.

reasoning_start = "<reasoning>" 
reasoning_end   = "</reasoning>"   
solution_start  = "<answer>"
solution_end    = "</answer>"

system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""

Scenario B: This performs very poorly.
I replaced the <reasoning> tags with <think>, so the target format became assistant: <think>...</think><answer>...</answer>. To my surprise, the model fails to learn this format: the output is often incoherent and does not follow the desired structure at all.

reasoning_start = "<think>" 
reasoning_end   = "</think>"   
solution_start  = "<answer>"
solution_end    = "</answer>"

system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""
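
To put a number on "follows the format" for both scenarios, a small checker like the one below could help. This is only a sketch: the helper names build_format_regex and format_compliance are hypothetical (they are not part of Unsloth or TRL), and the sample completions are made up for illustration, not real model output.

```python
import re

def build_format_regex(reasoning_start, reasoning_end, solution_start, solution_end):
    """Compile a regex for <start>...<end><solution>...</solution>,
    allowing optional whitespace around and between the two blocks."""
    return re.compile(
        r"^\s*" + re.escape(reasoning_start) + r"(.+?)" + re.escape(reasoning_end)
        + r"\s*" + re.escape(solution_start) + r"(.+?)" + re.escape(solution_end)
        + r"\s*$",
        flags=re.DOTALL,  # let "." span multi-line reasoning
    )

def format_compliance(completions, pattern):
    """Return the fraction of completions that match the whole pattern."""
    if not completions:
        return 0.0
    return sum(1 for c in completions if pattern.match(c)) / len(completions)

# One pattern per scenario; only the reasoning tags differ.
reasoning_format = build_format_regex("<reasoning>", "</reasoning>", "<answer>", "</answer>")
think_format = build_format_regex("<think>", "</think>", "<answer>", "</answer>")

# Hypothetical completions, for illustration only.
samples = [
    "<think>2 + 2 = 4</think><answer>4</answer>",
    "The answer is 4.",
]

print(format_compliance(samples, think_format))
print(format_compliance(samples, reasoning_format))
```

Running the same function over the generations from Scenario A and Scenario B would give one comparable compliance rate per tag choice, which is easier to discuss than eyeballing individual outputs.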

Is there something wrong with my code? How should I fix it? Thank you for your time!
