## Model Information
- **Model Name:** Diagnostic-Reasoning-Q3X1
- **Organization:** Clinical-Reasoning-Hub, UAEU College of Medicine
- **Model Size:** 8B parameters
- **Base Model:** Qwen3-8B fine-tuned with clinical reasoning methodology
- **HuggingFace:** https://huggingface.co/Clinical-Reasoning-Hub/Diagnostic-Reasoning-Q3X1
- **Contact:** adnangha@uaeu.ac.ae
## Results
[pentabrid_v9_full_report.json](https://github.com/user-attachments/files/25393574/pentabrid_v9_full_report.json)
## Evaluation Details
- **Method:** Zero-shot CoT (generation-based)
- **Max tokens:** 8,192
- **Temperature:** 0.0 (greedy)
- **Framework:** vLLM on NVIDIA H100 80GB
- **Inference mode:** BF16 full precision
## Key Highlight
Diagnostic-Reasoning-Q3X1 is an 8B-parameter model that achieves performance competitive with models 8-84x larger on expert-level medical reasoning. To the best of our knowledge, it is the first sub-10B model submitted to MedXpertQA.
---

## Pentabrid v9 — Complete Evaluation Report
**Generation-based scores:**

| Benchmark | Generation | Log-likelihood | Delta (pp) |
|---|---:|---:|---:|
| MedQA (USMLE) | 67.0% | 66.3% | +0.7 |
| MedMCQA | 58.9% | 58.6% | +0.3 |
| PubMedQA | 69.5% | 66.6% | +2.9 |
| MMLU Clinical Knowledge | 85.3% | 86.4% | -1.1 |

**Expert-level benchmark:**

- MedXpertQA Text: 24.9% (3rd globally, 1st among sub-10B models)

**Log-likelihood overall:** 76.4% (average across 7 benchmarks)

**Leaderboard highlights:**

- Beats LLaMA-3.3-70B on MedXpertQA (8B vs. 70B)
- Beats DeepSeek-V3 on MedXpertQA (8B vs. 671B)
- Beats MedReason-8B on MedQA by +5.3pp
- First sub-10B model on the MedXpertQA leaderboard
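The generation-vs-log-likelihood gaps quoted above are plain percentage-point differences between the two scoring modes; as a sanity check, they can be recomputed directly from the reported accuracies:

```python
# Generation-based vs. log-likelihood accuracy (percent), from the report above.
scores = {
    "MedQA (USMLE)":           (67.0, 66.3),
    "MedMCQA":                 (58.9, 58.6),
    "PubMedQA":                (69.5, 66.6),
    "MMLU Clinical Knowledge": (85.3, 86.4),
}

# Percentage-point delta: positive means generation-based scoring is higher.
deltas = {name: round(gen - loglik, 1) for name, (gen, loglik) in scores.items()}
```

Generation-based scoring helps most on PubMedQA (+2.9pp) and slightly hurts on MMLU Clinical Knowledge (-1.1pp), matching the figures in the report.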
## Submission File
[Attached: pentabrid_v9_full_report.json]
**Summary:** 24.9% on MedXpertQA Text from a lightweight 8B-parameter model, outperforming DeepSeek-V3, LLaMA-3.3-70B, and many other much larger models.