python -m mlx_lm.generate --model /Users/gokdenizgulmez/Desktop/dream_grpo-4bit --prompt "write quick sort in c++"
<frozen runpy>:128: RuntimeWarning: 'mlx_lm.generate' found in sys.modules after import of package 'mlx_lm', but prior to execution of 'mlx_lm.generate'; this may result in unpredictable behaviour
Calling `python -m mlx_lm.generate...` directly is deprecated. Use `mlx_lm.generate...` or `python -m mlx_lm generate ...` instead.
==========
#include <iostream>
#include <vector>
using namespace std;
// Function to perform quick sort on a vector of integers
void quickSort(vector<int>& arr, int low, int high) {
int pi = partition(arr, low, high);
// Recursively sort elements before and after partition
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
// Function to partition the array
int partition(vector<int>& arr,
==========
Prompt: 25 tokens, 46.264 tokens-per-sec
Generation: 100 tokens, 12.091 tokens-per-sec
Peak memory: 4.365 GB
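For reference, the same generation can be driven from the Python API instead of the deprecated `python -m mlx_lm.generate` entry point. A minimal sketch using mlx_lm's `load`/`generate`, with the local model path and the 100-token cap taken from the log above (depending on the model, the prompt may first need the tokenizer's chat template applied):

```python
# Minimal sketch: the Python-API equivalent of the CLI call above.
from mlx_lm import load, generate

model, tokenizer = load("/Users/gokdenizgulmez/Desktop/dream_grpo-4bit")
text = generate(
    model,
    tokenizer,
    prompt="write quick sort in c++",
    max_tokens=100,  # matches the 100-token generation in the log
)
print(text)
```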
python -m mlx_lm.lora \
--model /Users/gokdenizgulmez/Desktop/dream_grpo-4bit \
--train \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/data_smoll \
--fine-tune-type lora \
--num-layers 2 \
--batch-size 1 \
--iters 5 \
--val-batches 1 \
--steps-per-report 1 \
--steps-per-eval 5 \
--adapter-path /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/test_dream \
--save-every 500 \
--max-seq-length 128 \
--grad-checkpoint
Calling `python -m mlx_lm.lora...` directly is deprecated. Use `mlx_lm.lora...` or `python -m mlx_lm lora ...` instead.
Loading pretrained model
The repository for /Users/gokdenizgulmez/Desktop/dream_grpo-4bit contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/gokdenizgulmez/Desktop/dream_grpo-4bit.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
Loading datasets
Training
Trainable parameters: 0.002% (0.180M/7615.617M)
Starting training..., iters: 5
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1263 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 0%| | 0/1 [00:00<?, ?it/s][WARNING] Some sequences are longer than 128 tokens. The longest sentence 2047 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.91s/it]
Iter 1: Val loss 5.289, Val took 1.937s
Iter 1: Train loss 6.134, Learning Rate 1.000e-05, It/sec 0.173, Tokens/sec 21.929, Trained Tokens 127, Peak mem 4.569 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 605 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 2: Train loss 5.653, Learning Rate 1.000e-05, It/sec 0.471, Tokens/sec 59.792, Trained Tokens 254, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 2035 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 3: Train loss 5.062, Learning Rate 1.000e-05, It/sec 0.486, Tokens/sec 61.665, Trained Tokens 381, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1806 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 4: Train loss 5.426, Learning Rate 1.000e-05, It/sec 0.482, Tokens/sec 61.209, Trained Tokens 508, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1607 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 0%| | 0/1 [00:00<?, ?it/s][WARNING] Some sequences are longer than 128 tokens. The longest sentence 1171 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.77s/it]
Iter 5: Val loss 5.226, Val took 1.789s
Iter 5: Train loss 4.596, Learning Rate 1.000e-05, It/sec 0.486, Tokens/sec 61.702, Trained Tokens 635, Peak mem 4.670 GB
Saved final weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/test_dream/adapters.safetensors.
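Once training finishes, the saved adapters can be loaded back for inference by pointing `load` at the adapter path. A minimal sketch, assuming the paths from the run above:

```python
# Minimal sketch: load the base model together with the trained LoRA adapters.
from mlx_lm import load, generate

model, tokenizer = load(
    "/Users/gokdenizgulmez/Desktop/dream_grpo-4bit",
    adapter_path="/Users/gokdenizgulmez/Library/Mobile Documents/"
    "com~apple~CloudDocs/Datastes/MLX/test_dream",
)
print(generate(model, tokenizer, prompt="write quick sort in c++", max_tokens=100))
```

The repeated truncation warnings above suggest pre-splitting the data. A hypothetical sketch of chunking a JSONL dataset into pieces of at most max-seq-length tokens; the "text" field and file layout are assumptions about the dataset format, not the actual files used here:

```python
# Hypothetical helper: split long examples into <=128-token chunks so nothing
# is truncated during training. Assumes JSONL records with a "text" field.
import json

def presplit(in_path, out_path, tokenizer, max_len=128):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            tokens = tokenizer.encode(json.loads(line)["text"])
            for i in range(0, len(tokens), max_len):
                chunk = tokenizer.decode(tokens[i : i + max_len])
                fout.write(json.dumps({"text": chunk}) + "\n")
```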
Yes, I started to implement the diffusion generation; that was just to test the model implementation. Or would it be better to wait until Llada has been merged before continuing?
There's no need to wait for that. My recommendation for diffusion models, though, is to write a new generate_step (and possibly stream_generate / generate). They are so different that I think we should have a separate path entirely, to avoid cluttering the code and to keep them easy to change as the diffusion model APIs converge. The models can of course still go in models/.
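For concreteness, a separate diffusion decoding path might look roughly like the sketch below. This is only an illustration of the general iterative-unmasking scheme used by Dream/LLaDA-style models, not their actual algorithm: mask_id, num_steps, the model call signature, and the confidence heuristic are all assumptions.

```python
# Sketch of a standalone diffusion generate_step: start from a fully masked
# completion and reveal the most confident positions over a few passes.
# All names and signatures here are assumptions, not the real Dream API.
import mlx.core as mx

def diffusion_generate_step(model, prompt, max_tokens, mask_id, num_steps=8):
    tokens = mx.concatenate(
        [prompt, mx.full((max_tokens,), mask_id, dtype=prompt.dtype)]
    )
    per_step = max(1, max_tokens // num_steps)  # positions revealed per pass
    for _ in range(num_steps):
        masked = tokens == mask_id
        if not masked.any().item():
            break  # everything has been unmasked
        logits = model(tokens[None])[0]  # assumed shape: (seq_len, vocab)
        probs = mx.softmax(logits.astype(mx.float32), axis=-1)
        predictions = probs.argmax(axis=-1)
        # Consider only still-masked positions; reveal the top-k of them.
        confidence = mx.where(masked, probs.max(axis=-1), float("-inf"))
        k = min(per_step, int(masked.sum().item()))
        reveal = mx.argsort(-confidence)[:k]
        tokens[reveal] = predictions[reveal].astype(tokens.dtype)
    return tokens[prompt.shape[0]:]
```

Keeping this loop out of the autoregressive generate_step means the KV-cache and sampler plumbing there stays untouched, which matches the "separate path" suggestion above.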
Not at the moment. Diffusion-based text-to-text is generally slower on Apple Silicon compared to token-wise autoregression, and I haven’t seen adoption of this model on other platforms yet. A separate port might be more suitable in my opinion. That said, if there’s genuine interest, I’m happy to continue and prioritize it. What do you think, @angeloskath @awni?
I agree so far there isn't a ton of interest for that specific model. I think it's fine to deprioritize this for now. I do think we should keep an eye on diffusion LLMs in general. As they improve it may make more sense to support them (either here or elsewhere), but we haven't reached that point yet and I haven't seen a ton of progress recently.
Agreed! I’ll leave the PR open but move the implementation into here, so it’s easier to maintain and revisit once they gain more traction. That way, we don’t lose the work, and we’re ready if adoption picks up.