python -m mlx_lm.generate --model /Users/gokdenizgulmez/Desktop/dream_grpo-4bit --prompt "write quick sort in c++"
<frozen runpy>:128: RuntimeWarning: 'mlx_lm.generate' found in sys.modules after import of package 'mlx_lm', but prior to execution of 'mlx_lm.generate'; this may result in unpredictable behaviour
Calling `python -m mlx_lm.generate...` directly is deprecated. Use `mlx_lm.generate...` or `python -m mlx_lm generate ...` instead.
==========
#include <iostream>
#include <vector>
using namespace std;
// Function to perform quick sort on a vector of integers
void quickSort(vector<int>& arr, int low, int high) {
int pi = partition(arr, low, high);
// Recursively sort elements before and after partition
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
// Function to partition the array
int partition(vector<int>& arr,
==========
Prompt: 25 tokens, 46.264 tokens-per-sec
Generation: 100 tokens, 12.091 tokens-per-sec
Peak memory: 4.365 GB
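For reference, the same generation can be driven from the Python API instead of the deprecated `python -m mlx_lm.generate` entry point. A minimal sketch using mlx_lm's `load`/`generate`, with the local model path and the 100-token cap taken from the log above (depending on the model, the prompt may first need the tokenizer's chat template applied):

```python
# Minimal sketch: the Python-API equivalent of the CLI call above.
from mlx_lm import load, generate

model, tokenizer = load("/Users/gokdenizgulmez/Desktop/dream_grpo-4bit")
text = generate(
    model,
    tokenizer,
    prompt="write quick sort in c++",
    max_tokens=100,  # matches the 100-token generation in the log
)
print(text)
```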
python -m mlx_lm.lora \
--model /Users/gokdenizgulmez/Desktop/dream_grpo-4bit \
--train \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/data_smoll \
--fine-tune-type lora \
--num-layers 2 \
--batch-size 1 \
--iters 5 \
--val-batches 1 \
--steps-per-report 1 \
--steps-per-eval 5 \
--adapter-path /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/test_dream \
--save-every 500 \
--max-seq-length 128 \
--grad-checkpoint
Calling `python -m mlx_lm.lora...` directly is deprecated. Use `mlx_lm.lora...` or `python -m mlx_lm lora ...` instead.
Loading pretrained model
The repository for /Users/gokdenizgulmez/Desktop/dream_grpo-4bit contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/gokdenizgulmez/Desktop/dream_grpo-4bit.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
Loading datasets
Training
Trainable parameters: 0.002% (0.180M/7615.617M)
Starting training..., iters: 5
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1263 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 0%| | 0/1 [00:00<?, ?it/s][WARNING] Some sequences are longer than 128 tokens. The longest sentence 2047 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.91s/it]
Iter 1: Val loss 5.289, Val took 1.937s
Iter 1: Train loss 6.134, Learning Rate 1.000e-05, It/sec 0.173, Tokens/sec 21.929, Trained Tokens 127, Peak mem 4.569 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 605 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 2: Train loss 5.653, Learning Rate 1.000e-05, It/sec 0.471, Tokens/sec 59.792, Trained Tokens 254, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 2035 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 3: Train loss 5.062, Learning Rate 1.000e-05, It/sec 0.486, Tokens/sec 61.665, Trained Tokens 381, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1806 will be truncated to 128. Consider pre-splitting your data to save memory.
Iter 4: Train loss 5.426, Learning Rate 1.000e-05, It/sec 0.482, Tokens/sec 61.209, Trained Tokens 508, Peak mem 4.670 GB
[WARNING] Some sequences are longer than 128 tokens. The longest sentence 1607 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 0%| | 0/1 [00:00<?, ?it/s][WARNING] Some sequences are longer than 128 tokens. The longest sentence 1171 will be truncated to 128. Consider pre-splitting your data to save memory.
Calculating loss...: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.77s/it]
Iter 5: Val loss 5.226, Val took 1.789s
Iter 5: Train loss 4.596, Learning Rate 1.000e-05, It/sec 0.486, Tokens/sec 61.702, Trained Tokens 635, Peak mem 4.670 GB
Saved final weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/test_dream/adapters.safetensors.
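Once training finishes, the saved adapters can be loaded back for inference by pointing `load` at the adapter path. A minimal sketch, assuming the paths from the run above:

```python
# Minimal sketch: load the base model together with the trained LoRA adapters.
from mlx_lm import load, generate

model, tokenizer = load(
    "/Users/gokdenizgulmez/Desktop/dream_grpo-4bit",
    adapter_path="/Users/gokdenizgulmez/Library/Mobile Documents/"
    "com~apple~CloudDocs/Datastes/MLX/test_dream",
)
print(generate(model, tokenizer, prompt="write quick sort in c++", max_tokens=100))
```

The repeated truncation warnings above suggest pre-splitting the data. A hypothetical sketch of chunking a JSONL dataset into pieces of at most max-seq-length tokens; the "text" field and file layout are assumptions about the dataset format, not the actual files used here:

```python
# Hypothetical helper: split long examples into <=128-token chunks so nothing
# is truncated during training. Assumes JSONL records with a "text" field.
import json

def presplit(in_path, out_path, tokenizer, max_len=128):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            tokens = tokenizer.encode(json.loads(line)["text"])
            for i in range(0, len(tokens), max_len):
                chunk = tokenizer.decode(tokens[i : i + max_len])
                fout.write(json.dumps({"text": chunk}) + "\n")
```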
Yes, I started to implement the diffusion generation; that was just to test the model implementation. Or would it be better to wait until Llada has been merged before continuing?
There's no need to wait for that. My recommendation for diffusion models, though, is to write a new generate_step (and possibly stream_generate / generate). They are so different that I think we should have a separate path entirely, to avoid cluttering the code and to keep them easy to change as the diffusion model APIs converge. The models can of course still go in models/.
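For concreteness, a separate diffusion decoding path might look roughly like the sketch below. This is only an illustration of the general iterative-unmasking scheme used by Dream/LLaDA-style models, not their actual algorithm: mask_id, num_steps, the model call signature, and the confidence heuristic are all assumptions.

```python
# Sketch of a standalone diffusion generate_step: start from a fully masked
# completion and reveal the most confident positions over a few passes.
# All names and signatures here are assumptions, not the real Dream API.
import mlx.core as mx

def diffusion_generate_step(model, prompt, max_tokens, mask_id, num_steps=8):
    tokens = mx.concatenate(
        [prompt, mx.full((max_tokens,), mask_id, dtype=prompt.dtype)]
    )
    per_step = max(1, max_tokens // num_steps)  # positions revealed per pass
    for _ in range(num_steps):
        masked = tokens == mask_id
        if not masked.any().item():
            break  # everything has been unmasked
        logits = model(tokens[None])[0]  # assumed shape: (seq_len, vocab)
        probs = mx.softmax(logits.astype(mx.float32), axis=-1)
        predictions = probs.argmax(axis=-1)
        # Consider only still-masked positions; reveal the top-k of them.
        confidence = mx.where(masked, probs.max(axis=-1), float("-inf"))
        k = min(per_step, int(masked.sum().item()))
        reveal = mx.argsort(-confidence)[:k]
        tokens[reveal] = predictions[reveal].astype(tokens.dtype)
    return tokens[prompt.shape[0]:]
```

Keeping this loop out of the autoregressive generate_step means the KV-cache and sampler plumbing there stays untouched, which matches the "separate path" suggestion above.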
Not at the moment. Diffusion-based text-to-text is generally slower on Apple Silicon compared to token-wise autoregression, and I haven’t seen adoption of this model on other platforms yet. A separate port might be more suitable in my opinion. That said, if there’s genuine interest, I’m happy to continue and prioritize it. What do you think, @angeloskath @awni?
I agree so far there isn't a ton of interest for that specific model. I think it's fine to deprioritize this for now. I do think we should keep an eye on diffusion LLMs in general. As they improve it may make more sense to support them (either here or elsewhere), but we haven't reached that point yet and I haven't seen a ton of progress recently.
Agreed! I’ll leave the PR open but move the implementation into here, so it’s easier to maintain and revisit once they gain more traction. That way, we don’t lose the work, and we’re ready if adoption picks up.