
Prompt Optimizer

Fine-tune a local LLM to transform sloppy, verbose user prompts into concise, token-compressed, structured prompts that produce better outputs across any domain.

Folder Structure

prompt-optimizer/
├── configs/
│   └── default.yaml          # Central configuration
├── src/
│   ├── config.py              # Dataclass-based config + YAML loader
│   ├── utils/                 # Logging setup
│   ├── dataset/
│   │   ├── seeds.py           # High-quality seed examples
│   │   ├── generator.py       # Synthetic dataset pipeline
│   │   └── formatter.py       # Chat-template formatting
│   ├── training/
│   │   └── train.py           # LoRA / QLoRA fine-tuning
│   ├── inference/
│   │   └── engine.py          # Model loading + generation
│   └── evaluation/
│       └── metrics.py         # Token counting + quality heuristics
├── app/
│   └── ui.py                  # Gradio UI
├── scripts/
│   ├── generate_dataset.py    # Dataset generation entrypoint
│   ├── train.py               # Training entrypoint
│   ├── infer.py               # CLI inference entrypoint
│   ├── evaluate.py            # Evaluation entrypoint
│   └── launch_ui.py           # UI launch entrypoint
├── data/                      # Generated datasets
├── outputs/                   # Checkpoints, adapters
├── requirements.txt
├── .gitignore
└── README.md

Quick Start

1. Install dependencies

pip install -r requirements.txt

Colab: the same command works in a Colab cell; prefix it with !:

!pip install -r requirements.txt

2. Generate the dataset

python scripts/generate_dataset.py

This creates data/processed/train.jsonl and data/processed/val.jsonl using built-in seed examples plus synthetic augmentations.
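Each JSONL line is one training pair. A minimal sketch of what a record might look like, assuming hypothetical field names (the real schema is defined in src/dataset/generator.py and may differ):

```python
import json

# Hypothetical shape of one training pair: a sloppy prompt and its optimized
# rewrite. Field names here are illustrative, not the project's actual schema.
record = {
    "raw_prompt": "Hey can you please help me write a Python function that ...",
    "optimized_prompt": "Write a Python function that ...",
}
print(json.dumps(record))  # one record per line in train.jsonl / val.jsonl
```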

To include your own data:

python scripts/generate_dataset.py --external data/raw/my_data.jsonl

3. Train the adapter

python scripts/train.py

Training uses QLoRA with 4-bit quantization by default. The LoRA adapter is saved to outputs/adapter/.
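For context, a minimal sketch of the QLoRA setup the training script applies, using standard transformers + peft calls. The actual rank, alpha, dropout, and target modules come from the lora and quantization sections of configs/default.yaml; the values below are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",   # model.name in the config
    quantization_config=bnb_config,
    device_map="auto",
)

# Small trainable LoRA adapter on top of the quantized weights.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative values only
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # only the adapter weights are trained
```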

To resume from a checkpoint:

python scripts/train.py --resume outputs/checkpoints/checkpoint-100

4. Run inference

Single prompt:

python scripts/infer.py "Hey can you please help me write a Python function that ..."

Interactive mode:

python scripts/infer.py --interactive

Test samples (no prompt needed):

python scripts/infer.py

5. Evaluate

python scripts/evaluate.py

Prints a table with token counts, compression ratios, and quality heuristics.
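The compression ratio is essentially a token-count comparison between the raw and optimized prompts. A rough sketch, assuming tokens are counted with the base model's tokenizer (the real heuristics live in src/evaluation/metrics.py and may differ in detail):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

def compression_ratio(original: str, optimized: str) -> float:
    """Ratio of optimized-prompt tokens to original-prompt tokens."""
    orig_tokens = len(tokenizer.encode(original))
    opt_tokens = len(tokenizer.encode(optimized))
    return opt_tokens / orig_tokens  # < 1.0 means the prompt got shorter

print(compression_ratio(
    "Hey can you please help me write a Python function that ...",
    "Write a Python function that ...",
))
```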

6. Launch the UI

python scripts/launch_ui.py

Opens a Gradio app at http://localhost:7860. To create a public link (useful on Colab):

python scripts/launch_ui.py --share
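Under the hood, --share maps to Gradio's share=True, which tunnels the local app and prints a temporary public URL. A minimal sketch, assuming host and port follow the values in the ui config section:

```python
import gradio as gr

# Placeholder function; the project's UI wires this to the fine-tuned model.
demo = gr.Interface(fn=lambda prompt: prompt, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=7860, share=True)
```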

Configuration

All settings live in configs/default.yaml. Key sections:

| Section | Controls |
| --- | --- |
| model | Base model name (swap between Mistral / Llama) |
| quantization | 4-bit BitsAndBytes settings |
| lora | Rank, alpha, dropout, target modules |
| training | Epochs, batch size, LR, scheduler, etc. |
| generation | max_new_tokens, temperature, top_p, etc. |
| dataset | Paths, val split ratio, sample limits |
| evaluation | Token threshold, min compression ratio |
| ui | Server host, port, share flag |
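The file is plain YAML, so it can also be inspected programmatically. A small read-only sketch (the project additionally ships a dataclass-based loader in src/config.py; edits are normally made directly in the file):

```python
import yaml

# Load the central configuration and look up a few documented keys.
with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["name"])                       # base model to fine-tune
print(cfg["generation"]["max_new_tokens"])        # generation length cap
print(cfg["training"]["per_device_train_batch_size"])  # shrink if VRAM is tight
```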

Model Selection & VRAM

| Model | Min VRAM (4-bit) | Notes |
| --- | --- | --- |
| mistralai/Mistral-7B-Instruct-v0.2 | ~6 GB | Good default, fast |
| meta-llama/Meta-Llama-3-8B-Instruct | ~7 GB | Strong instruction following |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | ~2 GB | Ultra-light for testing |

Change the model in configs/default.yaml under model.name.

Colab Notes

  • Use a T4 GPU runtime (free tier).
  • Clone the repo and pip install -r requirements.txt.
  • Run scripts/train.py — the defaults are tuned for low VRAM.
  • Use --share with the UI to get a public URL.
  • If VRAM is tight, reduce training.max_seq_length or training.per_device_train_batch_size in the config.

Chat Template Compatibility

The dataset formatter uses each model's native apply_chat_template() method, ensuring training data is correctly formatted for Llama-3-Instruct, Mistral-Instruct, and similar chat models. A plain-text fallback is included for tokenizers without a built-in chat template.
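A sketch of what that means in practice, assuming the standard transformers tokenizer API (the real logic lives in src/dataset/formatter.py):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "Hey can you please help me write ..."},
    {"role": "assistant", "content": "Write a Python function that ..."},
]

if tokenizer.chat_template is not None:
    # Render with the model's native chat template.
    text = tokenizer.apply_chat_template(messages, tokenize=False)
else:
    # Plain-text fallback for tokenizers without a built-in template.
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
```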

Output Token Limits

Generation length is capped via generation.max_new_tokens (default 512); the default leaves enough room that compressed prompts are not cut off mid-output. Adjust it in the config if your use case requires longer compressed prompts.
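A sketch of where the cap applies, assuming a standard transformers generate() call (the project's actual call sits in src/inference/engine.py and reads its values from the generation section; sampling values below are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hey can you please help me ...", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # generation.max_new_tokens (default)
    temperature=0.7,      # illustrative; actual values come from the config
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```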

