Fine-tune a local LLM to transform sloppy, verbose user prompts into concise, token-compressed, structured prompts that produce better outputs across any domain.
```
prompt-optimizer/
├── configs/
│   └── default.yaml            # Central configuration
├── src/
│   ├── config.py               # Dataclass-based config + YAML loader
│   ├── utils/                  # Logging setup
│   ├── dataset/
│   │   ├── seeds.py            # High-quality seed examples
│   │   ├── generator.py        # Synthetic dataset pipeline
│   │   └── formatter.py        # Chat-template formatting
│   ├── training/
│   │   └── train.py            # LoRA / QLoRA fine-tuning
│   ├── inference/
│   │   └── engine.py           # Model loading + generation
│   └── evaluation/
│       └── metrics.py          # Token counting + quality heuristics
├── app/
│   └── ui.py                   # Gradio UI
├── scripts/
│   ├── generate_dataset.py     # Dataset generation entrypoint
│   ├── train.py                # Training entrypoint
│   ├── infer.py                # CLI inference entrypoint
│   ├── evaluate.py             # Evaluation entrypoint
│   └── launch_ui.py            # UI launch entrypoint
├── data/                       # Generated datasets
├── outputs/                    # Checkpoints, adapters
├── requirements.txt
├── .gitignore
└── README.md
```
Install the dependencies:

```bash
pip install -r requirements.txt
```

Colab: the same command works in a Colab cell; just add a `!` prefix: `!pip install -r requirements.txt`.
Generate the training data:

```bash
python scripts/generate_dataset.py
```

This creates `data/processed/train.jsonl` and `data/processed/val.jsonl` using built-in seed examples plus synthetic augmentations.
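For a quick sanity check of the generated split, you can read back the first record. This is a minimal sketch assuming the default output path; the field names shown in the comment are illustrative guesses, since the real schema is defined by `src/dataset/generator.py`.

```python
# Inspect the first generated training record (sketch; the example field
# names in the comment are assumptions -- see src/dataset/generator.py for
# the actual schema).
import json

with open("data/processed/train.jsonl", encoding="utf-8") as f:
    record = json.loads(f.readline())

print(json.dumps(record, indent=2))
# Expected shape (illustrative): a verbose prompt paired with its rewrite,
# e.g. {"input": "Hey, can you maybe ...", "output": "Task: ..."}
```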
To include your own data:
```bash
python scripts/generate_dataset.py --external data/raw/my_data.jsonl
```

Start fine-tuning:

```bash
python scripts/train.py
```

Training uses QLoRA with 4-bit quantisation by default. The LoRA adapter is saved to `outputs/adapter/`.
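Under the hood this is a QLoRA run: the base model is loaded in 4-bit and only a LoRA adapter is trained on top of it. The sketch below shows that setup with the `transformers` and `peft` APIs; the hyperparameter values are placeholders, and the real ones come from `configs/default.yaml`.

```python
# QLoRA setup sketch (values are placeholders; the project reads the real
# ones from configs/default.yaml).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit quantised base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the adapter weights train
model.print_trainable_parameters()
```

Because only the adapter parameters are updated, the artefact written to `outputs/adapter/` stays small and is loaded back on top of the base model at inference time.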
To resume from a checkpoint:
```bash
python scripts/train.py --resume outputs/checkpoints/checkpoint-100
```

Run inference on a single prompt:

```bash
python scripts/infer.py "Hey can you please help me write a Python function that ..."
```

Interactive mode:

```bash
python scripts/infer.py --interactive
```

Test samples (no prompt needed):

```bash
python scripts/infer.py
```

Run the evaluation:

```bash
python scripts/evaluate.py
```

This prints a table with token counts, compression ratios, and quality heuristics.
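The real heuristics live in `src/evaluation/metrics.py`. As an illustration of the two headline numbers, token counts and the compression ratio can be computed with the model's tokenizer; the function below is a sketch with made-up names, and the project may define the ratio the other way round.

```python
# Sketch of token counting and compression ratio (illustrative names only;
# the project's actual logic is in src/evaluation/metrics.py).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

def compression_ratio(raw_prompt: str, optimized_prompt: str) -> float:
    raw_tokens = len(tokenizer.encode(raw_prompt))
    opt_tokens = len(tokenizer.encode(optimized_prompt))
    return opt_tokens / raw_tokens   # < 1.0 means the optimized prompt is shorter

print(compression_ratio(
    "Hey, could you maybe help me write some Python code that sorts a list?",
    "Write a Python function that sorts a list of integers in ascending order.",
))
```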
Launch the web UI:

```bash
python scripts/launch_ui.py
```

This opens a Gradio app at http://localhost:7860. To create a public link (useful on Colab):

```bash
python scripts/launch_ui.py --share
```
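`app/ui.py` is a Gradio front end over the inference engine. A minimal sketch of that wiring is shown below; the `optimize` function here is a stand-in that does not call the model, and the real UI lives in `app/ui.py`.

```python
# Minimal Gradio wiring sketch (the real UI is app/ui.py; optimize() below
# is a stand-in that would normally call the fine-tuned model).
import gradio as gr

def optimize(raw_prompt: str) -> str:
    # In the real app this calls the inference engine in src/inference/.
    return raw_prompt.strip()

demo = gr.Interface(
    fn=optimize,
    inputs=gr.Textbox(lines=8, label="Verbose prompt"),
    outputs=gr.Textbox(lines=8, label="Optimized prompt"),
    title="Prompt Optimizer",
)
demo.launch(server_port=7860, share=False)  # share=True mirrors the --share flag
```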
All settings live in `configs/default.yaml`. Key sections:
| Section | Controls |
|---|---|
| `model` | Base model name (swap between Mistral / Llama) |
| `quantization` | 4-bit BitsAndBytes settings |
| `lora` | Rank, alpha, dropout, target modules |
| `training` | Epochs, batch size, LR, scheduler, etc. |
| `generation` | `max_new_tokens`, temperature, top_p, etc. |
| `dataset` | Paths, val split ratio, sample limits |
| `evaluation` | Token threshold, min compression ratio |
| `ui` | Server host, port, share flag |
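`src/config.py` is described above as a dataclass-based config with a YAML loader. A minimal sketch of that pattern follows; the field names are illustrative and only two of the sections are spelled out.

```python
# Dataclass + YAML loader sketch in the spirit of src/config.py
# (field names are illustrative, not the project's actual ones).
from dataclasses import dataclass
import yaml

@dataclass
class ModelConfig:
    name: str = "mistralai/Mistral-7B-Instruct-v0.2"

@dataclass
class TrainingConfig:
    num_epochs: int = 3
    per_device_train_batch_size: int = 2
    learning_rate: float = 2e-4

def load_config(path: str = "configs/default.yaml") -> dict:
    with open(path, encoding="utf-8") as f:
        raw = yaml.safe_load(f) or {}
    return {
        "model": ModelConfig(**raw.get("model", {})),
        "training": TrainingConfig(**raw.get("training", {})),
    }
```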
Base model options:

| Model | Min VRAM (4-bit) | Notes |
|---|---|---|
| `mistralai/Mistral-7B-Instruct-v0.2` | ~6 GB | Good default, fast |
| `meta-llama/Meta-Llama-3-8B-Instruct` | ~7 GB | Strong instruction following |
| `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | ~2 GB | Ultra-light for testing |
Change the model in `configs/default.yaml` under `model.name`.
Running on Google Colab:

- Use a T4 GPU runtime (free tier).
- Clone the repo and `pip install -r requirements.txt`.
- Run `scripts/train.py`; the defaults are tuned for low VRAM.
- Use `--share` with the UI to get a public URL.
- If VRAM is tight, reduce `training.max_seq_length` or `training.per_device_train_batch_size` in the config.
The dataset formatter uses each model's native `apply_chat_template()` method, ensuring training data is correctly formatted for Llama-3-Instruct, Mistral-Instruct, and similar chat models. A plain-text fallback is included for tokenizers without a built-in chat template.
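For reference, this is roughly what that step looks like with the `transformers` API; the message roles and the fallback layout below are assumptions about `src/dataset/formatter.py`, not a copy of it.

```python
# Chat-template formatting with a plain-text fallback, in the spirit of
# src/dataset/formatter.py (role/content layout here is an assumption).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

def format_example(raw_prompt: str, optimized_prompt: str) -> str:
    messages = [
        {"role": "user", "content": raw_prompt},
        {"role": "assistant", "content": optimized_prompt},
    ]
    if tokenizer.chat_template is not None:
        # Native template: emits the correct special tokens for chat models.
        return tokenizer.apply_chat_template(messages, tokenize=False)
    # Plain-text fallback for tokenizers without a built-in chat template.
    return f"### Input:\n{raw_prompt}\n\n### Output:\n{optimized_prompt}"
```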
Generation is capped via `generation.max_new_tokens` (default 512), which is high enough that optimized prompts are not cut off mid-output. Adjust it in the config if your use case requires longer compressed prompts.
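Putting the pieces together, inference is the base model plus the saved adapter and a capped `generate()` call. The sketch below assumes the default adapter path and placeholder sampling values; the real engine lives in `src/inference/engine.py` and would normally also run the prompt through the chat template first.

```python
# Load the base model, attach the trained adapter, and generate with a
# capped max_new_tokens (paths and sampling values are placeholders; the
# project reads them from configs/default.yaml).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "outputs/adapter")  # attach LoRA weights

prompt = "Hey can you please help me write a Python function that ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,   # the generation.max_new_tokens cap discussed above
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```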