TRIG: Trade-offs in Image Generation

Trade-offs and Relationships in Image Generation: How Do Different Evaluation Dimensions Interact? (ICCV 2025)

For the new multilingual benchmark, please check the TRIG-Multilingual Folder.

TODO

Release the TRIG dataset and evaluation pipeline.
Release the Finetune pipeline and experiments.
Release the Multilingual Evaluation Benchmark.

Quick Start

TRIG Benchmark

Load from 🤗 Huggingface Link.

Note

Legacy JSON is still supported for local experiments. The JSON files are kept in the Hugging Face dataset under raw/; use the --data_file /path/to/file.json argument in generation scripts when you need to bypass the parquet dataset.

from datasets import load_dataset

ds_t2i = load_dataset("RISys-Lab/TRIG", split="text_to_image")
ds_p2p = load_dataset("RISys-Lab/TRIG", split="image_editing")
ds_s2p = load_dataset("RISys-Lab/TRIG", split="subject_driven")

sample = ds_t2i[0] # keys: (data_id, item, prompt, dimension_prompt, parent dataset, img_id, dimensions, image)
# Generation and Evaluation

TRIG-Multilingual Benchmark

Load from 🤗 Huggingface Link.

from datasets import load_dataset

ds_cg = load_dataset("RISys-Lab/TRIG-Multilingual", split="content_generation")
ds_tr = load_dataset("RISys-Lab/TRIG-Multilingual", split="text_rendering")

sample_cg = ds_cg[0]
sample_tr = ds_tr[0]

print(sample_cg["prompt"])
print(sample_cg["dimension"], sample_cg["lang"])

print(sample_tr["prompt"])
print(sample_tr["render_text"])
print(sample_tr["condition_image"])  # PIL.Image.Image for text placement

Generation currently follows two paths:

content_generation uses the standard TRIG text-to-image generation logic. For the multilingual FLUX adapter, see trig_multilingual/generation/pea.py.
text_rendering uses the scripts in trig_multilingual/generation/. These read render_text, render_layout, and the embedded condition_image from parquet. Legacy JSON input is still available through --data_file, but parquet is the default.

Evaluation also loads the Hugging Face parquet splits by default. Content-generation scoring uses trig/metrics/metaclip2_score.py on the content_generation split, and multilingual text-rendering OCR evaluation uses trig_multilingual/evaluation/trig_ml_ocr.py on the text_rendering split. Legacy JSON files remain available in the dataset raw/ folder for fallback use.

Setup

Installation

conda create -n trig python=3.10 -y
conda activate trig
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

We recommand to use TRIG score by vllm. Please install with

# for Qwen2.5vl, please update your transformers
pip install transformers -U
pip install accelerate
pip install 'vllm>=0.7.2'

Then deploy the selected VLM models, currently the TRIG score support GPT series, Qwen2.5-VL series, and LLaVA-NeXT Series. For more information, please visit vllm document.

# use Qwen2.5-VL-7B
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000 --device cuda --host 0.0.0.0 --dtype bfloat16 --limit-mm-per-prompt image=5,video=5

# or use Qwen2.5-VL-72B with quantize version
vllm serve Qwen/Qwen2-VL-72B-Instruct-AWQ --dtype float16 --port 8000 --gpu-memory-utilization 0.85 --tensor-parallel-size 2 --quantization awq --limit_mm_per_prompt image=4

Getting Started

Auto Evaluation pipeline on TRIG Benchmark

First Please set up a yaml file in config folder to run an experiment, as the format below:

name: "test" # name for this experiment
task: "t2i" # chosen from t2i/p2p/s2p, support one task at a time
# load prompts from the Hugging Face dataset
dataset_name: "RISys-Lab/TRIG"

generation:
    # selected models
    models: ["flux",]
    
evaluation:
    image_dir: ["data/output/demo",]
    result_dir: "data/result"

dimensions:
    IQ-O:
        metrics: ["GPTLogitMetric"]
    TA-R:
        metrics: ["GPTLogitMetric"]
    TA-S:
        metrics: ["GPTLogitMetric", "AnotherMetric"]
    Other Dimensions You Want:
        metrics: ["OtherMetric"]

relation:
  models: ["flux"]
  res: "formatted_flux"
  metric: "spearman_corr"
  plot: true
  heatmap: true
  tsne: true
  tradeoff: true
  quadrant_analysis: true
  thresholds:
    synergy: 0.8 
    bottleneck: 0.5 
    
  insight_thresholds:
    synergy_density: 0.4
    bottleneck_density: 0.4
    dominance_ratio: 0.8
    tradeoff_corr: 0.6

More examples could be found in the config folder.

The evaluator maps task to the corresponding Hugging Face split automatically:

t2i -> text_to_image
p2p -> image_editing
s2p -> subject_driven

For local legacy JSON files, you can still use prompt_path instead of dataset_name.

Run main.py

python main.py --config your_config.yaml

Outputs:

Generated images will be saved to data/output/your_task/your_model/
Evaluation result will be saved to data/output/your_task/your_model.json
Relation result will be saved to data/output/your_model/

Manual Evaluation by metrics toolkit

All the metrics could be used independently. For example:

metric_class = trig.metrics.import_metric("aesthetic_predictor")
metric_instance = metric_class()
# Single Evaluation
score = metric_instance.compute(image_path="/path/to/image", prompt="prompt")
# Batch Evaluation
score = metric_instance.compute_batch_manual(images=["/path/to/image"], prompts=["prompt"])

Finetuning by DTM

Select the dimension and trade-off type you want to optimize. for example, in the paper, we choose Knowledge & Ambiguity, and try to balance these two dimensions.
Follow the TRIG principle, we create a original set which covers the two dim.
We generate images with this set, the ouput images are in flux_ft_train.zip.
Test these images, select good samples with trade-off as expected.
Use these selected image to do LoRA finetune on flux.
Then we got the balanced flux model.

Prompt Engineering by DTM

use model name 'sd35_dtm_dim', 'sana_dtm_dim', 'xflux_dtm_dim' and 'hqedit_dtm_dim' in the yaml config file to generate with Prompt Engineering.

Acknowledgement

Many thanks to the great works in GenAI Models like FLUX, Benchmarks like HEIM, Metric like VQAScore.

Citation

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

@inproceedings{zhang2025trade,
  title={Trade-offs in image generation: How do different dimensions interact?},
  author={Zhang, Sicheng and Xie, Binzhu and Yan, Zhonghao and Zhang, Yuli and Zhou, Donghao and Chen, Xiaofei and Qiu, Shi and Liu, Jiaqi and Xie, Guoyang and Lu, Zhichao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17256--17267},
  year={2025}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TRIG: Trade-offs in Image Generation

TODO

Quick Start

TRIG Benchmark

TRIG-Multilingual Benchmark

Setup

Installation

Getting Started

Auto Evaluation pipeline on TRIG Benchmark

Manual Evaluation by metrics toolkit

Finetuning by DTM

Prompt Engineering by DTM

Acknowledgement

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 217 Commits
config		config
data/relation		data/relation
dataset		dataset
deprecated		deprecated
doc		doc
scripts		scripts
trig		trig
trig_multilingual		trig_multilingual
.gitignore		.gitignore
README.md		README.md
demo.jpg		demo.jpg
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

TRIG: Trade-offs in Image Generation

TODO

Quick Start

TRIG Benchmark

TRIG-Multilingual Benchmark

Setup

Installation

Getting Started

Auto Evaluation pipeline on TRIG Benchmark

Manual Evaluation by metrics toolkit

Finetuning by DTM

Prompt Engineering by DTM

Acknowledgement

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages