Sparse-LaViDa/scripts/train/readme.MD at main · adobe-research/Sparse-LaViDa

Data Preparation

The training data are configured with a YAML file in the config folder. Each dataset entry has the following key fields:

json_path: The path to the actual data record file. Note that we typically use CSV or Parquet for efficiency; the field name is kept as json_path for compatibility reasons.

sampling_strategy: How the dataset is sampled. By default, we use full, which means all data are used. Other options include dup:2 (duplicate the data 2x) and random:200000 (randomly sample 200k records).

preprocess_fn: This is the most important element in the dataset configuration. It takes the columns of the CSV/Parquet file and converts them to the LLaVA-style conversation used in training. These functions are defined in llava/train/data/process_functions.py.
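As a rough illustration of the sampling_strategy values described above, the sketch below shows one way such strings could be interpreted. The helper name apply_sampling_strategy and the seed parameter are hypothetical, and the repository's actual loader may behave differently. Treating all (the value used in the example entry later in this section) as a synonym for full is also an assumption.

```python
import random


def apply_sampling_strategy(records, strategy, seed=0):
    """Return a resampled copy of `records` according to `strategy`.

    Hypothetical helper for illustration only; not the repository's loader.
    """
    # Assumption: "all" means the same as "full" (use every record once).
    if strategy in ("full", "all"):
        return list(records)

    name, _, arg = strategy.partition(":")
    if name == "dup":
        # "dup:2" repeats the whole dataset twice.
        return list(records) * int(arg)
    if name == "random":
        # "random:200000" draws that many records without replacement,
        # capped at the dataset size.
        k = min(int(arg), len(records))
        return random.Random(seed).sample(list(records), k)
    raise ValueError(f"unknown sampling strategy: {strategy!r}")
```

Sampling without replacement is a design choice in this sketch; a loader could just as well sample with replacement when the requested count exceeds the dataset size.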

An Example

On our infrastructure, we host all images on AWS S3 and use Parquet files to document them. A typical data pipeline for text-to-image generation would be:

Prepare a CSV or Parquet file with the columns: s3_path, caption, fltLaionAesthScore (score used for filtering), intHeight, intWidth
Add the following entry to the YAML file:

  - name: our-dataset
    json_path: /path/to/parquet/data.parquet
    sampling_strategy: all
    preprocess_fn: preproces_text_to_image_generation_s3
    columns: ['s3_path', 'caption', 'fltLaionAesthScore','intHeight','intWidth']
    aes_cutoff: 5.67
    min_size: 512
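The entry above does not spell out how aes_cutoff and min_size are applied. One plausible reading, sketched below, is that a record is kept only if its aesthetic score reaches the cutoff and its shorter image side is at least min_size pixels. The helper keep_row and the sample rows (including the bucket name) are hypothetical placeholders, not taken from the repository.

```python
def keep_row(row, aes_cutoff=5.67, min_size=512):
    """Return True if a record passes the assumed aesthetic and size filters."""
    # Assumption: records scoring below the cutoff are dropped.
    if row["fltLaionAesthScore"] < aes_cutoff:
        return False
    # Assumption: min_size is compared against the shorter image side.
    return min(row["intHeight"], row["intWidth"]) >= min_size


# Placeholder records using the column names from the YAML entry.
rows = [
    {"s3_path": "s3://example-bucket/a.jpg", "caption": "a red bicycle",
     "fltLaionAesthScore": 6.1, "intHeight": 1024, "intWidth": 768},
    {"s3_path": "s3://example-bucket/b.jpg", "caption": "a blurry street",
     "fltLaionAesthScore": 4.9, "intHeight": 1024, "intWidth": 1024},
    {"s3_path": "s3://example-bucket/c.jpg", "caption": "a small icon",
     "fltLaionAesthScore": 6.5, "intHeight": 256, "intWidth": 256},
]

kept = [row for row in rows if keep_row(row)]
```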

During training, preproces_text_to_image_generation_s3 is called to convert the data to the following LLaVA format:

    payload = {
        "id": "000951660",
        "image_gen": img_path,
        "conversations": [
            {
                "from": "human",
                "value": f"Generate an image with the caption: {caption}"
            },
            {
                "from": "gpt",
                "value": "Sure <image_gen>"
            }
        ]
    }
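To make the conversion concrete, here is a minimal sketch of a function that turns one row into the payload shape shown above. It is not the repository's preproces_text_to_image_generation_s3; the name row_to_payload and the sample_id argument are hypothetical, and passing s3_path straight through as image_gen is an assumption (the real function may download or resolve the image first).

```python
def row_to_payload(row, sample_id):
    """Build a LLaVA-style text-to-image record from one data row.

    Hypothetical helper for illustration; field handling is assumed.
    """
    return {
        "id": sample_id,
        # Assumption: the S3 path is used directly as the image reference.
        "image_gen": row["s3_path"],
        "conversations": [
            {
                "from": "human",
                "value": f"Generate an image with the caption: {row['caption']}",
            },
            {
                "from": "gpt",
                "value": "Sure <image_gen>",
            },
        ],
    }


# Placeholder row; the bucket and caption are made up for the example.
example = row_to_payload(
    {"s3_path": "s3://example-bucket/a.jpg", "caption": "a red bicycle"},
    sample_id="000951660",
)
```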

List of datasets used

We provide links to all the datasets that are used. We are unable to provide the exact Parquet files because they include S3 URLs on our private bucket. Please process each dataset following the above guidelines. Feel free to email us if you have any questions. We note that LaViDa-O is trained entirely on public data.

Text-to-Image Generation, Image Editing

Understanding and Grounding

MAmmoTH-VL

GranD

VisualWebInstruct

Training

The following scripts are used to train the model:

scripts/train/s1-gnd.sh
scripts/train/s2-256.sh
scripts/train/s2-1024.sh
scripts/train/s3-unified.sh

Before launching the training scripts, please make sure to:

Add your Hugging Face token as an environment variable
Change the batch size, number of GPUs, number of nodes, max steps, etc. according to your infrastructure
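A small pre-flight check can catch a missing token before a long job starts. The sketch below assumes the token is exposed as HF_TOKEN or the older HUGGING_FACE_HUB_TOKEN, which are the names the huggingface_hub library reads; the training scripts here may expect a different variable, so check them first. The helper name find_hf_token is hypothetical.

```python
import os


def find_hf_token(env=None):
    """Return the name of the first non-empty token variable, or None.

    Assumption: the token lives in HF_TOKEN or HUGGING_FACE_HUB_TOKEN.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    found = find_hf_token()
    if found is None:
        raise SystemExit("No Hugging Face token found in the environment.")
    print(f"Using token from {found}")
```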
