Skip to content

wjnwjn59/CLAugViVQA

Repository files navigation

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations

Python Version Conference

Description

This is the official repository for our paper, Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations, accepted at the AAAI-25 Workshop on Document Understanding and Intelligence. This repository contains the codebase for improving Vietnamese Visual Question Answering (VQA) using curriculum learning on both raw and augmented textual data.

Prerequisites

We assume you have Anaconda installed for managing the virtual environment. If not, you can download it from here.

To set up the environment:

  1. Create and activate a new Anaconda environment:

    conda create -n vqa_env python=3.11
    conda activate vqa_env
  2. Install the required packages:

    pip install -r requirements.txt

Data Setup

We use two datasets for training: ViVQA and OpenViVQA. You can download them from:

After downloading, organize your datasets as follows:

datasets/
│── OpenViVQA/
│── ViVQA/

Once the datasets are structured correctly, update the data_dir parameter in config.py to match your local path.

Generating Paraphrases

Before training, paraphrased datasets need to be generated. Run the following command:

python3 generate_new_dataset.py \
    --train_filepath path/to/your/dataset.csv \
    --num_params 20 \
    --random_seed 59 \
    --save_filepath paraphrases.csv \
    --paraphrase_method mt5

Explanation of Arguments:

  • --train_filepath: Path to the input training dataset.
  • --num_params: Number of paraphrases to generate per sample.
  • --random_seed: Seed for reproducibility.
  • --save_filepath: Output filename for storing paraphrases.
  • --paraphrase_method: Paraphrase generation method (mt5 or gpt).

Training

Once the dataset is prepared, start training by running:

bash start_training.sh

Explanation of Key Parameters:

  • --epochs: Number of training epochs.
  • --patience: Number of epochs without improvement before early stopping.
  • --n_text_paras: Number of text paraphrases used for augmentation.
  • --text_para_thresh: Threshold for selecting paraphrases.
  • --is_text_augment: Enable/disable text augmentation.
  • --use_dynamic_thresh: Enable/disable dynamic thresholding.
  • --start_threshold: Initial value for dynamic thresholding.
  • --min_threshold: Minimum value for dynamic thresholding.
  • --is_log_result: Enable logging of training results.

For additional arguments, refer to train.py or run:

python3 train.py --help

Citation

If you find our work or this repository useful, please cite our paper:

@misc{nguyen2025enhancingvietnamesevqacurriculum,
      title={Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations},
      author={Khoi Anh Nguyen and Linh Yen Vu and Thang Dinh Duong and Thuan Nguyen Duong and Huy Thanh Nguyen and Vinh Quang Dinh},
      year={2025},
      eprint={2503.03285},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.03285},
}

About

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations (AAAIW 2025)

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors