diff --git a/website/docs/projects/uniter.md b/website/docs/projects/uniter.md
index 58b3aeeeb..19fed893b 100644
--- a/website/docs/projects/uniter.md
+++ b/website/docs/projects/uniter.md
@@ -18,12 +18,33 @@ Computer Vision, 2020b. ([arXiV](https://arxiv.org/pdf/1909.11740))
 }
 ```
+This repository contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in this [repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from MMF:
+
+* Gan, Z., Chen, Y. C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))
+```
+@inproceedings{gan2020large,
+  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
+  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
+  booktitle={NeurIPS},
+  year={2020}
+}
+```
+
 
 ## Installation
 
 Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).
 
 ## Training
 
+UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention).
+These features differ from those extracted in MMF and used by default in our datasets.
+Support for BUTD feature extraction through PyTorch in MMF is in the works.
+Until then, the UNITER and VILLA checkpoints, which are pretrained on BUTD features,
+do not work out of the box on the image region features in MMF.
+You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance;
+this is what is done by default.
+Alternatively, you can download BUTD features for the dataset you're working with and change the dataset in MMF to use them.
+
 To train a fresh UNITER model on the VQA2.0 dataset, run the following command
 ```
 mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter
@@ -33,6 +54,7 @@ To finetune a pretrained UNITER model on the VQA2.0 dataset,
 ```
 mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
 ```
+The finetuning configs for VQA2 are based on the UNITER-base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider using the ViLT VQA2 training configs; note that this may yield slightly lower performance.
 
 To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset,
 ```
@@ -44,5 +66,12 @@ To pretrain UNITER on the masked COCO dataset, run the following command
 mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter
 ```
 
-
 Based on the config used and `do_pretraining` defined in the config, the model can use the pretraining recipe described in the UNITER paper, or be finetuned on downstream tasks.
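+
+All of these runs can be adjusted further with command-line config overrides. The following is a rough sketch: `training.batch_size` is MMF's standard batch size key, while the `model_config.uniter.do_pretraining` key path is an assumption and may differ across MMF versions.
+```
+# Finetune on VQA2 with a smaller batch size, e.g. if the 4-GPU config does not fit in memory
+mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained training.batch_size=64
+# Toggle the pretraining recipe explicitly; key path assumed, check the uniter config in your MMF version
+mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter model_config.uniter.do_pretraining=True
+```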