This repository contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in [this repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from mmf:
* Gan, Z., Chen, Y.-C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))

```
@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}
```
## Installation
Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).
## Training
UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention). These differ from the features extracted in MMF and used by default in our datasets. Support for BUTD feature extraction through PyTorch in MMF is in the works. This means that the UNITER and VILLA checkpoints, which are pretrained on BUTD features, do not work out of the box on image region features in MMF. You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance; this is what is done by default. Alternatively, you can download BUTD features for the dataset you're working with and change the dataset in MMF to use them.
To train a fresh UNITER model on the VQA2.0 dataset, run the following command
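The command itself is missing from this copy of the README. As a hedged sketch of the general shape of an MMF training invocation, assuming MMF's standard `mmf_run` CLI (the config path below is hypothetical and should be checked against the configs shipped in your MMF checkout):

```shell
# Hypothetical config path -- verify against the uniter project configs in MMF
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml \
    model=uniter \
    dataset=vqa2 \
    run_type=train_val
```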
The finetuning configs for VQA2 are from the UNITER base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider using the ViLT VQA2 training configs; however, this may yield slightly lower performance.
To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset, run the following command.

To pretrain UNITER on the masked COCO dataset, run the following command.
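The concrete commands have been lost from this copy of the README. As a hedged sketch, assuming MMF's standard `mmf_run` CLI with hypothetical config paths under `projects/villa` and `projects/uniter` (verify both against your MMF checkout):

```shell
# Finetune a pretrained VILLA checkpoint on VQA2.0 (hypothetical config path)
mmf_run config=projects/villa/configs/vqa2/defaults.yaml \
    model=villa \
    dataset=vqa2 \
    run_type=train_val

# Pretrain UNITER on the masked COCO dataset (hypothetical config path)
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml \
    model=uniter \
    dataset=masked_coco \
    run_type=train
```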
Depending on the config used and the `do_pretraining` flag defined in it, the model either follows the pretraining recipe described in the UNITER paper or is finetuned on downstream tasks.
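As a sketch of how the flag might be toggled, assuming a YAML model config (the key name is taken from the text above, but the surrounding structure is an assumption and should be checked against the actual config files):

```yaml
model_config:
  uniter:
    do_pretraining: true   # use the UNITER pretraining recipe; set to false for downstream finetuning
```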