Commit e668551

[docs] Update UNITER project doc
Update doc to include the VILLA citation and an explanation of the feature discrepancy. ghstack-source-id: f696a85 Pull Request resolved: #1176
1 parent 3443b70 commit e668551

File tree

1 file changed: +22 −1 lines


website/docs/projects/uniter.md

@@ -18,12 +18,33 @@ Computer Vision, 2020b. ([arXiv](https://arxiv.org/pdf/1909.11740))
}
```

This repository also contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in this [repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from MMF:

* Gan, Z., Chen, Y. C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))

```
@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}
```

## Installation

Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).

## Training

UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention).
These are different features from those extracted in MMF and used by default in our datasets.
Support for BUTD feature extraction through PyTorch in MMF is in the works.
This means that the UNITER and VILLA checkpoints, which are pretrained on BUTD features,
do not work out of the box on the image region features in MMF.
You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance;
this is what is done by default.
Alternatively, you can download BUTD features for the dataset you're working with and change the dataset in MMF to use them.

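If you take the BUTD-features route, one way to wire them in is via command-line config overrides rather than editing the dataset config file. This is a sketch only: the override keys (`dataset_config.vqa2.features.*`) and the feature paths are assumptions about the dataset's config schema, not verified against MMF — check your dataset's YAML for the real key names.

```shell
# Hypothetical override keys and paths -- verify against the vqa2 dataset config.
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml \
    run_type=train_val dataset=vqa2 model=uniter \
    dataset_config.vqa2.features.train=/path/to/butd/features/train \
    dataset_config.vqa2.features.val=/path/to/butd/features/val
```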
To train a fresh UNITER model on the VQA2.0 dataset, run the following command:
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter
@@ -33,6 +54,7 @@ To finetune a pretrained UNITER model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
```
The finetuning configs for VQA2 are from the UNITER base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider using the ViLT VQA2 training configs; however, this may yield slightly lower performance.
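If memory is the constraint, a smaller batch can also be set directly as an override on top of the default config. This is a hedged sketch: `training.batch_size` is assumed to be the trainer's batch-size key, and the value 32 is illustrative — check the config for the actual key and the reference value it replaces.

```shell
# Assumed override key (training.batch_size); a smaller batch than the
# 4-GPU reference configs may yield slightly lower performance, as noted above.
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml \
    run_type=train_val dataset=vqa2 model=uniter \
    checkpoint.resume_zoo=uniter.pretrained \
    training.batch_size=32
```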

To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset,
```
@@ -44,5 +66,4 @@ To pretrain UNITER on the masked COCO dataset, run the following command
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter
```

Based on the config used and the `do_pretraining` flag defined in it, the model can either use the pretraining recipe described in the UNITER paper or be finetuned on downstream tasks.
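As a sketch of how that switch might be flipped from the command line rather than by editing the YAML: the full key path `model_config.uniter.do_pretraining` is an assumption about where the flag lives in the config tree, not verified — check the masked_coco config for the exact path.

```shell
# Assumed key path for the do_pretraining flag mentioned above;
# True selects the UNITER pretraining recipe, False the finetuning path.
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml \
    run_type=train_val dataset=masked_coco model=uniter \
    model_config.uniter.do_pretraining=True
```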
