This repository was archived by the owner on Sep 10, 2025. It is now read-only.
Experimental machine translation example #864

Open

akurniawan wants to merge 35 commits into `pytorch:main` from `akurniawan:translation-example`
Changes from 31 commits

Commits (35, all by akurniawan):
- `3ab5c1e` Merge pull request #1 from pytorch/master
- `db1557f` Merge branch 'master' of https://github.com/pytorch/text
- `d2bac2b` Merge branch 'master' of https://github.com/pytorch/text
- `0a39944` Merge branch 'master' of https://github.com/pytorch/text
- `9259228` Merge branch 'master' of https://github.com/pytorch/text
- `4791ccf` first commit for machine translation example
- `9dbb558` adding word version for target
- `8ba7975` add word vocab to the training dataset
- `780d837` wrapping up training and evaluation code
- `6af29b1` add README
- `55523e2` add bleu score
- `24b4534` add device to inputs
- `d35b12f` run full training
- `5fa01e5` add tqdm for training and evaluation bar visualization
- `90e3200` add seed to ensure reproducibility
- `1d5a0ce` add remove extra whitespace preprocessing
- `1e6cb1f` add param on testing data
- `ea94065` fix printing format in test
- `e2a2386` add result for the machine translation example
- `7e89051` add argparse and move collate fn
- `17da84b` rename train.py to train_char to differentiate between character leve…
- `fd81ecb` add train_word for word level training in machine translation
- `78ef553` add more complete todo message
- `66428da` add case to handle whitespaces
- `88d6332` fix wrong calculation by removing first index
- `1e14007` fix wrong learning rate
- `b72f530` add saving functionality
- `b4f9851` Merge branch 'master' of https://github.com/pytorch/text into transla…
- `55eec60` change wrong index in testing data
- `27dbf43` add complete experiments output for both char and word version
- `2724231` Merge branch 'master' of https://github.com/pytorch/text into transla…
- `0aa87f4` add more explanations on char vs word example
- `ae52c0d` Merge branch 'master' of https://github.com/pytorch/text into transla…
- `5eba34e` char_transform with partial and map
- `0886a68` remove unused imports
# Machine translation example

This example creates a machine translation dataset and trains a translation model. It uses the raw training files from the Multi30k dataset to train the model with the character composition method.

To try the example, run:

```bash
python train_char.py
```

for character-level training, and
> **Contributor:** What's the difference between "character" level vs "word" level training? Better to be more clear with more doc here.
>
> **Author:** added more explanation
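To make the distinction the reviewer asks about concrete, here is a minimal, hypothetical illustration (the sentence and variable names are mine, not taken from the PR): word-level training feeds the model one token per word, while character-level training feeds one token per character, trading a much smaller vocabulary for much longer sequences.

```python
# Hypothetical example sentence; not taken from the Multi30k data.
sentence = "two young people"

word_tokens = sentence.split()  # word level: one token per word
char_tokens = list(sentence)    # character level: one token per character

print(word_tokens)       # ['two', 'young', 'people']
print(len(char_tokens))  # 16 tokens for the same sentence
```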
```bash
python train_word.py
```

for word-level training.
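The commit history mentions moving a collate function; any such function for this task has to pad variable-length sentences so they can be batched together. A minimal sketch in plain Python, assuming a pad index of 0 (the actual pad index used by the example's code is not shown here):

```python
PAD_ID = 0  # assumed padding index; the real value would come from the vocab

def pad_batch(seqs, pad_id=PAD_ID):
    """Pad a batch of token-id lists to the length of the longest sequence."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

print(pad_batch([[5, 6], [7, 8, 9]]))  # [[5, 6, 0], [7, 8, 9]]
```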
## Experiment Result

The following is the output example for running `train_char.py`:
```
The model has 5,617,503 trainable parameters
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [01:54<00:00, 1.98it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.37it/s]
Epoch: 01 | Time: 1m 57s
Train Loss: 5.277 | Train PPL: 195.798 | Train BLEU: 0.001
Val. Loss: 4.088 | Val. PPL: 59.598 | Val. BLEU: 0.006
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:12<00:00, 1.72it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.87it/s]
Epoch: 02 | Time: 2m 16s
Train Loss: 3.711 | Train PPL: 40.877 | Train BLEU: 0.022
Val. Loss: 2.964 | Val. PPL: 19.369 | Val. BLEU: 0.048
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:14<00:00, 1.69it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.89it/s]
Epoch: 03 | Time: 2m 18s
Train Loss: 2.901 | Train PPL: 18.189 | Train BLEU: 0.055
Val. Loss: 2.172 | Val. PPL: 8.774 | Val. BLEU: 0.111
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:18<00:00, 1.64it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.00it/s]
Epoch: 04 | Time: 2m 21s
Train Loss: 2.391 | Train PPL: 10.927 | Train BLEU: 0.092
Val. Loss: 1.766 | Val. PPL: 5.849 | Val. BLEU: 0.164
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:19<00:00, 1.63it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.98it/s]
Epoch: 05 | Time: 2m 23s
Train Loss: 2.085 | Train PPL: 8.042 | Train BLEU: 0.118
Val. Loss: 1.503 | Val. PPL: 4.494 | Val. BLEU: 0.196
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:20<00:00, 1.61it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.99it/s]
Epoch: 06 | Time: 2m 24s
Train Loss: 1.856 | Train PPL: 6.398 | Train BLEU: 0.140
Val. Loss: 1.302 | Val. PPL: 3.678 | Val. BLEU: 0.229
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:21<00:00, 1.60it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.02it/s]
Epoch: 07 | Time: 2m 25s
Train Loss: 1.683 | Train PPL: 5.383 | Train BLEU: 0.157
Val. Loss: 1.164 | Val. PPL: 3.202 | Val. BLEU: 0.250
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:23<00:00, 1.59it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.03it/s]
Epoch: 08 | Time: 2m 26s
Train Loss: 1.554 | Train PPL: 4.730 | Train BLEU: 0.168
Val. Loss: 1.075 | Val. PPL: 2.930 | Val. BLEU: 0.263
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:24<00:00, 1.57it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.02it/s]
Epoch: 09 | Time: 2m 28s
Train Loss: 1.455 | Train PPL: 4.283 | Train BLEU: 0.178
Val. Loss: 1.016 | Val. PPL: 2.763 | Val. BLEU: 0.271
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:25<00:00, 1.56it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.03it/s]
Epoch: 10 | Time: 2m 29s
Train Loss: 1.373 | Train PPL: 3.948 | Train BLEU: 0.187
Val. Loss: 0.972 | Val. PPL: 2.644 | Val. BLEU: 0.280
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.95it/s]
| Test Loss: 1.011 | Test PPL: 2.748 | Test BLEU: 0.273
Saving model to char_mt_seq2seq.pt
Save vocab to torchtext_char_mt_vocab.pt
```
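The PPL columns above are simply the exponential of the corresponding cross-entropy loss, which gives a quick sanity check on the log:

```python
import math

def perplexity(cross_entropy_loss):
    # Perplexity is exp(loss) when loss is the mean per-token cross-entropy.
    return math.exp(cross_entropy_loss)

print(round(perplexity(5.277), 1))  # ~195.8, the epoch-1 Train PPL above
```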
And the following is the output of `train_word.py`:
```
The model has 14,601,140 trainable parameters
0%| | 0/227 [00:00<?, ?it/s]/home/akurniawan/text/examples/machine_translation/utils.py:38: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
txt = list(map(torch.tensor, input))
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:14<00:00, 1.69it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.61it/s]
Epoch: 01 | Time: 2m 19s
Train Loss: 3.796 | Train PPL: 44.519 | Train BLEU: 0.139
Val. Loss: 1.480 | Val. PPL: 4.391 | Val. BLEU: 0.315
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:34<00:00, 1.46it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.56it/s]
Epoch: 02 | Time: 2m 40s
Train Loss: 1.068 | Train PPL: 2.909 | Train BLEU: 0.346
Val. Loss: 0.748 | Val. PPL: 2.113 | Val. BLEU: 0.395
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:36<00:00, 1.45it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.60it/s]
Epoch: 03 | Time: 2m 41s
Train Loss: 0.604 | Train PPL: 1.830 | Train BLEU: 0.398
Val. Loss: 0.476 | Val. PPL: 1.610 | Val. BLEU: 0.415
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:45<00:00, 1.37it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.62it/s]
Epoch: 04 | Time: 2m 50s
Train Loss: 0.390 | Train PPL: 1.477 | Train BLEU: 0.413
Val. Loss: 0.348 | Val. PPL: 1.416 | Val. BLEU: 0.423
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:57<00:00, 1.28it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.53it/s]
Epoch: 05 | Time: 3m 2s
Train Loss: 0.275 | Train PPL: 1.316 | Train BLEU: 0.422
Val. Loss: 0.278 | Val. PPL: 1.321 | Val. BLEU: 0.430
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:48<00:00, 1.35it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.44it/s]
Epoch: 06 | Time: 2m 53s
Train Loss: 0.203 | Train PPL: 1.225 | Train BLEU: 0.429
Val. Loss: 0.237 | Val. PPL: 1.267 | Val. BLEU: 0.433
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:58<00:00, 1.27it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.46it/s]
Epoch: 07 | Time: 3m 4s
Train Loss: 0.151 | Train PPL: 1.164 | Train BLEU: 0.434
Val. Loss: 0.213 | Val. PPL: 1.238 | Val. BLEU: 0.434
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:49<00:00, 1.34it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.52it/s]
Epoch: 08 | Time: 2m 54s
Train Loss: 0.114 | Train PPL: 1.120 | Train BLEU: 0.437
Val. Loss: 0.198 | Val. PPL: 1.218 | Val. BLEU: 0.434
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:53<00:00, 1.31it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.52it/s]
Epoch: 09 | Time: 2m 58s
Train Loss: 0.085 | Train PPL: 1.088 | Train BLEU: 0.441
Val. Loss: 0.187 | Val. PPL: 1.205 | Val. BLEU: 0.435
100%|█████████████████████████████████████████████████████████████████████████████████| 227/227 [02:52<00:00, 1.31it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.49it/s]
Epoch: 10 | Time: 2m 58s
Train Loss: 0.062 | Train PPL: 1.064 | Train BLEU: 0.442
Val. Loss: 0.182 | Val. PPL: 1.200 | Val. BLEU: 0.435
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.44it/s]
| Test Loss: 0.198 | Test PPL: 1.219 | Test BLEU: 0.420
Saving model to mt_seq2seq.pt
Save vocab to torchtext_mt_vocab.pt
```
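The UserWarning at the top of this log comes from calling `torch.tensor` on elements that may already be tensors. A hedged sketch of the fix the warning itself suggests (the function name is mine, not from `utils.py`):

```python
import torch

def to_tensor_list(batch):
    # torch.as_tensor converts plain lists of token ids without a copy
    # warning, and clone().detach() is the recommended way to copy an
    # existing tensor, per the UserWarning message in the log above.
    return [x.clone().detach() if torch.is_tensor(x) else torch.as_tensor(x)
            for x in batch]

out = to_tensor_list([[1, 2, 3], torch.tensor([4, 5])])
print([t.tolist() for t in out])  # [[1, 2, 3], [4, 5]]
```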
Review discussion:

> **Reviewer:** Should we include a metric for the test/valid datasets with the trained model? See my comments about `bleu_score` below.
>
> **Author:** That would be better. However, I may not be able to run a full-blown training as my resources are quite limited. Do you have any suggestions?
>
> **Reviewer:** Never mind. I will find time and work on it this half. Then I can update this. Just to make sure that you set up the model/training correctly by checking the learning curve.
>
> **Author:** Got it. Sorry for the trouble 🙏
>
> **Author:** Borrowed a resource to run 10 epochs; the results are already in the README. wdyt?
>
> **Reviewer:** It might be a little bit too long. Should we just include the final test result?
>
> **Author:** Instead of removing the whole training metrics from the docs, I trimmed them so that only the first and the last training outputs are included, to give users some idea of the loss values while running the example.
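The `bleu_score` the reviewer mentions refers to torchtext's corpus BLEU metric. For readers unfamiliar with the metric, here is a self-contained, simplified sketch of what corpus BLEU computes (uniform n-gram weights, a single reference per sentence; this is an illustration, not the torchtext implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with uniform weights and a brevity penalty.

    candidates/references: parallel lists of token lists (one reference each).
    """
    p_nums = [0] * max_n   # clipped n-gram matches, per order
    p_dens = [0] * max_n   # candidate n-gram counts, per order
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            c_counts = Counter(ngrams(cand, n))
            r_counts = Counter(ngrams(ref, n))
            p_nums[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            p_dens[n - 1] += max(len(cand) - n + 1, 0)
    if min(p_nums) == 0:   # any order with zero matches drives BLEU to 0
        return 0.0
    log_p = sum(math.log(num / den) for num, den in zip(p_nums, p_dens)) / max_n
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_p)
```

A perfect match scores 1.0; a candidate with no 4-gram (or higher-order) overlap scores 0.0 under this simplified formulation.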