mini seq2seq

Minimal Seq2Seq model with attention for neural machine translation in PyTorch.

This implementation focuses on the following features:

The Multi30k (DE→EN) dataset is loaded via Hugging Face datasets; tokenization uses spaCy.
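The vocabulary and numericalization step that usually follows tokenization can be sketched as below. The sketch assumes <pad>/<unk>/<sos>/<eos> special tokens and a min_freq cutoff; neither is taken from this repository. A whitespace tokenizer (simple_tokenize, hypothetical) stands in for spaCy so the example runs without the language models.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Assumed special tokens; the project may name or order them differently.
SPECIALS = ["<pad>", "<unk>", "<sos>", "<eos>"]


def simple_tokenize(sentence: str) -> List[str]:
    """Stand-in for a spaCy tokenizer: lowercase and split on whitespace."""
    return sentence.lower().split()


def build_vocab(sentences: Iterable[str],
                tokenize: Callable[[str], List[str]],
                min_freq: int = 2) -> Dict[str, int]:
    """Map every token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    kept = sorted(t for t, c in counts.items() if c >= min_freq and t not in SPECIALS)
    return {tok: i for i, tok in enumerate(SPECIALS + kept)}


def numericalize(sentence: str,
                 tokenize: Callable[[str], List[str]],
                 stoi: Dict[str, int]) -> List[int]:
    """Convert a sentence to ids wrapped in <sos>/<eos>; rare tokens become <unk>."""
    unk = stoi["<unk>"]
    ids = [stoi.get(tok, unk) for tok in tokenize(sentence)]
    return [stoi["<sos>"]] + ids + [stoi["<eos>"]]
```

With the spaCy models installed, load one once (nlp = spacy.load("de_core_news_sm")) and pass lambda s: [t.text.lower() for t in nlp.tokenizer(s)] in place of simple_tokenize.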

Model description

Encoder: Bidirectional GRU
Decoder: GRU with Attention Mechanism
Attention: additive attention from Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate"
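The architecture above can be sketched in PyTorch as follows. This is an illustrative sketch, not this repository's code: summing the two encoder directions, initialising the decoder from the forward-direction final state, and feeding the context vector into both the decoder GRU and the output layer are assumptions. Attention scores use the additive form score(s, h) = v^T tanh(W [s; h]).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Bidirectional GRU encoder; forward and backward outputs are summed (assumption)."""

    def __init__(self, vocab_size: int, embed_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, bidirectional=True)

    def forward(self, src):  # src: (src_len, batch)
        outputs, hidden = self.gru(self.embed(src))  # outputs: (src_len, batch, 2*hidden)
        h = self.gru.hidden_size
        return outputs[:, :, :h] + outputs[:, :, h:], hidden  # hidden: (2, batch, hidden)


class AdditiveAttention(nn.Module):
    """Bahdanau-style scoring: v^T tanh(W [s; h]), softmax over source positions."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.attn = nn.Linear(2 * hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (batch, hidden); enc_outputs: (src_len, batch, hidden)
        s = dec_hidden.unsqueeze(0).expand(enc_outputs.size(0), -1, -1)
        energy = torch.tanh(self.attn(torch.cat([s, enc_outputs], dim=2)))
        weights = F.softmax(self.v(energy).squeeze(2), dim=0)     # (src_len, batch)
        context = (weights.unsqueeze(2) * enc_outputs).sum(dim=0)  # (batch, hidden)
        return context, weights


class Decoder(nn.Module):
    """One decoding step of a GRU conditioned on the attention context."""

    def __init__(self, vocab_size: int, embed_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.attention = AdditiveAttention(hidden_size)
        self.gru = nn.GRU(embed_size + hidden_size, hidden_size)
        self.out = nn.Linear(2 * hidden_size, vocab_size)

    def forward(self, token, hidden, enc_outputs):
        # token: (batch,); hidden: (1, batch, hidden)
        emb = self.embed(token).unsqueeze(0)                        # (1, batch, embed)
        context, weights = self.attention(hidden[-1], enc_outputs)
        rnn_in = torch.cat([emb, context.unsqueeze(0)], dim=2)
        output, hidden = self.gru(rnn_in, hidden)
        logits = self.out(torch.cat([output.squeeze(0), context], dim=1))
        return logits, hidden, weights
```

A training loop would call the decoder once per target position, starting from enc_hidden[:1] (the forward direction's final state under this sketch's assumption) and an <sos> token.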

Install the dependencies and the spaCy German and English models:

pip install -r requirements.txt
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm

Train the model:

python train.py -epochs 30 -batch_size 32 -lr 3e-4
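The single-dash flags above are valid argparse option strings. The hypothetical parser below is a sketch of how they could be declared, not this repository's train.py: the flag names come from this README, the defaults for -epochs, -batch_size and -lr mirror the example command, and the -hidden_size / -embed_size defaults are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch; -hidden_size and -embed_size defaults are placeholders.
    p = argparse.ArgumentParser(description="Train a seq2seq model (sketch)")
    p.add_argument("-epochs", type=int, default=30, help="number of training epochs")
    p.add_argument("-batch_size", type=int, default=32, help="sentence pairs per batch")
    p.add_argument("-lr", type=float, default=3e-4, help="learning rate")
    p.add_argument("-hidden_size", type=int, default=512, help="GRU hidden size")
    p.add_argument("-embed_size", type=int, default=256, help="embedding size")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Because argparse matches exact option strings first, -hidden_size does not collide with the built-in -h help flag.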

The device is auto-detected (CUDA → MPS → CPU). For quick CPU smoke runs, pass smaller values to -hidden_size and -embed_size.
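Device selection in the CUDA → MPS → CPU order can be sketched with standard PyTorch availability checks. The function name pick_device is hypothetical; the project's own detection code may be structured differently.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple-silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is absent on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

A model and its batches would then be moved with .to(pick_device()).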

Sanity check (CPU, 500 batches, hidden=128/embed=64):

Final validation loss: 4.93 (for reference, an untrained model predicting uniformly over the target vocabulary scores log(|V|) ≈ 9.19).
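The reference value follows from the fact that a model predicting uniformly over |V| target tokens has a cross-entropy of ln|V| nats per token. A quick check of what the two reported numbers imply, using only the values from the line above (exponentiating a per-token loss gives perplexity):

```python
import math

val_loss = 4.93      # reported final validation loss (nats per token)
uniform_loss = 9.19  # reported log(|V|) for a uniform predictor

vocab_size = math.exp(uniform_loss)  # implied target vocabulary size
val_ppl = math.exp(val_loss)         # perplexity of the sanity-check model
print(f"|V| ≈ {vocab_size:.0f}, validation perplexity ≈ {val_ppl:.0f}")
```

Since both inputs are rounded, the implied vocabulary is roughly 9.8k tokens and the sanity-check model's validation perplexity is about 138.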

Based on the following implementations