Description
1 Image Captioning with Transformers (70 points)
We will implement the different pieces of a Transformer decoder (Transformers) and train it for image
captioning on a subset of the COCO dataset.
• Setup: Run the following command in the transformer_captioning/datasets folder to extract the COCO
data: ./get_coco_captioning.sh
• Question: Follow the instructions in the README.md file in the transformer_captioning
folder to complete the implementation of the transformer decoder.
• Deliverables: After implementing all parts, use run.py for training the full model. The code will log
plots to plots. Extract plots and paste them into the appropriate section below.
• Expected results: These are expected training losses after 100 epochs. Do not change the seed in
run.py.
– 2-heads, 2-layers, lr 1e-4: Final loss ≤ 1
– 4-heads, 6-layers, lr 1e-4: Final loss ≤ 0.3
– 4-heads, 6-layers, lr 1e-3: Final loss ≤ 0.05
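As a reminder of the core operation inside a transformer decoder, the following is a minimal single-head sketch of masked (causal) scaled dot-product self-attention. It is written in NumPy for illustration only and is not the assignment's starter code: the function name causal_self_attention, the weight matrices w_q, w_k, w_v, and the toy dimensions are all assumptions, and the README.md remains the reference for the required interface.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a (seq_len, d_model) input.

    Hypothetical helper for illustration; names and shapes are assumptions.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Scaled dot-product scores, shape (seq_len, seq_len)
    scores = q @ k.T / np.sqrt(d_k)

    # Causal mask: position i may only attend to positions <= i
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model = 5, 8  # toy sizes, chosen only for the example
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = causal_self_attention(x, w_q, w_k, w_v)
```

Because of the mask, each row of attn is a probability distribution over the current and earlier positions only, so the first output row depends on the first token alone. A multi-head version would split d_model across heads and apply the same computation per head.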
1. Paste training loss plots for each of the three hyperparameter configurations.
2-heads-2-layers-lr-1e-4: TODO: fill in final train loss here.
4-heads-6-layers-lr-1e-4: TODO: fill in final train loss here.
4-heads-6-layers-lr-1e-3: TODO: fill in final train loss here.
Image
(a) 2-heads-2-layers-lr-1e-4
Image
(b) 4-heads-6-layers-lr-1e-4
Image
(c) 4-heads-6-layers-lr-1e-3
2. Paste any three generated captioning samples from the training set. The provided code creates these
plots at the end of training.
Image
(a) Sample1
Image
(b) Sample2
Image
(c) Sample3
Homework 3: Transformers 16824
2 Classification with Vision Transformers (30 points)
We will use the transformer you implemented in the previous part to implement a Vision Transformer (ViT),
for classification on CIFAR10.
• Question: Follow the instructions in the README.md file in the vit_classification folder.
You are encouraged to reuse code from the previous question.
• Deliverables: Train the full model using run.py. The code will log the plots
acc_out.png (train and test accuracy) and loss_out.png (train loss).
• Expected Results: After 100 epochs, test accuracy should be ≈ 68%, train accuracy should be
≈ 100%, and training loss ≈ 0.25.
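A ViT feeds the transformer a sequence of flattened image patches rather than word tokens. The sketch below shows one way to turn a batch of CIFAR10-sized images (3 channels, 32×32 pixels) into such a sequence. It uses NumPy for illustration only; the function name patchify and the patch size of 4 are assumptions, not values taken from the assignment.

```python
import numpy as np


def patchify(images, patch_size):
    """Split (N, C, H, W) images into flattened patches.

    Returns an array of shape (N, num_patches, C * patch_size * patch_size).
    Hypothetical helper for illustration; not part of the starter code.
    """
    n, c, h, w = images.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size  # patch grid height and width

    # Separate each spatial axis into (grid index, offset within patch)
    x = images.reshape(n, c, gh, patch_size, gw, patch_size)
    # Bring the grid indices forward: (N, gh, gw, C, P, P)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Flatten the grid into a sequence and each patch into a vector
    return x.reshape(n, gh * gw, c * patch_size * patch_size)


# Two dummy images with CIFAR10 dimensions; patch size 4 is an assumption
images = np.arange(2 * 3 * 32 * 32, dtype=np.float32).reshape(2, 3, 32, 32)
patches = patchify(images, patch_size=4)
```

With these settings each image becomes a sequence of 64 patch vectors of length 48. In a typical ViT, each patch vector is then linearly projected to the model dimension, a class token is prepended, and position embeddings are added before the transformer layers.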
Image
(a) Train/test accuracy
Image
(b) Training loss
Collaboration Survey
Please answer the following:
1. Did you receive any help whatsoever from anyone in solving this assignment?
⃝ Yes
⃝ No
• If you answered ‘Yes’, give full details:
• (e.g. “Jane Doe explained to me what is asked in Question 3.4”)
2. Did you give any help whatsoever to anyone in solving this assignment?
⃝ Yes
⃝ No
• If you answered ‘Yes’, give full details:
• (e.g. “I pointed Joe Smith to section 2.3 since he didn’t know how to proceed with Question 2”)
3. Note that copying code or writeup even from a collaborator or anywhere on the internet violates the
Academic Integrity Code of Conduct.