Description
1 RNN’s (Recursive Neural Network)
Welcome to SAIL (Stanford Articial Intelligence Lab): Congrats! You have just been given a
Research Assistantship in SAIL. Your task is to discover the power of Recursive Neural Networks (RNNs).
So you plan an experiment to show their eectiveness on Positive/Negative Sentiment Analysis. In this part,
you will derive the forward and backpropogation equations, implement them, and test the results.
Our RNN has one ReLU layer and one softmax layer, and uses Cross Entropy loss as its cost function.
We follow the parse tree given from the leaf nodes up to the top of the tree and evaluate the cost at each
node. During backprop, we follow the exact opposite path. Figure 1 shows an example of such a RNN
applied to a simple sentence “I love this assignment”. These equations are sucient to explain our model:
CE(y; ^y) = ?
X
i
yi log( ^ yi)
where y is the label represented as a one-hot row vector, and ^y is a row vector containing the predicted
probabilities for all classes. In our case, y 2 R15 and ^y 2 R15 to represent our 5 sentiment classes: Really
Negative, Negative, Neutral, Positive, and Really Positive. Furthermore,
h(1) = max(
h
h(1)
Left; h(1)
Right
i
W(1) + b(1); 0)
^y = softmax(h(1)U + b(s))
where h(1)
Left is the vector representation of the left subtree (possibly be a word vector), the h(1)
Right of the
right subtree. For clarity, L 2 RjV jd, W(1) 2 R2dd, b(1) 2 R1d, U 2 Rd5, b(s) 2 R15.
(a) (20 points) Follow the example parse tree in Figure 1 in which we are given a parse tree and truth labels
y for each node. Starting with Node 1, then to Node 2, nishing with Node 3, write the update rules for
W(1),b(1), U, b(s), and L after the evaluation of ^y against our truth, y. This means for at each node, we
evaluate:
1
CS 224d: Assignment #3
Figure 1: RNN (Recursive Neural Network) example
3 = ^y ? y
as our rst error vector and we backpropogate that error through the network, aggregating gradient at
each node for:
@J
@U
@J
@b(s)
@J
@W(1)
@J
@b(1)
@J
@Li
Points will be deducted if you do not express the derivative of activation functions (ReLU) in terms of
their function values (as with Assignment 1 and 2) or do not express the gradients by using an \error
vector” (i) propagated back to each layer. Tip on notation: below and above should be used for error
that is being sent down to the next node, or came from an above node. This will help you think about
the problem in the right way. Note you should not be updating gradients for Li in Node 1. But error
should be leaving Node 1 for sure!
(b) (80 points) Implementation time! We have simplied the problem to reduce training time by binarizing
the sentiment labels. This means that all the sentences are either positive or negative. The internal
nodes however, can be positive negative or neutral. While training, the cost function includes predictions
over all nodes that have a sentiment associated with them and ignores the neutral nodes. While testing,
we are only interested in the performance at the full sentence level. This is all provided for you in the
starter code.
Page 2 of 3
CS 224d: Assignment #3
(a) Download, unzip, and have a look at the code base.
(b) From the command line, run ./setup.sh to download the labeled parse tree dataset.
(c) Begin implementing the model in rnn.py by lling out the outline.
(d) Run the model and show us the resulting loss plot, nal training accuracy, nal validation
accuracy. You should be able to achieve an accuracy of 75%.
(e) (10 points extra credit) Tune the parameters of the model to improve performance and report
your new loss plot, nal training accuracy, nal validation accuracy. Explain why the changes you
made helped achieve the improvement.
(f) Ensure that your code is using the best model cong and the trained weights are present in the
weights folder. Run ./prepare submission.sh and submit the resulting zip le.
Page 3 of 3