## Description

1 RNN’s (Recursive Neural Network)

Welcome to SAIL (Stanford Articial Intelligence Lab): Congrats! You have just been given a

Research Assistantship in SAIL. Your task is to discover the power of Recursive Neural Networks (RNNs).

So you plan an experiment to show their eectiveness on Positive/Negative Sentiment Analysis. In this part,

you will derive the forward and backpropogation equations, implement them, and test the results.

Our RNN has one ReLU layer and one softmax layer, and uses Cross Entropy loss as its cost function.

We follow the parse tree given from the leaf nodes up to the top of the tree and evaluate the cost at each

node. During backprop, we follow the exact opposite path. Figure 1 shows an example of such a RNN

applied to a simple sentence “I love this assignment”. These equations are sucient to explain our model:

CE(y; ^y) = ?

X

i

yi log( ^ yi)

where y is the label represented as a one-hot row vector, and ^y is a row vector containing the predicted

probabilities for all classes. In our case, y 2 R15 and ^y 2 R15 to represent our 5 sentiment classes: Really

Negative, Negative, Neutral, Positive, and Really Positive. Furthermore,

h(1) = max(

h

h(1)

Left; h(1)

Right

i

W(1) + b(1); 0)

^y = softmax(h(1)U + b(s))

where h(1)

Left is the vector representation of the left subtree (possibly be a word vector), the h(1)

Right of the

right subtree. For clarity, L 2 RjV jd, W(1) 2 R2dd, b(1) 2 R1d, U 2 Rd5, b(s) 2 R15.

(a) (20 points) Follow the example parse tree in Figure 1 in which we are given a parse tree and truth labels

y for each node. Starting with Node 1, then to Node 2, nishing with Node 3, write the update rules for

W(1),b(1), U, b(s), and L after the evaluation of ^y against our truth, y. This means for at each node, we

evaluate:

1

CS 224d: Assignment #3

Figure 1: RNN (Recursive Neural Network) example

3 = ^y ? y

as our rst error vector and we backpropogate that error through the network, aggregating gradient at

each node for:

@J

@U

@J

@b(s)

@J

@W(1)

@J

@b(1)

@J

@Li

Points will be deducted if you do not express the derivative of activation functions (ReLU) in terms of

their function values (as with Assignment 1 and 2) or do not express the gradients by using an \error

vector” (i) propagated back to each layer. Tip on notation: below and above should be used for error

that is being sent down to the next node, or came from an above node. This will help you think about

the problem in the right way. Note you should not be updating gradients for Li in Node 1. But error

should be leaving Node 1 for sure!

(b) (80 points) Implementation time! We have simplied the problem to reduce training time by binarizing

the sentiment labels. This means that all the sentences are either positive or negative. The internal

nodes however, can be positive negative or neutral. While training, the cost function includes predictions

over all nodes that have a sentiment associated with them and ignores the neutral nodes. While testing,

we are only interested in the performance at the full sentence level. This is all provided for you in the

starter code.

Page 2 of 3

CS 224d: Assignment #3

(a) Download, unzip, and have a look at the code base.

(b) From the command line, run ./setup.sh to download the labeled parse tree dataset.

(c) Begin implementing the model in rnn.py by lling out the outline.

(d) Run the model and show us the resulting loss plot, nal training accuracy, nal validation

accuracy. You should be able to achieve an accuracy of 75%.

(e) (10 points extra credit) Tune the parameters of the model to improve performance and report

your new loss plot, nal training accuracy, nal validation accuracy. Explain why the changes you

made helped achieve the improvement.

(f) Ensure that your code is using the best model cong and the trained weights are present in the

weights folder. Run ./prepare submission.sh and submit the resulting zip le.

Page 3 of 3