COMS W4721: Machine Learning for Data Science Homework 5 solution

$24.99

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (3 votes)

Problem 1 (Markov chains) – 30 points
In this problem, you will rank 760 college football teams based on the scores of every game in the 2016
season. The data provided in CFB2016 scores.csv contains the result of one game on each line in
the format
Team A index, Team A points, Team B index, Team B points.
If Team A has more points than Team B, then Team A wins, and vice versa. The index of a team refers
to the row of “TeamNames.txt” where that team’s name can be found.
Construct a 760×760 random walk matrix M on the college football teams. First construct the unnormalized matrix Mc with values initialized to zeros. For one particular game, let i be the index of Team A
and j the index of Team B. Then update
Mcii ← Mcii + 1{Team A wins} +
pointsi
pointsi+pointsj
,
Mcjj ← Mcjj + 1{Team B wins} +
pointsj
pointsi+pointsj
,
Mcij ← Mcij + 1{Team B wins} +
pointsj
pointsi+pointsj
,
Mcji ← Mcji + 1{Team A wins} +
pointsi
pointsi+pointsj
.
After processing all games, let M be the matrix formed by normalizing the rows of Mc so they sum to
one. Let wt be the 1×760 state vector at step t. Set w0 to the uniform distribution. Therefore, wt
is
the marginal distribution on each state after t steps given that the starting state is chosen uniformly at
random.
a) Use wt
to rank the teams by sorting in decreasing value according to this vector. List the top 25
teams and their corresponding values in wt for t = 10, 100, 1000, 10000.
b) We saw that w∞ is related to the first eigenvector of MT
. That is, we can find w∞ by getting the
first eigenvector and eigenvalue of MT
and post-processing:
MT u1 = λ1u1, w∞ = u
T
1 /
hP
j
u1(j)
i
This is because u
T
1 u1 = 1 by convention. Also, we observe that λ1 = 1 for this specific matrix.
Plot kwt − w∞k1 as a function of t for t = 1, . . . , 10000.
Problem 2 (Nonnegative matrix factorization) – 30 points
In this problem you will factorize an N × M matrix X into a rank-K approximation W H, where W is
N × K, H is K × M and all values in the matrices are nonnegative. Each value in W and H can be
initialized randomly to a positive number, e.g., from a Uniform(1,2) distribution.
The data to be used for this problem consists of 8447 documents from The New York Times. (See below
for how to process the data.) The vocabulary size is 3012 words. You will need to use this data to
construct the matrix X, where Xij is the number of times word i appears in document j. Therefore, X
is 3012×8447 and most values in X will equal zero.
a) Implement and run the NMF algorithm on this data using the divergence penalty. Set the rank
to 25 and run for 100 iterations. This corresponds to learning 25 topics. Plot the objective as a
function of iteration.
b) After running the algorithm, normalize the columns of W so they sum to one. For each column of
W, list the 10 words having the largest weight and show the weight. The ith row of W corresponds
to the ith word in the “dictionary” provided with the data. Organize these lists in a 5 × 5 table.
Comments about Problem 2:
• When dividing, you may get 0/0 = NaN. This will cause the algorithm to return all NaN values.
You can add a very small number (e.g., 10−16) to the denominator to avoid this.
• Each row in nyt data.txt corresponds to a single document. It gives the index of words
appearing in that document and the number of times they appear. It uses the format “idx:cnt” with
commas separating each unique word in the document. Any index that doesn’t appear in a row has
a count of zero for that word. The vocabulary word corresponding to each index is given in the
corresponding row of nyt vocab.dat.