## Description

Q1 (25 points): Let X and Y be two random variables. The values for P(X,Y) are shown in Table

1, H(X) is the entropy of X, and MI(X,Y) is the mutual information of X and Y. Please write down

the formulas and the results for the following.

(a) 1 pt: P(X)

(b) 1 pt: P(Y )

(c) 2 pt: P(X | Y )

(d) 2 pt: P(Y | X)

(e) 2 pts: Are X and Y independent? Why or why not?

(f) 2 pts: H(X)

(g) 2 pts: H(Y )

(h) 2 pts: H(X, Y )

(i) 2 pts: H(X | Y )

(j) 2 pts: H(Y | X)

(k) 2 pts: MI(X, Y )

(l) 5 pts: The value for Q(X, Y ) are shown in Table 2. What is the value for KL(P(X, Y ) ||

Q(X, Y ))? What is the value for KL(Q(X, Y ) || P(X, Y ))? Are they the same?

Table 1: The joint probability P(X,Y)

X=1 X=2 X=3

Y=a 0.10 0.20 0.30

Y=b 0.05 0.15 0.20

Table 2: The joint probability Q(X,Y)

X=1 X=2 X=3

Y=a 0.10 0.20 0.40

Y=b 0.01 0.09 0.20

Q2 (10 points): Let X be a random variable for the result of tossing a coin. P(X = h) = p; that is,

p is the possibility of getting a head, and 1 − p is the possibility of getting a tail.

(a) 1 pt: H(X) is the entropy of X. Write down the formula for H(X).

(b) 2 pts: Let p

∗ = arg maxp H(X); that is, p

∗

is the p that results in the maximal value of H(X).

What is p

∗

?

(c) 7 pts: Prove that the answer you give in (b) is correct. Hint: recall how you calculate the optimal

solution for a function f(x) in your calculus class. In this case, H(X) is a function of p.

Q3 (25 points): Permutations and combinations:

(a) 6 pts: The class has n students, and n is an even number. The students are forming teams to

work on their homework. Each team has exactly 2 students and each student has to appear in

exactly one team. How many distinct ways are there to form the teams for the class? Write

down the formula. Hint: when n=4, there are 3 ways. For instance, if student #1 and #2 are

in the same team, students #3 and #4 would have to be in the same team too.

(b) 5 pts: There are 10 balls: 5 are red, 3 are blue, and 2 are white. Suppose you put the balls in a

line, how many different color sequences are there?

(c) 14 pts: Suppose you want to create a document of length N by using only the words in a vocabulary Σ = {w1, w2, …, wn}. Let [t1, t2, …, tn] be a list of non-negative integers such that

P

i

ti = N.

(c1) 7 points: How many different documents are there which satisfy the condition that, for

each wi

in the vocabulary Σ, the occurrence of the word wi

in the document is exactly ti?

That is, how many different word sequences are there which contain exactly ti wi

’s for each

wi

in Σ?

Hint: The answer to (c1) is very similar to the answer to (b).

(c2) 7 pts: Let P(X) be a unigram model on the vocabulary Σ; that is, P(X = wi) is the

probability of a word wi

, and P

wi∈Σ P(X = wi) = 1.

Suppose a document of length N is created with the following procedure: for each position

in a document, you pick a word from the vocabulary according to P(X); that is, the probability of picking wi

is P(X = wi). What is the probability that you will end up with a

document where the occurrence of the word wi (for each wi ∈ Σ) in the document is exactly

ti?

Hint: As (c1) shows, there will be many documents that contain exactly ti wi

’s. The answer

to (c2) should be the sum of the probabilities of all these documents.

Q4 (10 points): Suppose you want to build a trigram POS tagger. Let T be the size of the tagset

and V be the size of the vocabulary.

(a) 2 pts: Write down the formula for calculating P(w1, …, wn, t1, …, tn), where wi

is the i-th word

in a sentence, and ti

is the POS tag for wi

.

(b) 8 pts: Suppose you will use an HMM package to implement a trigram POS tagger.

• What does each state in HMM correspond to? How many states are there?

• What probabilities in the formula for (a) do transition probability aij and emission probability bjk correspond to? ai,j is the transition probability from state si to sj , and bjk is the

probability that State sj emits symbol ok.

Q5 (10 points): In a POS tagging task, let V be the size of the vocabulary (i.e., the number of

words), and T be the size of the tagset. Suppose we want to build a classifier that predicts the tag of

the current word by using the following features:

1. Previous word w−1

2. Current word w0

3. Next word w+1

4. Surrounding words w−1 w+1

5. Previous tag t−1

6. Previous two tags t−2 t−1

(a) 3 pts: How many unique features are there in total? You just need to give the answer in the

Big-O notation (e.g., O(V

3

)).

(b) 2 pts: A classifier predicts class label y given the input x. In this task, what is x? what is y?

(c) 5 pts: For the sentence Mike/NN likes/VBP cats/NNS, write down the feature vector for

each word in the sentence. The feature vector has the format “InstanceName classLabel featName1 val1 featName2 val2 ….”. For the instanceName, just use the current word.

Q6 (10 points): Suppose you want to build a language identifier (LangID) that determines the

language code of a given document. The training data is a set of documents with the language code

for each document specified. The test data is a set of documents, and your LangID needs to determine

the language code of each document.

(a) 7 pts: How do you plan to build the LangID system? For instance, if you want to treat this as

a classification problem, what would x (the input) be? what would y (the output) be? What

would be good features? Name at least five types of features (e.g., one feature type is the word

unigrams in the document).

(b) 3 pts: What factors (e.g., the amount of training data) could affect the system performance?

Name at least three factors, excluding the amount of training data.

Q7 (10 “free” points): If you are not familiar with Mallet, please go over the Mallet slides at the

course website1

Set up the package in your patas environment, run some experiments. We will use Mallet in later

assignments. If you do not have a patas account, you should contact me right away.

.The third link at https://canvas.uw.edu/courses/1257221/pages/slides-from-prior-ling570

Submission: In your submission, include the following:

• readme.(txt|pdf) that includes your answers to Q1-Q6. No need to submit anything for Q7.

• Since this assignment does not require programming, there is no need to submit hw.tar.gz, and

no need to run check hwX.sh script.