Ling 473 Assignment 3 solution

1. (20 points) In Lecture 3, we looked at the outcomes of rolling two fair dice. For this problem, we will
consider weighted dice—one white, and one red. For each die, 1 and 6 are twice as likely to show as
the other four values.

a. What is the probability that the total showing on the two dice will be 7?
b. What is the probability that the total showing on the two dice will be 9 or higher?

c. What is the probability that the red die will show a higher number than the white one?
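One way to check answers to problem 1 is to enumerate all 36 outcomes with exact fractions. The following is a minimal sketch, assuming "twice as likely" means relative weights of 2 for faces 1 and 6 and 1 for the other faces, and that the two dice are independent. The names `weighted_die` and `event_probability` are illustrative and not part of the assignment.

```python
from fractions import Fraction
from itertools import product


def weighted_die():
    """Return {face: probability} for a die where 1 and 6 have double weight."""
    weights = {face: (2 if face in (1, 6) else 1) for face in range(1, 7)}
    total = sum(weights.values())  # 2 + 1 + 1 + 1 + 1 + 2 = 8
    return {face: Fraction(w, total) for face, w in weights.items()}


def event_probability(event):
    """Sum P(white) * P(red) over all (white, red) outcomes where event holds.

    Assumes the two dice are independent and identically weighted.
    """
    die = weighted_die()
    return sum(
        die[white] * die[red]
        for white, red in product(die, repeat=2)
        if event(white, red)
    )


p_total_is_7 = event_probability(lambda white, red: white + red == 7)
p_total_at_least_9 = event_probability(lambda white, red: white + red >= 9)
p_red_higher = event_probability(lambda white, red: red > white)

print(p_total_is_7, p_total_at_least_9, p_red_higher)
```

Using `Fraction` rather than floats keeps the results exact, which makes them easier to compare against a hand derivation.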

2. (35 points) The following is the first paragraph of Ernest Hemingway’s The Old Man and the Sea.
It has been POS-tagged using the online Brill tagger at the Center for Sprogteknologi at Københavns
Universitet. A few minor changes have been applied.

This assignment does not require programming, but if you wish to work with an electronic version of
this information, you can refer to the following file:
/opt/dropbox/16-17/473/assignment3/old-man.txt

a. How many bigrams does the sample contain?

b. In a bigram model, we assume that a POS tag depends only on the POS tag of the preceding
word. Calculate 𝑃(. | NN), assuming that the counts in the above sample are perfectly
representative.

c. We are interested in the probability of the bigram DT JJ in the sample text. What is the value of
𝑃(DT JJ)?

d. A trigram model conditions a POS tag on the POS tags of the preceding bigram. Calculate
𝑃(NN | DT JJ) for the sample.

e. Assume this sample characterizes a larger corpus. Assume that measured probabilities are
independent. Estimate 𝑃(DT JJ | NN) for the corpus. (Hint: this will use Bayes’ Theorem.)
Show your work.
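Although problem 2 does not require programming, the counting behind each part can be sketched in a few lines. The example below is a hedged illustration only: `TAGS` is a hypothetical toy tag sequence, not the tagged paragraph from the assignment, and the estimators follow one common convention (relative frequencies over a single unpadded sequence, with the unigram probability taken as count divided by sequence length). The helper names are likewise assumptions.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy tag sequence; NOT the tagged paragraph from the assignment.
TAGS = ["DT", "JJ", "NN", "VBD", "DT", "NN", "."]


def ngrams(tags, n):
    """Return all contiguous n-grams of the sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def joint(tags, gram):
    """Relative frequency of `gram` among all n-grams of the same length."""
    grams = ngrams(tags, len(gram))
    return Fraction(grams.count(tuple(gram)), len(grams))


def conditional(tags, context, target):
    """P(target | context) = count(context + target) / count(context + any tag)."""
    context = tuple(context)
    counts = Counter(ngrams(tags, len(context) + 1))
    context_total = sum(c for g, c in counts.items() if g[:-1] == context)
    return Fraction(counts[context + (target,)], context_total)


bigram_count = len(ngrams(TAGS, 2))                   # part (a): n - 1 bigrams
p_period_given_nn = conditional(TAGS, ["NN"], ".")    # part (b)
p_dt_jj = joint(TAGS, ["DT", "JJ"])                   # part (c)
p_nn_given_dt_jj = conditional(TAGS, ["DT", "JJ"], "NN")  # part (d)

# Part (e), Bayes' Theorem: P(DT JJ | NN) = P(NN | DT JJ) * P(DT JJ) / P(NN)
p_nn = joint(TAGS, ["NN"])
p_dt_jj_given_nn = p_nn_given_dt_jj * p_dt_jj / p_nn

print(bigram_count, p_period_given_nn, p_dt_jj, p_nn_given_dt_jj, p_dt_jj_given_nn)
```

To apply the same functions to the real sample, `TAGS` would be replaced with the tag sequence read from the assignment file. The denominators chosen here are a modeling decision and should match the convention used in lecture.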

3. (15 points) For phonetic elicitation with a group of American test subjects, we are using three word
lists:
A = { gnat, beet }
B = { loon, fee }
C = { peel, pool, he, sand }

The test protocol is as follows: One of the lists is selected at random. Then, the subject is asked to
pronounce a randomly selected word from that list. What is the probability that the word will have a
high/close vowel (as opposed to low/open)? If you are not familiar with vowel phonetics, you can check
the Lecture 5 recording, or listen to samples on http://en.wikipedia.org/wiki/Vowel.
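The two-stage protocol in problem 3 is an instance of the law of total probability: P(high) is the sum over lists of P(list) times P(high | list). The sketch below assumes each list is chosen with equal probability and each word within a list is equally likely. The vowel-height labels in `WORD_LISTS` are this sketch's own classification for American English and should be checked against the lecture material.

```python
from fractions import Fraction

# Assumption: True marks a word whose vowel is treated as high/close.
# These labels are illustrative and are not given in the assignment.
WORD_LISTS = {
    "A": {"gnat": False, "beet": True},
    "B": {"loon": True, "fee": True},
    "C": {"peel": True, "pool": True, "he": True, "sand": False},
}


def probability_high_vowel(word_lists):
    """Law of total probability over uniformly chosen lists and words."""
    p_list = Fraction(1, len(word_lists))
    total = Fraction(0)
    for words in word_lists.values():
        high = sum(1 for is_high in words.values() if is_high)
        total += p_list * Fraction(high, len(words))
    return total


print(probability_high_vowel(WORD_LISTS))
```

Note that the lists have different sizes, so weighting by list first is not the same as pooling all eight words and counting.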

4. (30 points) A classifier has partitioned a set of eight biomedical documents into
𝐶 = { mentions the IL-2R ⍺-promoter } (6 documents), and
𝐶̅ (the rest).

The gold standard indicates that only three documents actually mention the Interleukin-2 receptor alpha
promoter, and we determine that exactly one of them is (incorrectly) in 𝐶̅. In testing a post-processing
heuristic, we select a document at random from 𝐶 and move it into the class 𝐶̅. Next, we randomly select
a document from 𝐶̅.

a. What is the probability that the document we selected from 𝐶̅ mentions the IL-2R ⍺-promoter
(according to the gold standard)?

b. Next, we note that the document we selected from 𝐶̅ does, in fact (according to the gold
standard), mention the IL-2R ⍺-promoter. Given this additional information, what is the
probability that the document that we transferred from 𝐶 to 𝐶̅ mentioned (according to the
gold standard) the IL-2R ⍺-promoter (i.e., that we moved it to the wrong class)?
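Problem 4 can be checked by enumerating the two random steps and then conditioning, which is Bayes' Theorem in tabular form. The sketch below assumes the setup as stated: 𝐶 holds six documents of which two are gold-standard mentions, 𝐶̅ holds two of which one is a mention, and both selections are uniform. The names `C`, `C_BAR`, and `joint_distribution` are illustrative.

```python
from fractions import Fraction

# True marks a document that mentions the promoter per the gold standard.
C = [True, True, False, False, False, False]   # 6 documents, 2 true mentions
C_BAR = [True, False]                          # 2 documents, 1 true mention


def joint_distribution():
    """Return {(moved_is_mention, selected_is_mention): probability}."""
    joint = {}
    for moved in C:                      # step 1: move one document from C
        new_c_bar = C_BAR + [moved]      # C-bar now holds three documents
        for selected in new_c_bar:       # step 2: select one from C-bar
            p = Fraction(1, len(C)) * Fraction(1, len(new_c_bar))
            key = (moved, selected)
            joint[key] = joint.get(key, Fraction(0)) + p
    return joint


joint = joint_distribution()

# Part (a): marginal probability that the selected document is a mention.
p_selected_mention = joint[(True, True)] + joint[(False, True)]

# Part (b): P(moved is a mention | selected is a mention).
p_moved_given_selected = joint[(True, True)] / p_selected_mention

print(p_selected_mention, p_moved_given_selected)
```

The selected document may or may not be the one that was moved, which is why the enumeration ranges over all three documents in the updated 𝐶̅ rather than treating the two steps as the same draw.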