LIN570: HW2 – SBD

In this homework, you will perform sentence boundary detection (SBD) and evaluate the
results with an F1 score. You will also evaluate tokenization results with an F1 score. All the sample
files are under ~/dropbox/19-20/570/hw2/examples.

Rubric:
2pts hw.tar.gz submitted; it should contain the following files:
• file.sbd.system
• file.tok.system
• f1_score_sbd.sh
• f1_score_tok.sh
• file.sbd.score
• file.tok.score

2pts readme.txt or readme.pdf submitted
6pts All files and folders are present in the expected locations
10pts Programs run to completion
5pts The output of the programs on patas matches the submitted output
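One way to assemble the archive is sketched below. It is a minimal sketch: the helper name package_hw is hypothetical, it packs only the six files listed above, and it assumes you run it from the directory that holds them (where the readme belongs is not specified here).

```sh
#!/bin/sh
# package_hw (hypothetical helper): check that every required file exists, then build hw.tar.gz.
package_hw() {
  files="file.sbd.system file.tok.system f1_score_sbd.sh f1_score_tok.sh file.sbd.score file.tok.score"
  for f in $files; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # $files is left unquoted on purpose so that it splits into separate file names.
  tar -czf hw.tar.gz $files
}
```

Afterwards, `tar -tzf hw.tar.gz` lists the archive contents so you can confirm nothing is missing.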

1. (10pts) Using splitta¹, obtain SBD results:
• python2 sbd.py file.txt > file.sbd.system

2. (40pts) Implement a script, f1_score_sbd.sh, that calculates the F1 score for sentence boundary detection.
• The command line is: cat file.sbd.system | ./f1_score_sbd.sh file.sbd.gold > file.sbd.score
• At least minimal in-line comments should be provided.
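A minimal sketch of f1_score_sbd.sh in POSIX sh and awk follows. It assumes that file.sbd.system and file.sbd.gold both hold one sentence per line and contain the same whitespace-separated tokens in the same order, so that only the line breaks differ. Each token is labeled S-SENT if it begins a line and O otherwise, matching the system/gold labels in the table further down, and the two label sequences are compared position by position. The function name and the one-line output format are hypothetical choices, not part of the assignment.

```sh
#!/bin/sh
# f1_score_sbd.sh (sketch): F1 for sentence boundary detection.
# Usage: cat file.sbd.system | ./f1_score_sbd.sh file.sbd.gold > file.sbd.score
# Assumption: both files have one sentence per line and the same tokens in the same order.

f1_score_sbd() {
  awk '
    # Pass 1 (gold file): label each token 1 if it starts a line (S-SENT), else 0 (O).
    NR == FNR {
      for (i = 1; i <= NF; i++) gold[++n] = (i == 1)
      next
    }
    # Pass 2 (system output on stdin): label tokens the same way and compare by position.
    {
      for (i = 1; i <= NF; i++) {
        sys = (i == 1); g = gold[++m]
        if (sys && g) tp++        # both say S-SENT
        else if (sys) fp++        # system S-SENT, gold O
        else if (g) fn++          # system O, gold S-SENT
      }
    }
    END {
      p = (tp + fp) ? tp / (tp + fp) : 0
      r = (tp + fn) ? tp / (tp + fn) : 0
      f = (p + r) ? 2 * p * r / (p + r) : 0
      printf "tp=%d fp=%d fn=%d precision=%.4f recall=%.4f f1=%.4f\n", tp, fp, fn, p, r, f
    }
  ' "$1" -
}

# Score stdin against the gold file when one is given on the command line.
if [ "$#" -ge 1 ]; then
  f1_score_sbd "$1"
fi
```

After `chmod +x f1_score_sbd.sh`, the command line from the assignment works unchanged. If splitta alters tokens (not only line breaks), the positional comparison above no longer holds and the tokens would need aligning first.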

3. (25pts) Modify your script to calculate the F1 score for tokenization, as f1_score_tok.sh.
• The command lines are:
– cat file.sbd.system | ./eng_tokenizer.sh abbrev_list > file.tok.system
(eng_tokenizer.sh and abbrev_list are from HW1)
– cat file.tok.system | ./f1_score_tok.sh file.tok.gold > file.tok.score
• At least minimal in-line comments should be provided.

¹ https://github.com/lukeorland/splitta
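For tokenization, tokens can no longer be compared by position, because the system and gold files may split the same text into different numbers of tokens. One option, sketched here, is to give every token a character span over the text with all whitespace removed and to count a system token as correct only if gold has exactly the same span. This assumes that system and gold contain identical non-whitespace characters, which in practice means normalizing brackets and quotes to the Penn Treebank forms first. The function name and output format are hypothetical.

```sh
#!/bin/sh
# f1_score_tok.sh (sketch): token-level F1 for tokenization.
# Usage: cat file.tok.system | ./f1_score_tok.sh file.tok.gold > file.tok.score
# Assumption: with whitespace removed, system and gold contain exactly the same characters.

f1_score_tok() {
  awk '
    BEGIN { g_off = 0; s_off = 0 }
    # Gold file: store each token as a "start,end" span over the whitespace-free text.
    NR == FNR {
      for (i = 1; i <= NF; i++) {
        len = length($i)
        gold[g_off "," (g_off + len)] = 1
        g_off += len; n_gold++
      }
      next
    }
    # System output (stdin): build spans the same way; a token is correct if gold has its span.
    {
      for (i = 1; i <= NF; i++) {
        len = length($i)
        key = s_off "," (s_off + len)
        if (key in gold) tp++
        s_off += len; n_sys++
      }
    }
    END {
      fp = n_sys - tp; fn = n_gold - tp
      p = n_sys ? tp / n_sys : 0
      r = n_gold ? tp / n_gold : 0
      f = (p + r) ? 2 * p * r / (p + r) : 0
      printf "tp=%d fp=%d fn=%d precision=%.4f recall=%.4f f1=%.4f\n", tp, fp, fn, p, r, f
    }
  ' "$1" -
}

# Score stdin against the gold file when one is given on the command line.
if [ "$#" -ge 1 ]; then
  f1_score_tok "$1"
fi
```

For example, if gold has "U.S." as one token and the system splits it into "U.S" and ".", both system tokens count as false positives and the gold token as a false negative, while the surrounding tokens still match.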
token     system   gold
The       S-SENT   S-SENT   ← tp
luxury    O        O
…         …        …
year      O        O
sold      O        O
1,214     O        O
cars      O        O
in        O        O
the       O        O
U.S.      S-SENT   O        ← fp
Howard    O        S-SENT   ← fn
Mosher    O        O
,
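Only the marked rows enter the score; rows where both columns are O are true negatives and are ignored. With the standard definitions (the same precision, recall, and F1 that conlleval reports), the counts combine as:

```latex
P = \frac{tp}{tp + fp}, \qquad
R = \frac{tp}{tp + fn}, \qquad
F_1 = \frac{2PR}{P + R}
```

Counting only the rows shown (one tp, one fp, one fn), P = R = 1/2 and F1 = 2(1/2)(1/2)/(1/2 + 1/2) = 0.5; the elided rows would change these counts.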

• See also conlleval² for the F1 score used in the CoNLL-2000 shared task (chunking).

• Parentheses, brackets, and braces in the Penn Treebank (file.tok.gold) are mapped as follows:
# s/(/-LRB-/g
# s/)/-RRB-/g
# s/\[/-LSB-/g
# s/\]/-RSB-/g
# s/{/-LCB-/g
# s/}/-RCB-/g
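Applied with sed, the substitutions above might look like the following minimal sketch. The function name normalize_brackets is hypothetical, and whether it is needed before scoring depends on how your HW1 tokenizer writes brackets.

```sh
#!/bin/sh
# normalize_brackets (hypothetical helper): rewrite brackets as Penn Treebank tokens.
# Reads stdin and writes stdout. In a basic regular expression, ( ) { } and a lone ]
# are ordinary characters, so only [ needs a backslash.
normalize_brackets() {
  sed -e 's/(/-LRB-/g' \
      -e 's/)/-RRB-/g' \
      -e 's/\[/-LSB-/g' \
      -e 's/]/-RSB-/g' \
      -e 's/{/-LCB-/g' \
      -e 's/}/-RCB-/g'
}
```

For example, `cat file.tok.system | normalize_brackets` turns "( a )" into "-LRB- a -RRB-". None of the replacement strings contain brackets, so the order of the substitutions does not matter.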

• Example:
– raw:
“From the beginning, it took a man with extraordinary qualities to succeed in Mexico,” ..
– tokenized:
‘‘ From the beginning , it took a man with extraordinary qualities to succeed in Mexico , ’

² https://www.clips.uantwerpen.be/conll2000/chunking/output.html