LIN570: HW11 – MT


1. Q1 (40 points):
(a) (10 points): Prepare your data: remove XML tags, detect sentence boundaries, and tokenize. See the tools at https://www.statmt.org/europarl/v7/tools.tgz for preprocessing.
• Your preprocessed files will be en-ep-99-12-17.tok.txt and de-ep-99-12-17.tok.txt.
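The three preprocessing steps in (a) can be sketched in Python as follows. This is only a simplified stand-in for the referenced Europarl tools, not a reimplementation of them: the regular expressions and the function names (`strip_tags`, `split_sentences`, `tokenize`, `preprocess`) are assumptions made for illustration, and the naive boundary rule will mis-split abbreviations such as "Dr. Smith".

```python
import re

# Assumption: markup appears as angle-bracket tags, e.g. <SPEAKER ID=1> or <P>.
TAG_RE = re.compile(r"<[^>]+>")
# Naive boundary rule: sentence-final punctuation, whitespace, then an
# uppercase letter or an opening quote. Abbreviations are not handled.
BOUNDARY_RE = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ\"'])")
# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def strip_tags(line):
    """Remove XML-style tags and surrounding whitespace."""
    return TAG_RE.sub("", line).strip()


def split_sentences(text):
    """Split a text line into sentences at the naive boundaries."""
    return [s.strip() for s in BOUNDARY_RE.split(text) if s.strip()]


def tokenize(sentence):
    """Return the sentence with tokens separated by single spaces."""
    return " ".join(TOKEN_RE.findall(sentence))


def preprocess(lines):
    """Yield one tokenized sentence per output line."""
    out = []
    for line in lines:
        text = strip_tags(line)
        if not text:
            continue  # the line held only markup
        for sentence in split_sentences(text):
            out.append(tokenize(sentence))
    return out


if __name__ == "__main__":
    sample = ['<SPEAKER ID=1 NAME="Smith">', "Hello, world. This is a test!", "<P>"]
    for sentence in preprocess(sample):
        print(sentence)
```

Writing one sentence per line keeps the output directly usable as input to a sentence aligner. Note that this sketch discards paragraph markers, which a length-based aligner could otherwise use as anchor points.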

(b) (30 points): implement a sentence aligner using the Gale and Church algorithm:
• ./setence_aligner.sh de-ep-99-12-17.tok.txt en-ep-99-12-17.tok.txt > de-en-aligned.txt
• cut -f1 de-en-aligned.txt > de-en-aligned.txt.de
• cut -f2 de-en-aligned.txt > de-en-aligned.txt.en
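A minimal Python sketch of a length-based aligner in the style of Gale and Church (1993) is given below. It uses the alignment-type priors and the parameters c = 1 and s² = 6.8 reported in that paper, measures length in characters, and normalizes the length difference by the mean of the two lengths. The function names (`match_cost`, `align`) are hypothetical, and the sketch aligns the whole input as one block rather than paragraph by paragraph, so the table grows with the product of the two sentence counts.

```python
import math

# Priors for each bead type (source sentences, target sentences),
# as reported by Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of the length ratio


def match_cost(len_src, len_tgt):
    """-log probability of the observed length difference."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(S2 * mean)
    # erfc(|delta| / sqrt(2)) equals 2 * (1 - Phi(|delta|)).
    prob = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(prob, 1e-300))  # guard against log(0)


def align(src, tgt):
    """Return a list of (source text, target text) beads."""
    n, m = len(src), len(tgt)
    len_s = [len(s) for s in src]
    len_t = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming over prefixes of both texts.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cand = (cost[i][j] - math.log(prior)
                        + match_cost(sum(len_s[i:ni]), sum(len_t[j:nj])))
                if cand < cost[ni][nj]:
                    cost[ni][nj] = cand
                    back[ni][nj] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((" ".join(src[i - di:i]), " ".join(tgt[j - dj:j])))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    de = ["Guten Morgen .", "Wie geht es dir ?"]
    en = ["Good morning .", "How are you ?"]
    for src_side, tgt_side in align(de, en):
        print(src_side + "\t" + tgt_side)
```

Printing each bead as one tab-separated line matches the `cut -f1` and `cut -f2` commands above, which split the aligned file back into its German and English sides.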

2. Q2 (40 points):
(a) (10 points) Discuss how to evaluate sentence alignment results intrinsically (recall the evaluation of sentence boundary detection).
(b) (30 points) Implement eval_sentence_alignment.sh.
• ./eval_sentence_alignment.sh ep-99-12-17-de-en.de ep-99-12-17-de-en.en de-en-aligned.txt.de de-en-aligned.txt.en
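One possible intrinsic evaluation, sketched below in Python, treats each aligned sentence pair as a unit and reports precision, recall, and F1 against the reference pairs. This is an assumption about what the evaluation script might compute, not a specification of it: the function names (`normalize`, `read_pairs`, `score`) are hypothetical, and pairs count as correct only on an exact match after whitespace normalization, so tokenization differences between the reference and the system output would need to be resolved first.

```python
from collections import Counter


def normalize(line):
    """Collapse runs of whitespace so trivial spacing differences are ignored."""
    return " ".join(line.split())


def read_pairs(src_lines, tgt_lines):
    """Build a multiset of (source, target) pairs from two line-aligned lists."""
    return Counter((normalize(s), normalize(t))
                   for s, t in zip(src_lines, tgt_lines))


def score(gold, hyp):
    """Return (precision, recall, F1) of hypothesis pairs against gold pairs."""
    correct = sum((gold & hyp).values())  # multiset intersection
    n_hyp = sum(hyp.values())
    n_gold = sum(gold.values())
    precision = correct / n_hyp if n_hyp else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = read_pairs(["a", "b", "c", "d"], ["A", "B", "C", "D"])
    hyp = read_pairs(["a", "b", "c d"], ["A", "B", "C D"])
    print("P=%.3f R=%.3f F1=%.3f" % score(gold, hyp))
```

In the toy example, the hypothesis merges the last two reference pairs into one bead, so that bead counts as an error for precision and both merged reference pairs count as misses for recall.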

3. Q3 (20 points): Show the MLE probability parameters (M-step), obtained by normalizing the counts to sum to 1 (i.e., t(f|e) = count(f|e) / total(e)), after the second iteration (see the MT slides):
t(maison|green) = t(vert|green) = t(la|green) =
t(maison|house) = t(vert|house) = t(la|house) =
t(maison|the) = t(vert|the) = t(la|the) =
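The E-step and M-step behind these parameters can be sketched in Python as an IBM Model 1 style EM loop. The training corpus is defined in the MT slides and is not reproduced here, so the two sentence pairs in the example are a hypothetical stand-in built from the same vocabulary. The sketch also assumes a uniform initialization and no NULL word, and the function name `ibm1` is made up for illustration.

```python
from collections import defaultdict


def ibm1(corpus, iterations):
    """Run EM on (e_words, f_words) pairs and return t[(f, e)] = t(f|e)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    e_vocab = {e for es, _ in corpus for e in es}
    # Uniform initialization over the foreign vocabulary.
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count(f|e)
        total = defaultdict(float)  # expected total(e)
        # E-step: distribute each foreign word over the English words
        # of its sentence pair, in proportion to the current t(f|e).
        for es, fs in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: t(f|e) = count(f|e) / total(e), so each column sums to 1.
        t = {(f, e): count[(f, e)] / total[e]
             for f in f_vocab for e in e_vocab}
    return t


if __name__ == "__main__":
    # Hypothetical corpus; replace it with the one from the MT slides.
    corpus = [(["green", "house"], ["maison", "vert"]),
              (["the", "house"], ["la", "maison"])]
    t = ibm1(corpus, iterations=2)
    for e in ["green", "house", "the"]:
        for f in ["maison", "vert", "la"]:
            print("t(%s|%s) = %.4f" % (f, e, t[(f, e)]))
```

Because the M-step divides by total(e), the values t(f|e) for a fixed English word e sum to 1 across the foreign vocabulary, which is a quick sanity check on hand-computed answers.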

The submission should include:
• The readme.[txt|pdf] file, which includes the answers for Q2a and Q3.
• hw.tar.gz, which includes:
– setence_aligner.sh
– de-en-aligned.txt.en
– de-en-aligned.txt.de
– eval_sentence_alignment.sh
