CS 4740 Project 3: Sequence Tagging

1 Overview
In this project, you are to implement a model that identifies relevant information in a text and tags it with the appropriate label. Specifically, the task of this project is Named Entity Recognition (NER), in which tokens corresponding to a person, location, organization, etc. are annotated with their semantic category. You will either implement a Hidden Markov Model yourself or experiment with a package/toolkit that implements one or more of the sequence-tagging algorithms covered in class: HMMs, Maximum Entropy Markov Models (MEMMs), or even Conditional Random Fields (CRFs). If you decide to use an existing package or toolkit rather than implement a sequence tagger from scratch, you are expected to perform more extensive experiments.
2 Task and Dataset
The NER task uses the IOB encoding scheme. Each token is associated with a label O if it is Outside the entity, label B-xxx if it is the head (i.e. Beginning) of entity xxx, and I-xxx if it is within (i.e. Inside) entity xxx but not the head of the entity. We will concentrate on four types of named entities: persons (PER), locations (LOC), organizations (ORG) and names of miscellaneous entities (MISC) that do not belong to the previous three groups. The training data is organized by sentences. Each sentence is associated with three lines, where the first line contains the tokens, the second line contains the corresponding Part-Of-Speech (POS) tags, and the third line contains the correct labels. For example:
The following bond was announced by lead manager Toronto Dominion .
DT VBG NN VBD VBN IN NN NN NNP NNP .
O O O O O O O O B-PER I-PER O
specifies an example sentence in the training data. “Toronto Dominion” is labeled as a PER (Person) entity in this sentence. All non-entities are labeled “O”. The test data contains only the words and POS tags. In order to evaluate your system’s performance, you will upload your predictions for the test set to Kaggle.
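For concreteness, here is a minimal sketch of reading this three-line-per-sentence format, assuming whitespace-separated fields; the file name is hypothetical, so adapt it to however the released data is actually packaged.

def read_sentences(path):
    # Read sentences stored as consecutive (tokens, POS tags, labels) line
    # triples; blank lines are skipped.
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    sentences = []
    for i in range(0, len(lines), 3):
        tokens = lines[i].split()
        pos_tags = lines[i + 1].split()
        labels = lines[i + 2].split()
        assert len(tokens) == len(pos_tags) == len(labels)
        sentences.append((tokens, pos_tags, labels))
    return sentences

# e.g. train = read_sentences("train.txt")   # "train.txt" is a placeholder name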
3 Implementation
You should implement either a Hidden Markov Model (HMM) or a Conditional Random Field (CRF) for this task. If you decide to use an existing package/toolkit rather than implement a sequence tagger from scratch, you are expected to include more extensive experiments.
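As a rough illustration of what the HMM option involves (not a required design), the sketch below shows maximum-likelihood estimation of transition and emission probabilities from the (tokens, POS, labels) triples described above. Here the tags are the hidden states and the words are the observations; the sketch omits smoothing and unknown-word handling, which you will need to address.

from collections import defaultdict

def estimate_hmm(sentences):
    # Unsmoothed relative-frequency estimates for a bigram HMM tagger:
    # transition_counts[prev_tag][tag] and emission_counts[tag][word].
    # "<s>" is a start-of-sentence marker.
    transition_counts = defaultdict(lambda: defaultdict(int))
    emission_counts = defaultdict(lambda: defaultdict(int))
    for tokens, _, labels in sentences:
        prev = "<s>"
        for word, tag in zip(tokens, labels):
            transition_counts[prev][tag] += 1
            emission_counts[tag][word] += 1
            prev = tag
    transitions = {p: {t: c / sum(nxt.values()) for t, c in nxt.items()}
                   for p, nxt in transition_counts.items()}
    emissions = {t: {w: c / sum(ws.values()) for w, c in ws.items()}
                 for t, ws in emission_counts.items()}
    # Raw counts are returned as well so that smoothing can be applied later.
    return transition_counts, transitions, emissions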
1. If you decide to implement the sequence tagging algorithm yourself, we strongly suggest implementing HMMs rather than CRFs unless you have sufficient background knowledge. If you decide to try both, start with HMMs. You may use any programming language that you’d like and any preprocessing tools that you can find. It might be possible, for example, to use a toolkit just to extract n-gram information, but to write the HMM code by yourself. If you decide to implement a CRF, clarify how you will do this in the proposal.
2. If using an existing HMM/CRF toolkit or package to run your experiments, it is up to you to figure out how to use the packages. In both cases, you will need to think up front about what kind of experiments would be interesting to run given your choice of algorithm.
3. Similar to the previous project, it could be beneficial to reserve some portion of the training dataset for validation purposes. Describe the details of your experimental designs in the report.
4. Develop baseline systems to be compared with your own model; you are required to implement at least one baseline system for comparison. One simple option is to first build a lexicon of all named entities that appear in the training corpus, and then, during testing, identify only those named entities that are part of the lexicon (see the sketch below). Note that baseline systems should be compared to your system in the report.
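One possible realization of such a lexicon baseline is sketched below, assuming training sentences in the (tokens, POS, labels) format from Section 2; entity strings are matched greedily at test time, longest match first, and everything else is tagged O. This is only one way to build the baseline, not a prescribed design.

from collections import Counter, defaultdict

def build_entity_lexicon(sentences):
    # Map each entity string seen in training to its most frequent type.
    counts = defaultdict(Counter)
    for tokens, _, labels in sentences:
        i = 0
        while i < len(tokens):
            if labels[i].startswith("B-"):
                etype, j = labels[i][2:], i + 1
                while j < len(tokens) and labels[j] == "I-" + etype:
                    j += 1
                counts[" ".join(tokens[i:j])][etype] += 1
                i = j
            else:
                i += 1
    return {phrase: c.most_common(1)[0][0] for phrase, c in counts.items()}

def baseline_tag(tokens, lexicon, max_len=5):
    # Greedy longest-match tagging against the lexicon.
    labels, i = ["O"] * len(tokens), 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = lexicon.get(" ".join(tokens[i:i + n]))
            if etype:
                labels[i] = "B-" + etype
                for k in range(i + 1, i + n):
                    labels[k] = "I-" + etype
                i += n
                break
        else:
            i += 1
    return labels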
4 Kaggle Competition
We will launch a Kaggle competition for the task. Your submission to Kaggle should be a csv file consisting of five lines and two columns. The first line is a fixed header, and each of the remaining four lines corresponds to one of the four types of named entities. The first column is the type identifier (PER, LOC, ORG or MISC), and the second column is a list of entities (separated by single spaces) that you predict to be of that type. Each entity is specified by its starting and ending position (joined by a hyphen). To make positions unambiguous,
we provide the position for each token in the test corpus. Suppose the input contains two sentences:
Renate Goetschl of Austria won the women ’s World Cup downhill race
NNP NNP IN NNP VBD DT NNS POS NNP NNP RB NN
0 1 2 3 4 5 6 7 8 9 10 11

ZIFA vice-chairman Vincent Pamire said Grobbelaar would take charge for a match against Tanzania
NNP NN NNP NNP VBD NNP MD VB NN IN DT NN IN NNP
12 13 14 15 16 17 18 19 20 21 22 23 24 25
then your output should look like this:
Type,Prediction
PER,0-1 14-15 17-17
LOC,3-3 25-25
ORG,12-12
MISC,8-9
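A minimal sketch of producing this file is given below, assuming your tagger’s output is a single flat list of IOB labels aligned with the global token positions shown above; the output file name is a placeholder, and the helper iob_spans() is reused in the evaluation sketch that follows.

def iob_spans(labels):
    # Extract (type, start, end) entity spans (end inclusive) from IOB labels.
    spans, i = [], 0
    while i < len(labels):
        if labels[i].startswith("B-"):
            etype, j = labels[i][2:], i
            while j + 1 < len(labels) and labels[j + 1] == "I-" + etype:
                j += 1
            spans.append((etype, i, j))
            i = j + 1
        else:
            i += 1
    return spans

def write_submission(pred_labels, out_path="predictions.csv"):
    # Write the five-line Kaggle file described above.
    by_type = {"PER": [], "LOC": [], "ORG": [], "MISC": []}
    for etype, start, end in iob_spans(pred_labels):
        by_type[etype].append("{}-{}".format(start, end))
    with open(out_path, "w") as f:
        f.write("Type,Prediction\n")
        for etype in ("PER", "LOC", "ORG", "MISC"):
            f.write("{},{}\n".format(etype, " ".join(by_type[etype])))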
The standard measures to report for NER are recall, precision, and F1 score (also called F-measure), evaluated at the entity level (not at the token level). Precision is |C ∩ P| / |P|, Recall is |C ∩ P| / |C|, and F1 is 2 · Precision · Recall / (Precision + Recall), where P and C are the sets of predicted and correct named entities, respectively. When you upload your predictions you will see the evaluation results (mean F1 score over the four entity types) on half of the test data. You will be able to see your score on the other half of the test set after the submission deadline. You should include the final evaluation result in your final report before the Kaggle competition ends. Note that once the Kaggle competition closes, you cannot make additional submissions to Kaggle.
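If you hold out a validation split, you can compute these scores locally with a short helper like the following sketch, which reuses iob_spans() from the previous sketch. Note that this version micro-averages over all entities, whereas the Kaggle score is the mean of the per-type F1 scores.

def entity_prf(gold_labels, pred_labels):
    # Entity-level precision, recall and F1: an entity counts as correct only
    # if its type, start and end all match.
    C = set(iob_spans(gold_labels))
    P = set(iob_spans(pred_labels))
    prec = len(C & P) / len(P) if P else 0.0
    rec = len(C & P) / len(C) if C else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1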
5 Extensions
Here are several extensions you can implement and experiment with. Doing at least one extension is mandatory. Implementing more than one extension may earn bonus points, depending on the depth of your implementation, experimental results and write-up. Note that all of the extensions can be used in the Kaggle competition.
1. Experiment with different orders of n-gram-based features: will bigrams be adequate or will trigrams (or 4-grams, etc.) be better?
2. Experiment with different smoothing methods: which smoothing method will you employ? How do different smoothing methods affect the results? (A minimal add-k sketch appears after this list.)
3. If you’re using toolkits, you might compare one sequence-tagging method with another, and vary the parameters and/or feature sets to see how they affect the performance of different methods.
4. Implement a secondary sequence tagging system that is different from your primary implementation, e.g. MEMMs or CRFs if your primary implementation is HMMs.
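As a starting point for extension 2, here is a minimal add-k sketch for the transition probabilities, assuming raw counts in the transition_counts[prev][tag] form of the estimation sketch in Section 3; k is a hyperparameter to tune on your validation split, and emission probabilities (and unknown words) would need analogous treatment.

def add_k_transitions(transition_counts, tagset, k=0.1):
    # Add-k smoothed P(tag | prev_tag) over a fixed tag set.
    probs = {}
    for prev in ["<s>"] + list(tagset):
        prev_counts = transition_counts.get(prev, {})
        total = sum(prev_counts.values())
        probs[prev] = {
            tag: (prev_counts.get(tag, 0) + k) / (total + k * len(tagset))
            for tag in tagset
        }
    return probs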
6 Proposal
Describe your sequence-tagging system and implementation plan in 1 page. You should consider:
• Which model are you planning to implement? (If you plan to implement a CRF, briefly explain your preparation for understanding the model, since we did not cover it in class.)
• Explain the key algorithmic points of your model. In particular, think about which are the hidden variables and which are the observed variables in our setting, and what the corresponding model parameters are.
• For MEMMs or CRFs, brainstorm which features you would incorporate to learn emission probabilities. Support your design decisions with real examples from the dataset.
• State which extension you are planning to do. While you might end up implementing a different extension, this will help us provide you with feedback.
In addition to submitting a 1-page proposal document, you must also submit code for at least one baseline system and report its performance on the partial test set via Kaggle. Include details if you split the training set into training and validation parts.
7 Report
You should submit a short document (5-6 pages will suffice) that contains the following sections. (You can include additional sections if you wish.)
1. Sequence Tagging Model
(a) “Implementation Details” Make clear which sequence tagging method(s) you selected. Make clear which parts were implemented from scratch vs. obtained from an existing package. Explain and motivate any design choices, providing the intuition behind them.
(b) “Pre-Processing” Explain and motivate the pre-processing steps you apply to the data.
(c) “Experiments” Describe the motivation and methodology of the experiments that you ran. Clearly state what your hypotheses and expectations were.
(d) “Results” Summarize the performance of your system, and of any variations you experimented with, on both the training/validation and test datasets. Note that you have to compare your own system to at least one non-trivial baseline system. Put the results into clearly labeled tables or diagrams and include your observations and analysis. An error analysis is required: what sorts of errors occurred, and why? When did the system work well, when did it fail, and do you have any ideas as to why? How might you improve the system?
(e) “Competition Score” Include your team name and a screenshot of your best score from Kaggle.
2. Extensions Explain the extensions that you decided to do. Include the implementation details, the experiments, and the results, similar to the previous section. If your extensions contribute to the competition score, analyze why they help. If not, try to explain why they did not help.
3. Individual Member Contribution Briefly explain the contribution of each group member. You may report if the workload was unfairly distributed.
8 Grading Guide
• (10 pts) Proposal: clarity and feasibility of your plan.
• (40 pts) Design and implementation of the sequence-tagging system if you implement the models yourselves; design, thoughtfulness and comprehensiveness of your experiments if you choose to use existing toolkits.
• (35 pts) Report: clarity and quality of the report.
• (10 pts) At least one extension: implementation, experiments and discussion.
• (5 pts) Submission to Kaggle (not optional!).
8.1 Things to avoid
Don’t be ridiculously inefficient. You are not expected to spend too much time optimizing your code, but it SHOULD NOT take forever either. Bigram Viterbi is O(s·m²), where s is the length of the sentence and m is the number of tags. Your implementation should have similar efficiency.
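For reference, a bigram Viterbi decoder with the expected O(s·m²) behavior can be sketched as follows; the transitions/emissions dictionaries follow the earlier estimation sketch (nested dicts of probabilities), and unseen events are simply floored to a very small log probability rather than handled with proper smoothing.

import math

def viterbi(tokens, tagset, transitions, emissions):
    # Bigram Viterbi in log space: O(s * m^2) for sentence length s and
    # tag set size m. Assumes a non-empty sentence.
    def logp(p):
        return math.log(p) if p > 0 else -1e9

    tags = list(tagset)
    score = [{t: logp(transitions.get("<s>", {}).get(t, 0))
                 + logp(emissions.get(t, {}).get(tokens[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for t in tags:
            prev_best = max(
                tags,
                key=lambda p: score[i - 1][p] + logp(transitions.get(p, {}).get(t, 0)))
            score[i][t] = (score[i - 1][prev_best]
                           + logp(transitions.get(prev_best, {}).get(t, 0))
                           + logp(emissions.get(t, {}).get(tokens[i], 0)))
            back[i][t] = prev_best
    # Follow back-pointers from the best final tag.
    best = max(tags, key=lambda t: score[-1][t])
    path = [best]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))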
9 What to Submit
9.1 Part One
• Proposal (one-page pdf file)
• Code and results for at least one baseline system. Include a brief description of the baseline.
• Archive all of the above in a zip file and upload it to CMS. (due Nov 2, 11:59pm)
• Submit a hardcopy of your one-page proposal in class. (Nov 3)
9.2 Part Two
• Source code with adequate comments, and executables (only include code that you wrote yourselves; DO NOT include code from existing toolkits/packages)
• Prediction output (the file you submitted to Kaggle)
• Your team name on Kaggle
• Report (pdf file)
• Archive all of the above in a zip file and upload it to CMS. (due Nov 13, 11:59pm)