z534 Assignment 2: Retrieval Algorithm and Evaluation

Description

Task 1: Implement your first search algorithm

Based on the Lucene index, we can start to design and implement efficient retrieval algorithms. Let's start with the easy ones. Please implement the following ranking function using the Lucene index we provided through Canvas (index.zip):

$$F(q, doc) = \sum_{t \in q} \frac{c(t, doc)}{\mathrm{length}(doc)} \cdot \log\left(1 + \frac{N}{k(t)}\right)$$

where q is the user query, doc is the target (candidate) document in AP89, t is a query term, c(t, doc) is the count of term t in document doc, N is the total number of documents in AP89, and k(t) is the total number of documents that contain the term t. Please use the Lucene API to get this information. From a retrieval viewpoint, c(t, doc) / length(doc) is called the normalized TF (term frequency), while log(1 + N / k(t)) is the IDF (inverse document frequency).

The following code (using the Lucene API) can help you implement the ranking function. Lines marked "reconstructed" were truncated in this copy of the assignment and are filled in under the assumption that the index was built with the Lucene 4.x API:

    // Get the preprocessed query terms
    Analyzer analyzer = new StandardAnalyzer();
    QueryParser parser = new QueryParser("TEXT", analyzer);
    Query query = parser.parse(queryString);
    // (reconstructed) collect the analyzed terms of the query
    Set<Term> queryTerms = new LinkedHashSet<Term>();
    query.extractTerms(queryTerms);

    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(pathToIndex)));

    // Use DefaultSimilarity.decodeNormValue(…) to decode normalized document length
    DefaultSimilarity dSimi = new DefaultSimilarity();

    // Get the segments of the index
    // (reconstructed) iterate over the segments; docBase is the offset of a segment's first document
    List<AtomicReaderContext> leafContexts = reader.leaves();
    for (AtomicReaderContext leafContext : leafContexts) {
        int startDocNo = leafContext.docBase;

        // (reconstructed) decode the length of each document in this segment for the field TEXT
        NumericDocValues norms = leafContext.reader().getNormValues("TEXT");
        for (int docId = 0; docId < leafContext.reader().maxDoc(); docId++) {
            float normDocLeng = dSimi.decodeNormValue(norms.get(docId));
            float docLeng = 1 / (normDocLeng * normDocLeng);
        }

        // Get the term frequency of "new" within each document containing it for the field TEXT
        DocsEnum de = MultiFields.getTermDocsEnum(leafContext.reader(),
                MultiFields.getLiveDocs(leafContext.reader()), "TEXT", new BytesRef("new"));
        if (de == null) {
            continue; // the term does not occur in this segment
        }
        int doc;
        while ((doc = de.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
            System.out.println("\"new\" occurs " + de.freq() + " times in doc("
                    + (de.docID() + startDocNo) + ") for the field TEXT");
        }
    }

For each given query, your code should be able to:

1. Parse the query using StandardAnalyzer (Important: we need to use the SAME Analyzer that we used for indexing to parse the query),
2. Calculate the relevance score for each query term, and
3. Calculate the relevance score F(q, doc).

The code for this task should be saved in a Java class: easySearch.java

Task 2: Test your search function with TREC topics

Next, we will need to test the search performance with the TREC standardized topic collections. You can download the query test topics from Canvas (topics.51-100). In this collection, TREC provides 50 topics, which can be used as candidate queries for search tasks. For example, one TREC topic is:

In this task, you will need to use two different fields as queries: