Homework 3: Comparing Search Engine Ranking Algorithms solution

Objectives:
o Experience using Solr
o Investigating ranking strategies
Preparation
In the previous exercise you used crawler4j to crawl a portion of the USC website. As a result of this
crawl you should have downloaded and saved HTML/PDF/DOC files. In this exercise you will index those
pages using Solr and then modify Solr to compare different ranking strategies.
Solr can be installed on Unix or Windows machines. However, it is much easier to run Solr on a Unix
computer. Therefore, we have provided instructions for installing Ubuntu, a Unix-like (Linux)
operating system, on a Windows computer. For details see
https://www-scf.usc.edu/~csci572/Exercises/UbuntuVirtualBoxFinal.pdf
Once Ubuntu is successfully installed, or if you are using a Mac or some other Unix computer, you can
then follow the instructions for installing Solr directly, which can be found here
https://www-scf.usc.edu/~csci572/2016Spring/hw3/SolrInstallation.pdf
The above instructions are for solr-5.3.1.zip. You can either download the file from
https://lucene.apache.org/solr/downloads.html/
or from the class website at
https://www-scf.usc.edu/~csci572/software/solr-5.3.1-src.tgz
Once Solr is installed you need to have it index the web pages that you saved. Instructions for doing this
can be found here
https://www-scf.usc.edu/~csci572/2016Spring/hw3/IndexingwithTIKAV3.pdf
Solr provides a simple user interface that you can use to explore your indexed web pages.
Description of the Exercise
Step 1
Now that your setup is complete, you need access to a web server that can deliver web pages
and run scripts. Using this web server, you will create a web page containing a text box in which a
user can enter a query. The user’s query will be processed by a program on your web server,
which formats the query and sends it to Solr. Solr will process the query and return the results in JSON
format. A program on your web server will then reformat the results and present them to the user, as
any search engine would.
Below is a rough outline of how you could structure your solution to the exercise. All three elements
(web browser, web server, and Solr) would be located on your laptop. Your web server might be the
Apache web server coupled with the PHP programming language. An alternative solution would be to
use node.js as the server/programming component. In the case of node.js, the programming language is
JavaScript. Whichever you use, your program would send the query web page to the user, and then send
the user’s query to Solr, which produces the results. Solr returns the results to the same web
server, which converts them into a nice-looking web page that is then returned to the user.
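As an illustration of the last step, here is a minimal JavaScript sketch of converting a parsed Solr JSON response into an HTML fragment. It is not part of the assignment materials. The function names escapeHtml and renderResults are hypothetical, and the field names id and title are assumptions about how your documents were indexed, so adjust them to match your own index.

```javascript
// Escape text so that document fields cannot inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a parsed Solr JSON response into an HTML fragment.
// Assumption: each document has an "id" field and possibly a "title" field,
// where "title" may be a single string or an array of strings.
function renderResults(solrJson) {
  const { numFound, docs } = solrJson.response;
  const items = docs.map((doc) => {
    const title = Array.isArray(doc.title) ? doc.title[0] : doc.title;
    return `<li>${escapeHtml(title || "(untitled)")}<br>` +
      `<small>${escapeHtml(doc.id)}</small></li>`;
  });
  return `<p>${numFound} results</p>\n<ol>\n${items.join("\n")}\n</ol>`;
}

// A hand-written sample in the general shape Solr returns for JSON output.
const sample = {
  responseHeader: { status: 0, QTime: 1 },
  response: {
    numFound: 1,
    start: 0,
    docs: [{ id: "/data/page1.html", title: ["USC <Viterbi>"] }],
  },
};

console.log(renderResults(sample));
```

Escaping every field before placing it in the page matters here because the indexed text comes from crawled pages and may itself contain markup.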
Solr supports clients written in several different languages. Clients use requests to ask Solr to do
things like perform queries or index documents. Client applications can reach Solr by creating HTTP
requests and parsing the HTTP responses. Client APIs encapsulate much of the work of sending requests
and parsing responses, which makes it much easier to write client applications.
Clients use Solr’s five fundamental operations to work with Solr. The operations are query, index, delete,
commit, and optimize. Queries are executed by creating a URL that contains all the query parameters.
Solr examines the request URL, performs the query, and returns the results. The other operations are
similar, although in certain cases the HTTP request is a POST operation and contains information beyond
whatever is included in the request URL. An index operation, for example, may contain a document in
the body of the request.
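To make the idea of a query URL concrete, the following JavaScript sketch builds one from a user's query string. It is only an illustration: the function name buildSolrQueryUrl is hypothetical, and the host, port, and core name (localhost, 8983, myexample) are assumptions that you should replace with the values from your own installation.

```javascript
// Build a Solr select URL that carries all query parameters.
// Assumptions: Solr listens on localhost:8983 and the core is
// named "myexample"; change both to match your own setup.
function buildSolrQueryUrl(userQuery, options = {}) {
  const {
    base = "http://localhost:8983/solr",
    core = "myexample",
    rows = 10,
  } = options;

  // URLSearchParams percent-encodes values, so user input cannot
  // break the structure of the URL.
  const params = new URLSearchParams({
    q: userQuery,       // the query text
    rows: String(rows), // number of results to return
    wt: "json",         // ask for a JSON response
  });

  return `${base}/${encodeURIComponent(core)}/select?${params.toString()}`;
}

console.log(buildSolrQueryUrl("computer science"));
// prints http://localhost:8983/solr/myexample/select?q=computer+science&rows=10&wt=json
```

Sending an HTTP GET request to a URL of this form is what a client library does on your behalf when it performs a query.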
There are several client APIs available for Solr; see https://wiki.apache.org/solr/IntegratingSolr . As an
example, here we explain how to create a PHP client that accepts input from the user in an HTML form
and sends the request to the Solr server. After the Solr server processes the query, it returns the results,
which are parsed by the PHP program and formatted for display.
We are using the solr-php-client, which is available here https://github.com/PTCInc/solr-php-client .
Clone this repository on your computer into the folder where you are developing the user interface (git
clone https://github.com/PTCInc/solr-php-client.git). Below is the sample code from the wiki of this
repository.