Solved COMP 370 Homework 8 – Data Annotation

In this assignment, we’re interested in the main topics discussed on the /r/mcgill subreddit vs. the /r/concordia subreddit. We’ll do this using human annotation … and you’re the annotator.

Task 1: Data collection

First, let’s collect some Reddit posts using the /new.json endpoint. We’ll collect two data files: one from the McGill subreddit and one from the Concordia subreddit. For the purpose of this assignment, collect them manually; that is, open the JSON dump in a web browser and download it to a file. You should end up with a mcgill.json file and a concordia.json file.
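Before moving on, it can help to confirm that each downloaded file really is a Reddit listing and not an error page. The sketch below is one way to do that, not part of the assignment itself. It relies on the standard listing layout, where posts sit under data, then children, each wrapped in its own "data" object. The function name summarize_listing and its preview parameter are made up for illustration.

```python
import json


def summarize_listing(path, preview=3):
    """Return (number of posts, first few titles) for a saved listing file.

    summarize_listing and preview are illustrative names, not part of the
    assignment.
    """
    with open(path, encoding="utf-8") as f:
        listing = json.load(f)
    # Posts sit under data -> children; each child wraps the post in "data".
    posts = [child["data"] for child in listing["data"]["children"]]
    return len(posts), [post["title"] for post in posts[:preview]]
```

Calling summarize_listing("mcgill.json") should give a post count and a few recognizable titles. If the browser saved something other than JSON, json.load raises an error instead.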

Task 2: Prep for coding

Write a script extract_to_tsv.py that accepts one of the files you collected from Reddit and outputs a random selection of posts from that file to a tsv (tab separated value) file. It should function like this:

    python3 extract_to_tsv.py -o <out_file> <json_file> <num_posts_to_output>

If <num_posts_to_output> is greater than the number of posts in the file, then the script should just output all of them. If the file contains more than <num_posts_to_output> posts (which is likely the case), then it should randomly select num_posts_to_output (the parameter you passed to the script) of them and output only those. The output (written to out_file) has three tab-separated columns: Name, title, coding
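The behaviour described above could be sketched roughly as follows. This is a minimal illustration under several assumptions, not a reference solution:

  • the argument order is the output file after -o, then the input JSON file, then the number of posts
  • Name is taken from each post's "name" field
  • the coding column is left empty for the annotator to fill in
  • the optional seed argument is added only to make runs repeatable

The helper names select_posts and write_tsv are hypothetical.

```python
import argparse
import csv
import json
import random


def load_posts(json_path):
    """Read a saved Reddit listing and return its posts as a list of dicts."""
    with open(json_path, encoding="utf-8") as f:
        listing = json.load(f)
    return [child["data"] for child in listing["data"]["children"]]


def select_posts(posts, num_posts, seed=None):
    """Return every post if num_posts covers the file, else a random subset."""
    if num_posts >= len(posts):
        return list(posts)
    # sample() draws without replacement, so no post is selected twice.
    return random.Random(seed).sample(posts, num_posts)


def write_tsv(posts, out_path):
    """Write the header plus one row per post, with an empty coding column."""
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["Name", "title", "coding"])
        for post in posts:
            # Assumption: Name is the post's "name" field (for example t3_...).
            writer.writerow([post["name"], post["title"], ""])


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Extract a random selection of posts to a TSV file.")
    parser.add_argument("-o", dest="out_file", required=True)
    parser.add_argument("json_file")
    parser.add_argument("num_posts_to_output", type=int)
    args = parser.parse_args(argv)

    posts = load_posts(args.json_file)
    write_tsv(select_posts(posts, args.num_posts_to_output), args.out_file)
```

To run this from the command line, end the file with the usual `if __name__ == "__main__": main()` guard. The csv module is used rather than joining strings by hand so that a title containing a tab or newline is quoted and does not break the column layout.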