Data Science Professional Practicum (DSCI 560) Laboratory Assignment 1 solution

This first assignment of the course focuses on helping you set up your system, finish the required installations, and run a few basic tasks.

We will use virtual machines (VM) or containers with Linux as the operating system and Python as the programming language for the entire course.

In this assignment, you will install and set up an Ubuntu Linux VM and install the required software packages for Python. You will then run a few tasks in the Linux terminal and in Python to get hands-on experience. You are expected to submit a document with snapshots and a few lines of description for each task performed as part of the assignment. Please ensure that the script files you submit are properly commented. This lab must be completed individually.

1. Installation and Setup

1.1. Install VirtualBox/VMware

VirtualBox runs on Windows and Linux, while VMware works on Windows, Linux, and macOS.
● For VirtualBox, follow the instructions provided on https://virtualbox.org
● For VMware, follow the instructions provided on https://viterbiit.usc.edu/services/software/vmware-academic-program/

1.2. Download Ubuntu ISO Image

We will use Ubuntu throughout the semester. Follow the instructions below to download the Ubuntu image and set up the virtual machine.
● Go to the official Ubuntu website and download the Ubuntu Desktop ISO image.
● Open VirtualBox or VMware.
● Create a new virtual machine.
● Name the virtual machine, select “Linux” as the type, and “Ubuntu (64-bit)” as the version.
● Allocate memory (RAM) for the virtual machine (at least 2GB is recommended).
● Allocate disk space for the virtual machine (at least 20GB is recommended).
● Create the virtual machine.
● When prompted, choose the Ubuntu ISO image you downloaded earlier as the installation medium.
● The Ubuntu installer will launch. Follow the installation wizard to install Ubuntu on the virtual machine.

1.3. Install Python on Linux

After you have installed and started the Linux VM, install Python and some Python packages. Here’s how you can do that:
● Open a terminal in your Ubuntu virtual machine.
● Update the package list using: sudo apt update
● Ubuntu usually comes with Python 3 pre-installed. To ensure it is installed, run the following command: sudo apt install python3
● To verify that Python 3 has been installed successfully, run: python3 --version
● Install pip, the Python package manager that allows you to easily install Python packages from the Python Package Index (PyPI): sudo apt install python3-pip
● To verify that pip has been installed successfully, run: pip3 --version

1.4. Tutorials

If you are new to Linux or Python, spend some time reading the documentation and tutorials below to understand the basic concepts and commands.

Linux:
● Start with the “Introduction to Linux” tutorial by Linux Journey. This comprehensive tutorial covers basic Linux commands and concepts.
● Refer to the Linux Command Cheat Sheet for quick access to essential Linux commands.

Python:
● Begin with the official Python website’s tutorial, which provides a solid foundation in Python programming.
● For practical examples and exercises, follow “Python for Beginners” by Real Python.
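Once Python and pip are installed as described in Section 1.3, the installation can also be checked from inside Python itself. The following is a minimal sketch and not part of the assignment; the helper names python_version_string and is_installed are hypothetical and chosen only for illustration.

```python
import importlib.util
import sys


def python_version_string() -> str:
    """Return the running interpreter version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version:", python_version_string())
    # "pip" is the importable module behind the pip3 command.
    print("pip available:", is_installed("pip"))
```

Saved to a file and run with python3, this prints the interpreter version and whether the pip module can be located, which should agree with the output of the two verification commands above.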

2. Get Familiar with Linux and Python

In this section, we perform a few different tasks using the Linux terminal and Python to give you hands-on experience with both and to prepare you for the upcoming assignments.

2.1. Playing around with Linux Terminal

● Open the Linux terminal.
● Create a new directory named “_” on the desktop.
● Inside the folder, create two subdirectories named “data” and “scripts”.
● Create an empty Python file inside the scripts folder named “task_1.py”.
● Use the list command to view the created script file.
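The task itself is meant to be done with terminal commands, but the resulting folder layout can be illustrated with a short Python sketch using pathlib. The function name create_lab_layout and the folder name "lab_folder" are hypothetical; the actual directory name is whatever the assignment specifies. The demo runs in a temporary directory so that nothing on the Desktop is touched.

```python
import tempfile
from pathlib import Path


def create_lab_layout(base: Path) -> Path:
    """Create base/data, base/scripts and an empty base/scripts/task_1.py."""
    (base / "data").mkdir(parents=True, exist_ok=True)
    scripts = base / "scripts"
    scripts.mkdir(parents=True, exist_ok=True)
    script_file = scripts / "task_1.py"
    script_file.touch(exist_ok=True)  # creates the file only if it is missing
    return script_file


if __name__ == "__main__":
    # Throwaway demo location; on the VM the base would be the Desktop folder.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_lab_layout(Path(tmp) / "lab_folder")  # hypothetical name
        # Rough equivalent of listing the scripts folder in the terminal.
        print(sorted(p.name for p in created.parent.iterdir()))
```

Because mkdir is called with exist_ok=True, running the function twice on the same base directory is harmless.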

2.2. A basic Python Script

● Open the task_1.py file you created in the previous step using vim or nano.
● Write a Python script that reads a user’s name as input and greets the user with “Hello, [name]!”.
● The script should prompt the user for input and display the greeting in the terminal.
● Save and exit the editor.
● Run the Python code.
● Feel free to work through the Python tutorials and tasks until you feel comfortable using the editors and are accustomed to Python syntax before moving to the next step.
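One possible shape for task_1.py is sketched below. It is an illustration rather than a reference solution: the function name build_greeting and the prompt text are assumptions, and the greeting logic is kept separate from the input handling so that it can be checked on its own.

```python
def build_greeting(name: str) -> str:
    """Return the greeting for a name, ignoring surrounding whitespace."""
    return f"Hello, {name.strip()}!"


def main() -> None:
    try:
        name = input("Enter your name: ")
    except EOFError:
        # No input was supplied (for example, stdin was closed).
        return
    print(build_greeting(name))


if __name__ == "__main__":
    main()
```

Running python3 task_1.py from the scripts folder prompts for a name and prints the greeting in the terminal.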

2.3. Python Web-scraping Task

● Create a new file “web_scraper.py” in the scripts folder.
● Install the required libraries (Requests and BeautifulSoup4) using pip: pip install requests beautifulsoup4
● Open this website: https://www.cnbc.com/world/?region=world
● Analyze the HTML structure by inspecting the elements on the page.
● Find the corresponding tags for the Market banner at the top and the section titled Latest News on the page.
● Create two new folders in the “data” folder called “raw_data” and “processed_data”.
● Write a Python script that uses Requests and BeautifulSoup to collect data from the provided link.
● Save the collected data in the “raw_data” folder to a file named “web_data.html”.
● Using the terminal, print the first 10 lines of the created HTML file.
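A minimal sketch of what web_scraper.py could look like follows. Several details are assumptions rather than requirements of the assignment: the output path is taken relative to the directory the script is run from, a browser-like User-Agent header is sent in case the site rejects the default one, and the page is saved in the indented form produced by BeautifulSoup's prettify. The function names are hypothetical.

```python
from pathlib import Path

import requests
from bs4 import BeautifulSoup

URL = "https://www.cnbc.com/world/?region=world"
# Assumed location, relative to the directory the script is run from.
OUTPUT_FILE = Path("data") / "raw_data" / "web_data.html"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML text, raising on HTTP errors."""
    # Assumption: a browser-like User-Agent avoids being rejected by the site.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


def save_pretty_html(html: str, out_file: Path) -> None:
    """Parse the HTML with BeautifulSoup and write an indented copy to disk."""
    soup = BeautifulSoup(html, "html.parser")
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(soup.prettify(), encoding="utf-8")


if __name__ == "__main__":
    try:
        page = fetch_html(URL)
    except requests.RequestException as error:
        print(f"Download failed: {error}")
    else:
        save_pretty_html(page, OUTPUT_FILE)
        print(f"Saved page to {OUTPUT_FILE}")
```

After the script has run, the first 10 lines of the saved file can be printed in the terminal with head -n 10 data/raw_data/web_data.html.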

2.4. Data Filtering Task

● Write a Python script called “data_filter.py”. It should read the “web_data.html” file and extract the specific elements of interest from the data.
● Store the fields (marketCard_symbol, marketCard_stockPosition and marketCard_changePct) from the market banner and the fields (LatestNews-timestamp, title and link) for each entry in the Latest News list.
● Store the market banner data in a CSV named “market_data.csv” in the processed_data folder.
● Store the news data in a CSV named “news_data.csv” in the processed_data folder.
● Print appropriate messages to the console. For example: Filtering fields, storing Market data, CSV created, etc.
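The filtering step could be sketched as follows. All CSS class names in this example are assumptions and must be replaced with the ones actually found when inspecting the page; the CSV column names and helper function names are likewise hypothetical. Paths are taken relative to the directory the script is run from.

```python
import csv
from pathlib import Path

from bs4 import BeautifulSoup

RAW_FILE = Path("data") / "raw_data" / "web_data.html"
OUT_DIR = Path("data") / "processed_data"

# Assumed CSS class names; replace with those seen when inspecting the page.
MARKET_CARD = "MarketCard-container"
MARKET_FIELDS = {
    "symbol": "MarketCard-symbol",
    "stock_position": "MarketCard-stockPosition",
    "change_pct": "MarketCard-changesPct",
}
NEWS_ITEM = "LatestNews-item"
NEWS_TIMESTAMP = "LatestNews-timestamp"
NEWS_HEADLINE = "LatestNews-headline"


def text_of(parent, css_class: str) -> str:
    """Return the stripped text of the first descendant with the class, or ''."""
    node = parent.find(class_=css_class)
    return node.get_text(strip=True) if node else ""


def extract_market(soup: BeautifulSoup) -> list:
    """Build one dict per market card found in the banner."""
    print("Filtering market fields...")
    return [
        {field: text_of(card, css) for field, css in MARKET_FIELDS.items()}
        for card in soup.find_all(class_=MARKET_CARD)
    ]


def extract_news(soup: BeautifulSoup) -> list:
    """Build one dict per Latest News entry: timestamp, title and link."""
    print("Filtering news fields...")
    rows = []
    for item in soup.find_all(class_=NEWS_ITEM):
        anchor = item.find("a", class_=NEWS_HEADLINE)
        rows.append({
            "timestamp": text_of(item, NEWS_TIMESTAMP),
            "title": anchor.get_text(strip=True) if anchor else "",
            "link": anchor.get("href", "") if anchor else "",
        })
    return rows


def write_csv(rows: list, fieldnames: list, out_file: Path) -> None:
    """Write the rows to a CSV file with a header line."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    print(f"CSV created: {out_file}")


if __name__ == "__main__":
    if RAW_FILE.exists():
        soup = BeautifulSoup(RAW_FILE.read_text(encoding="utf-8"), "html.parser")
        write_csv(extract_market(soup), list(MARKET_FIELDS),
                  OUT_DIR / "market_data.csv")
        write_csv(extract_news(soup), ["timestamp", "title", "link"],
                  OUT_DIR / "news_data.csv")
    else:
        print(f"Input file not found: {RAW_FILE}")
```

Missing elements are written as empty strings rather than raising an error, so a changed page layout shows up as blank CSV cells that are easy to spot.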

3. Submission

3.1. Please submit all documents, answers to the questions, source code, and the report on Blackboard by the end of Saturday. Provide a document with all the snapshots and a brief description of each, and submit the document in PDF format (no other format will be considered). Please mention your name and USC ID at the end of the document.

3.2. Upload a demo video to YouTube and submit the YouTube link by the end of Monday. This video should sufficiently demonstrate your results.

3.3. There will be a 50% penalty for all late submissions.