Data Wrangling in R MATH8050: Homework 2 solution


R Working Environment

Please load all the packages used in the following R chunk before the function sessionInfo().

    # load packages
    sessionInfo()

Total points on assignment: 10 (reproducibility) + 20 (Q1) + 35 (Q2) + 10 (Q3) + 25 (Q4)

Reproducibility component: 10 points.

1. (20 pts total, equally weighted) The diamonds dataset

a. Replicate the following scatter plot.

[Figure: "Diamond Price x Carat". Scatter plot with axes labeled Price and Carat, points colored by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), one panel per color (D, E, F, G, H, I, J).]

b. Replicate the following plot.

[Figure: "Histogram of Prices by Clarity". Axes: price, count. Legend: clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).]

c. Replicate the following plot.

[Figure: "Diamond Cut ~ Price". Axes: price, cut. Legend: cut (Fair, Good, Very Good, Premium, Ideal).]

d. For the diamonds dataset, replicate the following plot.

[Figure: "Mean diamond price". Cut categories (Fair, Good, Very Good, Premium, Ideal) shown within each color. Legend: color (D, E, F, G, H, I, J).]

2. (35 pts total, equally weighted) We use the tidyverse package to generate various plots with the iris dataset.

a. For the iris dataset, replicate the following plot.

[Figure: eight panels of Sepal.Length against Sepal.Width, titled "Default (theme_grey)", "theme_bw", "theme_linedraw", "theme_light", "theme_dark", "theme_minimal", "theme_classic", and "theme_void".]

b. For the iris dataset, replicate the following plot.

[Figure: "IRIS". Axes: Sepal.Length, Sepal.Width. Legend: Species (setosa, versicolor, virginica).]

c. Compute the mean Petal Length under each species and then replicate the following plot. Make sure that you only use the tidyverse package for this problem.

[Figure: Axes: Species (setosa, versicolor, virginica), Petal Length.]

d. Combine variables by species and then replicate the following plot. Make sure that you only use the tidyverse package for this problem.

[Figure: Axes: Species (setosa, versicolor, virginica), Values. Legend: Petal.Length, Petal.Width, Sepal.Length, Sepal.Width.]

e. Order the species according to the order virginica, setosa, and versicolor, and replicate the following plot. Make sure that you only use the tidyverse package for this problem.

[Figure: Axes: Species, Values. Legend: Petal Length, Petal Width, Sepal Length, Sepal Width. Value labels shown in the figure: 5.006, 3.428, 1.462, 0.246, 5.936, 2.77, 4.26, 1.326, 6.588, 2.974, 5.552, 2.026.]

f. Add a small amount of random variation to the location of each point using geom_jitter and replicate the following boxplot, where each characteristic of a species corresponds to a boxplot and these boxplots are grouped by species.

[Figure: Axes: Species (setosa, versicolor, virginica), Values. Legend: Petal Length, Petal Width, Sepal Length, Sepal Width.]

g. Generate the boxplots faceted for each species and replicate the following plot.

[Figure: four panels titled Sepal.Length, Sepal.Width, Petal.Length, Petal.Width. Axes: Species (setosa, versicolor, virginica), Values.]

3. (10 pts total, equally weighted) Use the economics dataset from the ggplot2 package to answer the following questions.

a. Replicate the following figure mentioned in Lecture 2 for the ggplot2 package.

[Figure: two series, unemployment and savings, plotted against date (1970 to 2010).]

b. Replicate the following figure, where the date starts from the year 1990.

[Figure: two series, unemployment and savings, plotted against date, with date labels every two years from Jan 1990 to Jan 2016.]

4. (25 pts total) Work with the GOES-R dataset mentioned in class.

a. (4 pts) Load the DMWC_G16.nc dataset in R, extract the variables wind_speed, wind_direction, lat, lon, time, pressure, temperature, local_zenith_angle, solar_zenith_angle, and DQF, and save them into a data frame as shown below.

b. (8 pts) Convert the data frame dat into an sf object named df, where only observations with DQF equal to 0 are kept as in Lecture 3, and then replicate the following figure with the following requirements:
• using the filled square shape with size .1
• using the scico::vik color palette
• using the wrap_plots() function or the “+” operator to arrange the columns

[Figure: three map panels covering 15°N to 50°N and 80°W to 55°W, colored by ws, press, and temp respectively.]

c. (5 pts) In the df data frame, pivot the variables ws, press, and temp into longer format, storing their names in a new column called variable and their values in a new column called value. Then save this new dataset into a tibble p and print out the first 6 observations in this new data frame. You should obtain the following output:

## Simple feature collection with 6 features and 7 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -63.0746 ymin: 50.24621 xmax: -60.31003 ymax: 50.88714
## Geodetic CRS: WGS 84
## # A tibble: 6 x 8
##      wd       time   lza   sza   DQF             geometry variable  value
## 1  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) ws        29.6
## 2  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) press    746.
## 3  209. 656121674.  60.0  77.1     0 (-60.31003 50.88714) temp     280.
## 4  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) ws         3.42
## 5  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) press    989.
## 6  263. 656121674.  58.7  78.3     0  (-63.0746 50.24621) temp     278.

d. (8 pts) Replicate the exact figure with the following requirements:
• using the filled square shape with size .1
• using the scico::vik color palette

[Figure: three map panels titled press, temp, and ws, each covering 15°N to 50°N and 80°W to 55°W with its own color scale.]
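Several parts above (2c to 2e, and 4c) depend on reshaping wide measurement columns into name/value pairs before summarising or plotting. The following is a minimal sketch of that reshaping step on the built-in iris data, assuming the dplyr and tidyr packages from the tidyverse are installed. The object names iris_long and iris_means are illustrative choices, not names required by the assignment, and this is not presented as the intended solution.

```r
library(dplyr)
library(tidyr)

# Reshape the four measurement columns into two columns:
# "variable" holds the former column name, "value" holds the measurement.
# (iris_long is a hypothetical name used only for this sketch.)
iris_long <- iris %>%
  pivot_longer(
    cols = -Species,
    names_to = "variable",
    values_to = "value"
  )

# Mean of each measurement within each species.
# (iris_means is likewise a hypothetical name.)
iris_means <- iris_long %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value), .groups = "drop")

print(iris_means)
```

A long table of this shape can be passed to ggplot() with Species on one axis, value on the other, and variable mapped to fill or colour. The same pivot_longer() call pattern, with cols restricted to the columns being stacked, is one way to approach the reshaping described in 4c.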