Solved Math 2310 Lab 4 Correlation & Regression

In this lab assignment we will be using R to draw scatterplots, calculate correlations,
and fit regression lines to data.
A group of researchers from Victoria, Australia studied how the chemical
composition of cheddar cheese influenced how people rated the taste of the cheese.
Three chemical compounds were measured – hydrogen sulfide, lactic acid, and
acetic acid. We will refer to these three variables as the predictor variables, and we
will refer to the taste score as the response variable. The data are stored in the file
cheese.xls.
When analyzing data with two or more numerical variables, we will often start by
examining the relationships between the variables using scatterplots and
correlations. Then, we will frequently be interested in creating a model that can
make predictions for one of the variables based on one or more of the other
variables. We can also assess the quality of the regression model through the
coefficient of determination, r².
Goals for this assignment:
• Create scatterplots in R
• Calculate correlations in R
• Interpret scatterplots and correlations
• Use R to find the equation of a regression line
• Use a regression equation to make a prediction
• Interpret the r² value to assess the quality of a regression model
• Use R to find a regression equation using more than one predictor variable at
once
• Use a regression model with transformed data to address a non-linear
relationship
The skill objectives and analysis objectives in each activity specify what you
should be submitting in your lab report.
Grading: there are two possible points for each skill objective and for each analysis
objective.
Activity 1
Examine the relationships between each of the three predictor variables and the
response variable.
Skill Objective: Using R, construct three scatterplots showing the relationship
between the response variable and each of the three predictor variables. For
each scatterplot, also find the corresponding correlation value, r.
Analysis Objective: Based on your scatterplots, comment briefly on whether
there appears to be a relationship between each of the predictor variables and
the response variable, and whether those relationships appear to be positive
or negative. Which of the three predictor variables has the strongest
relationship with the response variable?
Hints:
• R has a built in function cor(x, y) that will calculate the correlation between
variables x and y
• You can read in the Excel file by putting library(readxl) at the beginning of
your file, and then doing something like:
data <- read_excel("path/on/your/computer/to/cheese.xls")
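As a rough illustration of the two hints above, the following sketch draws one scatterplot and computes one correlation. It uses a small invented data frame in place of cheese.xls, and the column names h2s and taste are assumptions; the real file may name its columns differently.

```r
# Hypothetical stand-in for the cheese data (values invented for illustration).
toy <- data.frame(
  h2s   = c(3.1, 4.0, 5.2, 6.8, 7.5, 9.0),
  taste = c(12, 15, 21, 30, 34, 47)
)

# Scatterplot: predictor on the x-axis, response on the y-axis.
plot(toy$h2s, toy$taste,
     xlab = "Hydrogen sulfide",
     ylab = "Taste score",
     main = "Taste vs. hydrogen sulfide (toy data)")

# Pearson correlation between the two columns.
r <- cor(toy$h2s, toy$taste)
print(r)
```

Repeating the plot() and cor() calls with each of the other two predictor columns would give the remaining scatterplots and correlations.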
Activity 2
Now we will look at how we could use information about hydrogen sulfide levels to
predict taste ratings.
Skill Objective: Find the equation of the regression line predicting taste score
based on hydrogen sulfide level.
Analysis Objective: What taste score would you predict for a cheese whose
hydrogen sulfide measurement was 5.0?
Hints:
• You can calculate a linear regression model using specific columns of a
dataframe (assume it’s called “data”) with the following:
reg <- lm(formula = y_variable_name ~ x_variable_name, data = data)
• Once you have the regression, you can extract the coefficients using:
slope = reg$coefficients[[2]]
intercept = reg$coefficients[[1]]
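The sketch below shows one way the fitted coefficients could be turned into a prediction. The data frame and the column names h2s and taste are invented for illustration and are not taken from cheese.xls.

```r
# Hypothetical stand-in for the cheese data (values invented for illustration).
toy <- data.frame(
  h2s   = c(3.1, 4.0, 5.2, 6.8, 7.5, 9.0),
  taste = c(12, 15, 21, 30, 34, 47)
)

# Fit the simple linear regression taste = intercept + slope * h2s.
reg <- lm(formula = taste ~ h2s, data = toy)

# coef() returns the same values as reg$coefficients.
intercept <- coef(reg)[[1]]
slope     <- coef(reg)[[2]]

# Prediction by plugging a new x value into the fitted equation.
x_new       <- 5.0
pred_manual <- intercept + slope * x_new

# The same prediction via predict(); the newdata column name must match
# the predictor name used in the formula.
pred_fn <- predict(reg, newdata = data.frame(h2s = x_new))

print(c(intercept = intercept, slope = slope))
print(pred_manual)
```

Both routes give the same number; predict() is mainly a convenience once the model has more than one predictor.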
Activity 3
Now we will assess the quality of this regression model.
Skill Objective: Using R, find the value of the coefficient of determination, r²,
for the regression model predicting taste rating based on hydrogen sulfide
levels.
Analysis Objective: In one sentence, explain what this r² value tells you about
the quality of the regression model.
Hint: You can calculate r² manually, or you can extract it from a regression object
via summary(reg)$r.squared
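To make the "manually" option concrete, here is a minimal sketch comparing three ways of getting the coefficient of determination for a one-predictor model. The toy data frame and its column names are assumptions made for illustration only.

```r
# Hypothetical stand-in for the cheese data (values invented for illustration).
toy <- data.frame(
  h2s   = c(3.1, 4.0, 5.2, 6.8, 7.5, 9.0),
  taste = c(12, 15, 21, 30, 34, 47)
)

reg <- lm(taste ~ h2s, data = toy)

# 1. Extract it from the model summary.
r2_summary <- summary(reg)$r.squared

# 2. With a single predictor, it equals the squared correlation.
r2_cor <- cor(toy$h2s, toy$taste)^2

# 3. From its definition: 1 - (residual sum of squares / total sum of squares).
ss_res    <- sum(residuals(reg)^2)
ss_tot    <- sum((toy$taste - mean(toy$taste))^2)
r2_manual <- 1 - ss_res / ss_tot

print(c(summary = r2_summary, cor_squared = r2_cor, manual = r2_manual))
```

The squared-correlation shortcut only applies to a model with one predictor; the sum-of-squares definition also works for models with several predictors.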
Activity 4
Occasionally in lab we will look at ideas that expand somewhat beyond what is
covered in our book. This activity is one such time. In class, and in our book, we have
seen how to construct a regression equation using one variable to predict another.
But the same basic idea can be used to construct a regression equation using
multiple variables to predict a variable. So, for example, we might have an equation
of the form Y = a + bX + cW + dZ, where X, W, and Z are all variables we want to use
to make our prediction.
While we will not be learning the mathematics to estimate such an equation by
hand, it is quite simple to estimate an equation like this using R or other statistical
software.
Skill Objective: Using R, estimate a single regression equation predicting
taste score from all three predictor variables.
Analysis Objective: Based on your regression equation, what taste score would
you predict for a cheese whose hydrogen sulfide measurement was 5.0, whose
acetic acid measurement was 6.1, and whose lactic acid measurement was
0.90? Also, based on the output from R, what proportion of the variability in
taste scores can be explained by this model using all three predictor
variables?
Hints:
• You can carry out a regression using multiple independent variables by
doing something like:
reg <- lm(formula = y_variable_name ~ x_variable_1 + x_variable_2 + …, data = data)
• Look into using the R function “predict” to make a prediction using your
regression model
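A minimal sketch of a three-predictor fit and a prediction with predict() might look like the following. The data values and the column names taste, h2s, acetic, and lactic are invented for illustration; substitute the actual column names from cheese.xls.

```r
# Hypothetical stand-in for the cheese data (values invented for illustration).
toy <- data.frame(
  taste  = c(12, 21, 39, 48, 6, 26, 15, 32),
  h2s    = c(3.1, 5.0, 7.5, 9.1, 3.0, 6.2, 4.2, 6.9),
  acetic = c(4.5, 5.2, 5.8, 6.1, 4.9, 5.4, 5.6, 5.0),
  lactic = c(0.9, 1.3, 1.6, 1.9, 1.0, 1.4, 1.2, 1.5)
)

# Fit taste = a + b * h2s + c * acetic + d * lactic.
reg3 <- lm(taste ~ h2s + acetic + lactic, data = toy)

# predict() needs a data frame whose column names match the predictors.
new_cheese <- data.frame(h2s = 5.0, acetic = 6.1, lactic = 0.90)
pred <- predict(reg3, newdata = new_cheese)

# Proportion of variability in taste explained by the three-predictor model.
r2 <- summary(reg3)$r.squared

print(coef(reg3))
print(pred)
print(r2)
```

The coefficients printed by coef(reg3) appear in the order intercept, h2s, acetic, lactic, which is the order of the terms in the formula.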
Activity 5
(adapted from problem 28 in section 3.4 of the textbook)
Polyester fiber ropes are increasingly being used as components of mooring lines
for offshore structures in deep water. The authors of the paper “Quantifying the
Residual Creep Life of Polyester Mooring Ropes” (Intl. J. of Offshore and Polar Explor.,
2005: 223-228) used the data contained in the file “ropes.txt” as a basis for studying
how time to failure (hr) depended on load (% of breaking load).
In the data file, the first column is load (as a percentage out of 100), and the second
column is the time to failure.
Hint:
• You can read in the data using:
df <- read.delim("ropes.txt", sep = " ", header = FALSE)
a) We will examine the relationship between the two variables.
Skill Objective: Construct a scatterplot of x = load versus y = time.
Analysis Objective: Would it be reasonable to characterize the relationship
between the two variables as linear?
b) We will try to address the nonlinearity through a transformation.
Skill Objective: Transform the response variable by computing y’ = log(y).
Construct a scatterplot of x and y’.
Analysis Objective: Would it be reasonable to characterize the relationship
between these two variables as linear?
c) Finally, we will fit a regression model.
Skill Objective: Fit a straight line to the (x, y’) data.
Analysis Objective: Based on the linear fit, predict the failure time
for a load of 85%.
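The three parts of this activity could be sketched as follows. The data frame below is an invented stand-in for ropes.txt, and the names load, time, and log_time are assumptions. When a file is read with header = FALSE, R names the columns V1 and V2, so renaming them first keeps the formula readable. Note that log() in R is the natural logarithm, so the fitted line predicts log(time) and the result has to be back-transformed with exp(). If you use log10() instead, back-transform with 10^.

```r
# Hypothetical stand-in for ropes.txt (values invented for illustration).
# V1 = load (% of breaking load), V2 = time to failure (hr).
df <- data.frame(
  V1 = c(77, 80, 82.5, 85, 87.5, 90),
  V2 = c(5000, 1200, 300, 80, 20, 5)
)
names(df) <- c("load", "time")

# (b) Transform the response: y' = log(y), using the natural log.
df$log_time <- log(df$time)

# Scatterplot of x against the transformed response.
plot(df$load, df$log_time,
     xlab = "Load (% of breaking load)",
     ylab = "log(time to failure)")

# (c) Fit a straight line to the (x, y') data.
fit <- lm(log_time ~ load, data = df)

# Predict on the log scale, then undo the transformation.
pred_log  <- predict(fit, newdata = data.frame(load = 85))
pred_time <- exp(pred_log)

print(coef(fit))
print(pred_time)
```

Forgetting the exp() step is a common slip: without it, the reported value is a predicted log-time rather than a time in hours.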