## Description

Problem I:

This problem is based on the data file and context of an exercise question in Chapter 3.

The data file airfares.txt on the book website gives the one-way airfare (in US dollars) and

distance (in miles) from city A to 17 other cities in the US. Interest centers on modeling airfare

as a function of distance:

$\text{Fare} = \beta_0 + \beta_1 \, \text{Distance} + e$

(a). Fit a simple linear regression model and interpret the estimated intercept in the context of

the question.
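
A minimal sketch of how part (a) might be set up in R. The column names `Fare` and `Distance`, and the commented `read.table()` call, are assumptions about the layout of airfares.txt. The data frame built below is a made-up stand-in so the snippet runs without the file; its numbers are not the real airfares.

```r
# Assumed loading step (column names are a guess at the file layout):
# airfares <- read.table("airfares.txt", header = TRUE)

# Made-up stand-in data with 17 rows, for illustration only
set.seed(1)
airfares <- data.frame(Distance = round(runif(17, 100, 2000)))
airfares$Fare <- 50 + 0.2 * airfares$Distance + rnorm(17, sd = 10)

# Simple linear regression of fare on distance
fit <- lm(Fare ~ Distance, data = airfares)
summary(fit)

# Estimated intercept: the predicted fare at Distance = 0 miles
coef(fit)["(Intercept)"]
```

When interpreting the intercept, it may help to check whether a distance of 0 miles lies inside the observed range, for example with `range(airfares$Distance)`.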

(b). Draw two residual plots. The first one has residual as the vertical axis and the predictor

variable (Distance) as the horizontal axis. The second one has residual as the vertical axis and

fitted value ($\widehat{\text{Fare}}$) as the horizontal axis. Compare and contrast these two residual plots. What is similar? What is different?
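
The two plots in part (b) could be drawn side by side as sketched below. As before, the column names `Fare` and `Distance` are assumptions, and the data frame is a made-up stand-in for airfares.txt.

```r
# Made-up stand-in data, for illustration only (not the real airfares)
set.seed(1)
airfares <- data.frame(Distance = round(runif(17, 100, 2000)))
airfares$Fare <- 50 + 0.2 * airfares$Distance + rnorm(17, sd = 10)
fit <- lm(Fare ~ Distance, data = airfares)

res <- resid(fit)  # raw residuals: observed fare minus fitted fare

par(mfrow = c(1, 2))  # two panels in one row

# Plot 1: residuals against the predictor
plot(airfares$Distance, res,
     xlab = "Distance (miles)", ylab = "Residual")
abline(h = 0, lty = 2)

# Plot 2: residuals against the fitted values
plot(fitted(fit), res,
     xlab = "Fitted fare (US dollars)", ylab = "Residual")
abline(h = 0, lty = 2)
```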

(c). Which observation has the largest leverage? Is this a leverage point according to the rule on

P56 of the textbook? Hint: Use R function hatvalues(YourModel) to obtain the leverages.

(d). Which observation has the largest standardized residual? Is this an outlier according to the

rule on P60 of the textbook? Hint: Use R function rstandard(YourModel) to obtain the

standardized residuals.

(e). Which observation has the largest Cook’s Distance? Using the rough cutoff suggested by

Fox (See P68 of textbook), is the largest value of the Cook’s Distance noteworthy?

Hint: Use R function cooks.distance(YourModel) to obtain the Cook’s Distance.
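
Parts (c) to (e) can be collected in one small table, as in the sketch below. The three cutoffs used here (4/n for leverage, 2 for the absolute standardized residual, 4/(n - 2) for Cook's Distance) are commonly quoted rules of thumb and are assumptions on my part; check them against pp. 56, 60 and 68 of the textbook before relying on them. The data frame is again a made-up stand-in with assumed column names.

```r
# Made-up stand-in data, for illustration only (not the real airfares)
set.seed(1)
airfares <- data.frame(Distance = round(runif(17, 100, 2000)))
airfares$Fare <- 50 + 0.2 * airfares$Distance + rnorm(17, sd = 10)
fit <- lm(Fare ~ Distance, data = airfares)
n <- nrow(airfares)

h <- hatvalues(fit)       # leverages
r <- rstandard(fit)       # standardized residuals
d <- cooks.distance(fit)  # Cook's Distances

# Assumed rule-of-thumb cutoffs; verify against the textbook
lev_cut  <- 4 / n
out_cut  <- 2
cook_cut <- 4 / (n - 2)

# One row per diagnostic: which observation is most extreme, and is it flagged?
diagnostics <- data.frame(
  obs    = c(which.max(h), which.max(abs(r)), which.max(d)),
  value  = c(max(h), max(abs(r)), max(d)),
  cutoff = c(lev_cut, out_cut, cook_cut),
  row.names = c("leverage", "std_residual", "cooks_distance")
)
diagnostics$flagged <- diagnostics$value > diagnostics$cutoff
print(diagnostics)
```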

Problem II:

This problem focuses on calculating the quantities (leverages, standardized residual, Cook’s

Distance) by hand. You may use R as a calculator.

Consider a simple linear regression model with one continuous predictor variable. We have

sample size n = 100, RSS = 24.5, SXX = 400, $\bar{X} = 20$. The following table includes the statistics

for two of the observations.

| | $x_i$ | $y_i$ | $\hat{y}_i$ |
|---|---|---|---|
| Observation 11 | 30.5 | 5 | 5.3 |
| Observation 69 | 26.5 | 4.5 | 4.3 |

(a). Which observation has a larger leverage ($h_{ii}$)? Don’t do any calculation yet!

(b). Calculate the leverages ($h_{ii}$) for the two observations. Are the results in (b) consistent with

your answer in (a)?

(c). Calculate the residuals ($\hat{e}_i$) for the two observations.

(d). Calculate the standardized residuals ($r_i$) for the two observations.

(e). Calculate the Cook’s Distance ($D_i$) for the two observations.
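
For reference, the hand calculations in Problem II can be written as a small R helper using the standard simple linear regression formulas: $h_{ii} = 1/n + (x_i - \bar{X})^2 / SXX$, $\hat{e}_i = y_i - \hat{y}_i$, $S = \sqrt{RSS/(n-2)}$, $r_i = \hat{e}_i / (S\sqrt{1 - h_{ii}})$ and $D_i = (r_i^2/2)\, h_{ii}/(1 - h_{ii})$. The function name `slr_diagnostics` is hypothetical, and the sketch checks it on made-up data against R's built-in functions rather than on the values in the table.

```r
# Hypothetical helper: by-hand diagnostics for simple linear regression
slr_diagnostics <- function(x, y, yhat, n, rss, sxx, xbar) {
  h <- 1 / n + (x - xbar)^2 / sxx  # leverage
  e <- y - yhat                    # residual
  s <- sqrt(rss / (n - 2))         # residual standard error
  r <- e / (s * sqrt(1 - h))       # standardized residual
  d <- (r^2 / 2) * h / (1 - h)     # Cook's Distance (2 parameters)
  data.frame(leverage = h, residual = e,
             std_residual = r, cooks_distance = d)
}

# Check against R's built-ins on made-up data (not the Problem II values)
set.seed(2)
x <- runif(30, 0, 40)
y <- 1 + 0.2 * x + rnorm(30)
fit <- lm(y ~ x)

by_hand <- slr_diagnostics(
  x, y, unname(fitted(fit)),
  n = 30,
  rss = sum(resid(fit)^2),
  sxx = sum((x - mean(x))^2),
  xbar = mean(x)
)

# Largest absolute disagreement with the built-in functions (should be near 0)
max(abs(by_hand$leverage - unname(hatvalues(fit))))
max(abs(by_hand$std_residual - unname(rstandard(fit))))
max(abs(by_hand$cooks_distance - unname(cooks.distance(fit))))
```

To apply it to Problem II, pass the two rows of the table together with n, RSS, SXX and $\bar{X}$ as given in the problem statement.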

Problem III:

Download the dataset brainsPartial.csv from Moodle. The data gives the average body weight in

kilograms (X) and the average brain weight in grams (Y) for 59 species of mammals. We want to

build a model to predict a mammal’s brain weight based on the body weight.

(a). Draw a scatter plot to visualize the relationship between the average body weight and

average brain weight of the 59 mammals in the dataset. Is the linearity assumption violated?
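
A sketch of the scatter plot in part (a). The column names `BodyWt` and `BrainWt` and the commented `read.csv()` call are assumptions about brainsPartial.csv. The data frame below is a made-up stand-in with 59 rows so the snippet runs without the file.

```r
# Assumed loading step (column names are a guess at the file layout):
# brains <- read.csv("brainsPartial.csv")

# Made-up stand-in data, for illustration only (not the real measurements)
set.seed(3)
brains <- data.frame(BodyWt = exp(rnorm(59, mean = 2, sd = 2.5)))
brains$BrainWt <- exp(2 + 0.75 * log(brains$BodyWt) + rnorm(59, sd = 0.5))

# Scatter plot on the original scale
plot(brains$BodyWt, brains$BrainWt,
     xlab = "Average body weight (kg)",
     ylab = "Average brain weight (g)")

# A lowess smooth as a visual guide when judging linearity
lines(lowess(brains$BodyWt, brains$BrainWt), lty = 2)
```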

(b). Fit a regression model and print out the summary output of the fitted model. Can you

remove the slope from the model?

(c). Use the normal QQ plot to determine if the normality assumption has been violated.

(d). Use the standardized residual plot to determine if the constant variance assumption has

been violated. Hint: The vertical axis is the standardized residuals. The horizontal axis is the

fitted value.
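
Parts (b) to (d) might be carried out along the following lines. The column names `BodyWt` and `BrainWt` are assumptions, and the data frame is a made-up stand-in for brainsPartial.csv.

```r
# Made-up stand-in data, for illustration only (not the real measurements)
set.seed(3)
brains <- data.frame(BodyWt = exp(rnorm(59, mean = 2, sd = 2.5)))
brains$BrainWt <- exp(2 + 0.75 * log(brains$BodyWt) + rnorm(59, sd = 0.5))

# (b) Fit on the original scale; the t-test on the slope is in the summary
fit_raw <- lm(BrainWt ~ BodyWt, data = brains)
summary(fit_raw)

r <- rstandard(fit_raw)  # standardized residuals

par(mfrow = c(1, 2))

# (c) Normal QQ plot of the standardized residuals
qqnorm(r, main = "Normal QQ plot")
qqline(r)

# (d) Standardized residuals against fitted values
plot(fitted(fit_raw), r,
     xlab = "Fitted value", ylab = "Standardized residual")
abline(h = 0, lty = 2)
```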

(e). Refit the model using the transformed response variable (log(Y)) and predictor variable (log(X)). Here “log” denotes the natural logarithm. Comment on whether the model based on the

transformed data violates the assumptions for linear regression.
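
The refit in part (e) could look like the sketch below. R's `log()` is the natural logarithm by default. The column names and the stand-in data frame are assumptions, as in the earlier parts.

```r
# Made-up stand-in data, for illustration only (not the real measurements)
set.seed(3)
brains <- data.frame(BodyWt = exp(rnorm(59, mean = 2, sd = 2.5)))
brains$BrainWt <- exp(2 + 0.75 * log(brains$BodyWt) + rnorm(59, sd = 0.5))

# Refit with both variables on the natural-log scale
fit_log <- lm(log(BrainWt) ~ log(BodyWt), data = brains)
summary(fit_log)

r_log <- rstandard(fit_log)  # standardized residuals of the refitted model

par(mfrow = c(1, 3))

# Linearity: scatter plot on the log-log scale with the fitted line
plot(log(brains$BodyWt), log(brains$BrainWt),
     xlab = "log(body weight)", ylab = "log(brain weight)")
abline(fit_log)

# Normality: QQ plot of the standardized residuals
qqnorm(r_log, main = "Normal QQ plot")
qqline(r_log)

# Constant variance: standardized residuals against fitted values
plot(fitted(fit_log), r_log,
     xlab = "Fitted value", ylab = "Standardized residual")
abline(h = 0, lty = 2)
```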