STA 4164-0001 Project Assignment #1 solution




Rate this product

1. Load dataset “Grad_Admission.csv” into R. Variables description are as follows:
ID: Unique identification code for each student (DO NOT INCLUDE)
GRE: GRE Scores (out of 340)
TOEFL: TOEFL Scores (out of 120)
Urate: University Rating (out of 5)
SOP: Statement of Purpose Strength (out of 5)
LOR: Letter of Recommendation Strength (out of 5)
CGPA: Undergraduate GPA (out of 4)
Chance:Chance of Admission (ranging from 0 to 100)
2. Print the dataset. (Limit your print to include only the first ten observations)
3. Print a table of the min/1st quartile/median/mean/3rd quartile/max of three of the variables (your
choice). 4. Make histograms for “Chance”, “CGPA”, “GRE” and “TOEFL” variables and comment on their
distribution. Are these variables normally distributed? What else can we find from histograms?
In this part, we want to explore the “Grad_Admission” dataset by applying what we learned about
regression analysis. The explanation is more critical, so make sure to explain your result for any of thefollowing tasks:
5. Consider “Chance” as the response variable and print scatter plots of each of the other six variables
against it. 6. Draw your initial conclusions about the relationship between independent variables and the
response variable based on the scatterplot. Is there any relationship? Is it linear enough?
7. Confirm the validity of five major linear regression assumptions and comment on them. 8. Choose the best three independent variables based on your immediate insight into the relationshipand list them. Write down the model. 9. Build up a table, including the correlation between all independent variables and the response
variable. What can you deduce from these correlation coefficients? Briefly explain the relation
between correlation coefficients and the slope of the regression equation. 10. Report the result of hypothesis testing for the correlation coefficient of each independent variable(i.e. ρ� = 0). 11. Build straight-line (univariate) regression models for all six independent variables. Test the null
hypothesis that each of the β�′s is equal to zero. (i=1, 2, …,6) (set α = 0.05)
12. Report the ANOVA table for two variables with the highest R-square value? What conclusion is
achievable looking at these tables? (hint: compare R-squares between variables and make suitableconclusion)
13. Build up a model, including all six variables available in the dataset. Are all coefficients significant at
� = 0.05?
14. Remove all non-significant variables from the model and rebuild the model. What has been
changed considerably in the ANOVA table compared to the model in task 13?
15. Build up confidence bands and prediction bands for all records. Print the appropriate table intoyour
output. 16. Write the appropriate equation to predict the admission chance with variables included in your
final model from task 11. Explain the meaning of intercept and slope in this equation. 17. What conclusion can you arrive at from this exploration in terms of the suitability of descriptivestatistics and regression in data exploration? What is the recommendation that you would providefuture data explorations to include as a result necessarily?