CS 555 Assignments 1 to 6 solutions


MET CS 555 Assignment 1

The data in the table below give the duration, in days, of hospital stays for patients admitted to the hospital with C. difficile. Use the data below to:

(1) Save the data to an Excel or CSV file and read it into R for analysis. (2 points)

(2) Make a histogram of the duration of hospital stays in days. Ensure the histogram is labelled appropriately. Use a bin width of 1 day. Describe the shape, center, and spread of the data. Are there any outliers? (7 points)

(3) Find the mean, median, standard deviation, first and third quartiles, minimum, and maximum of the durations of hospital stay in the sample. Summarize these values in a table that you create in Excel or Word. In other words, do *not* simply copy and paste R output. Given the shape of the distribution, what is the best single-number summary of the center of the distribution? What is the best single-number summary of the spread of the distribution? (6 points)

(4) Assume that the literature on this topic suggests that the duration of hospital stay is normally distributed with a mean of 5 days and a standard deviation of 3 days. Use R to determine the probabilities below based on the normal distribution:

(a) What percentage of patients are in the hospital for less than a week? (2 points)

(b) Recent publications have indicated that hypervirulent strains of C. difficile are on the rise. Such strains are associated with poor outcomes, including extended hospital stays. An investigator is interested in showing that the average hospital stay duration has increased relative to the published literature. He has a sample of 10 patients from his hospital. If the published data are consistent with the truth, what is the probability that the mean of his sample will be greater than 7 days? (3 points)
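One way to set up the two normal-distribution calculations in question (4) is sketched below. It assumes that "less than a week" means fewer than 7 days and that the sample mean of n = 10 stays is normal with standard deviation sigma / sqrt(n). The variable names are illustrative only.

```r
# Population parameters as stated in the assignment text.
mu <- 5      # mean stay, in days
sigma <- 3   # standard deviation, in days

# (a) P(X < 7) for a single patient (assumes "a week" = 7 days).
p_less_week <- pnorm(7, mean = mu, sd = sigma)

# (b) P(sample mean > 7) for n = 10 patients.
# The sample mean has standard deviation sigma / sqrt(n).
n <- 10
p_mean_gt7 <- pnorm(7, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

print(round(c(less_than_week = p_less_week, mean_above_7 = p_mean_gt7), 4))
```

Multiplying the first value by 100 gives the percentage asked for in (a).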

 

 

Data (duration of hospital stay, in days):

 

 

 

7 3 5 3 1 5 10 3 4 4
7 5 8 3 4 1 15 4 5 8
5 3 2 3 5 9 4 5 6 9
5 3 6 3 2 6 4 5 5 4
5 8 4 6 14 4 6 3 2 3
2 4 6 6 6 8 6 3 4 4
5 10 4 6 3 9 3 9 4 7
10 13 4 6 5 10 4 4 9 4
4 3 6 8 5 7 6 1 3 12
15 5 2 1 4 4 5 6 4 12
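As a rough sketch of parts (2) and (3), the R code below builds a histogram with 1-day bins and collects the requested summary statistics. To stay short it uses only the first two rows of the table above; the names stay, breaks, and stats are illustrative. In practice the full data would be read from the saved file, for example with read.csv().

```r
# First two rows of the table above (20 of the 100 values), for illustration.
stay <- c(7, 3, 5, 3, 1, 5, 10, 3, 4, 4,
          7, 5, 8, 3, 4, 1, 15, 4, 5, 8)

# Bin edges at half-days so that each whole-day value gets its own 1-day bin.
breaks <- seq(min(stay) - 0.5, max(stay) + 0.5, by = 1)
h <- hist(stay, breaks = breaks,
          main = "Duration of hospital stay",
          xlab = "Days in hospital", ylab = "Number of patients")

# Summary statistics requested in part (3).
q <- quantile(stay, probs = c(0.25, 0.75))
stats <- c(mean = mean(stay), median = median(stay), sd = sd(stay),
           q1 = unname(q[1]), q3 = unname(q[2]),
           min = min(stay), max = max(stay))
print(round(stats, 2))
```

The printed values still need to be typed into a table by hand, since the assignment asks not to paste R output directly.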

 

MET CS 555 Assignment 2

An experiment was conducted to determine the effect of children participating in a given meal preparation on calorie intake for that meal.   Data are recorded below.  Save the data to a format that can be read into R.  Read the data in for analysis.  Use R to calculate the quantities and generate the visual summaries requested below. 

(1) Summarize the data by whether children participated in the meal preparation or not.  Use an appropriately labelled table to show the results.  Also include a graphical presentation that shows the distribution of calories for participants vs. non-participants.  Describe the shape of each distribution and comment on the similarity (or lack thereof) between the distributions in each population.

(2) Does the mean calorie consumption for those who participated in the meal preparation differ from 425? Formally test at the α level using the 5 steps outlined in the module.

(3) Calculate a 90% confidence interval for the mean calorie intake for participants in the meal preparation.  Interpret the confidence interval.

(4) Formally test whether or not participants consumed more calories than non-participants at the α level using the 5 steps outlined in the module.

(5) Are the assumptions of the test used in (4) met?  How do you know?
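The tests in questions (2) to (4) can be sketched with t.test() as shown below, using the calorie values listed in this assignment. The significance level is not given in the text, so the code only produces p-values and an interval to compare against whatever level the module specifies. The two-sample call uses R's default Welch (unequal-variance) test; whether the module expects the pooled version (var.equal = TRUE) is an assumption to check. The variable names are illustrative.

```r
participants <- c(435.16, 338.99, 488.73, 590.28, 582.59, 635.21, 249.86,
                  441.66, 572.43, 357.78, 396.79, 298.38, 282.99, 368.51,
                  388.59, 256.32, 408.82, 424.94, 477.96, 428.74, 432.52,
                  428.27, 596.79, 456.30, 446.38)
nonparticipants <- c(414.61, 503.46, 425.22, 288.77, 184.00, 299.73, 350.65,
                     394.94, 261.55, 295.28, 139.69, 462.78, 179.59, 301.75,
                     436.58, 371.39, 469.02, 378.09, 287.31, 448.55, 332.64,
                     403.98)

# (2) Two-sided one-sample t-test of H0: mu = 425 for participants.
one_sample <- t.test(participants, mu = 425, alternative = "two.sided")

# (3) 90% confidence interval for the participant mean.
ci90 <- t.test(participants, conf.level = 0.90)$conf.int

# (4) One-sided two-sample test: did participants consume more calories?
two_sample <- t.test(participants, nonparticipants, alternative = "greater")

print(one_sample)
print(ci90)
print(two_sample)
```

For question (5), side-by-side boxplots or normal Q-Q plots of each group are a common way to check the assumptions behind the test.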

 

Calorie Intake for participants

435.16
338.99
488.73
590.28
582.59
635.21
249.86
441.66
572.43
357.78
396.79
298.38
282.99
368.51
388.59
256.32
408.82
424.94
477.96
428.74
432.52
428.27
596.79
456.30
446.38

 

Calorie intake for non-participants

414.61
503.46
425.22
288.77
184.00
299.73
350.65
394.94
261.55
295.28
139.69
462.78
179.59
301.75
436.58
371.39
469.02
378.09
287.31
448.55
332.64
403.98

MET CS 555 Assignment 3

The data in this document give the number of meals containing fish eaten per week and the mercury level in head hair for 100 fishermen. Save the data to a format that can be read into R. Read the data in for analysis. Use R to calculate the quantities and generate the visual summaries requested below.

(1) Save the data to a file  (excel or CSV file) and read it into R memory for analysis. (Q1 –  2 points)

(2) To get a sense of the data, generate a scatterplot (using an appropriate window, label the axes, and title the graph). Consciously decide which variable should be on the x-axis and which should be on the y-axis. Using the scatterplot, describe the form, direction, and strength of the association between the variables. (Q2 – 3 points)

(3) Calculate the correlation coefficient.  What does the correlation tell us? (Q3 – 2 points)

(4) Find the equation of the least squares regression equation, and write out the equation.  Add the regression line to the scatterplot you generated above. (Q4 – 4 points)

(5) What is the estimate for β₁? How can we interpret this value? What is the estimate for β₀? What is the interpretation of this value? (Q5 – 4 points)

(6) Calculate the ANOVA table and the table that gives the standard error of the estimate of β₁. Formally test the hypothesis that β₁ = 0 using either the F-test or the t-test at the α = 0.10 level. Either way, present your results using the 5-step procedure as in the course notes. Within your conclusion, calculate the R² (R-squared) value and interpret it.

Also, calculate and interpret the 90% confidence interval for β₁. (Q6 – 5 points)
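The workflow in questions (2) to (6) might look like the sketch below. It uses only the first ten rows of the data table in this assignment to stay short, and it assumes that meals per week is the explanatory (x) variable and mercury level is the response. The column names meals and mercury are illustrative.

```r
# First ten rows of the assignment's data table, for illustration only.
fish <- data.frame(
  meals   = c(14, 7, 5, 8, 21, 18, 22, 6, 19, 7),
  mercury = c(4.484, 4.789, 3.856, 4.888, 10.849,
              6.457, 11.222, 4.908, 10.116, 3.567)
)

# Scatterplot with the explanatory variable on the x-axis.
plot(mercury ~ meals, data = fish,
     main = "Mercury level vs. fish meals per week",
     xlab = "Meals with fish per week", ylab = "Total mercury in head hair")

# Correlation and least squares fit; add the fitted line to the plot.
r <- cor(fish$meals, fish$mercury)
fit <- lm(mercury ~ meals, data = fish)
abline(fit)

# ANOVA table (F-test), coefficient table (estimates, SEs, t-tests), R-squared.
anova_table <- anova(fit)
coef_table <- summary(fit)$coefficients
r_squared <- summary(fit)$r.squared

# 90% confidence interval for the slope.
ci90 <- confint(fit, "meals", level = 0.90)

print(coef_table)
print(anova_table)
print(ci90)
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so the F-test and t-test lead to the same conclusion.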

 

 

 

 

 

 

Number of meals with fish Total Mercury in mg/g
14 4.484
7 4.789
5 3.856
8 4.888
21 10.849
18 6.457
22 11.222
6 4.908
19 10.116
7 3.567
16 6.092
17 3.799
20 6.781
5 5.995
7 1.717
14 4.615
1 3.362
6 3.928
9 1.833
10 5.668
13 4.7
9 2.272
16 4.812
5 1.342
18 6.123
7 4.622
8 7.805
7 2.643
8 6.111
7 2.476
10 4.317
4 1.789
4 2.484
7 1.757
6 1.239
5 5.311
19 6.103
3 1.984
4 2.697
7 0.692
7 2.404
9 1.503
17 8.231
14 5.321
7 3.81
21 1.765
4 0.408
7 3.901
10 0.48
11 3.826
7 3.451
9 2.32
2 4.086
7 2.272
3 2.564
7 7.998
11 5.081
8 0.366
7 2.477
4 5.288
7 5.676
7 2.296
21 6.11
4 1.502
7 3.71
3 2.752
3 0.987
19 10.14
7 1.616
12 4.65
13 7.241
18 9.36
7 3.753
13 4.008
21 5.345
1 2.455
0 0.941
1 2.478
1 3.212
10 5.214
0 1.12
0 0.745
2 4.645
2 4.981
1 2.812
0 0.846
2 5.142
0 1.111
0 1.094
2 2.978
2 3.942
0 1.131
0 0.979
0 0.844
1 2.411
1 2.497
10 3.764
20 8.178
19 7.664
22 9.716

MET CS 555 Assignment 4

The data below are from a Canadian 1970 census that collected information about specific occupations. The data were used to develop a regression model to predict prestige for all occupations. Use R to calculate the quantities and generate the visual summaries requested below.

(1) Save the data to an Excel or CSV file and read it into R for analysis. (1 point)

(2) To get a sense of the data, generate a scatterplot to examine the association between prestige score and years of education.  Briefly describe the form, direction, and strength of the association between the variables.  Calculate the correlation.  (3 points)

(3) Perform a simple linear regression.  Generate a residual plot.  Assess whether the model assumptions are met.  Are there any outliers or influence points?  If so, identify them by ID and comment on the effect of each on the regression. (4 points)

(4) Calculate the least squares regression equation that predicts prestige from education, income, and percentage of women. Formally test whether this set of predictors is associated with prestige at the α = 0.05 level. (4 points)

(5) If the overall model was significant, summarize the information about the contribution of each variable separately at the same significance level as used for the overall model (no need to do a formal 5-step procedure for each one, just comment on the results of the tests).  Provide interpretations for any estimates that were significant.   Calculate 95% confidence intervals where appropriate. (4 points)

(6) Generate a residual plot showing the fitted values from the regression against the residuals.  Is the fit of the model reasonable? (2 points)

(7) Are there any outliers or influence points?  (2 points)
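A minimal sketch of the multiple regression steps in questions (4) to (7) follows. It uses only the first twelve occupations from the table in this assignment, so its numbers will not match an analysis of the full data. The column names and the 4/n cutoff for Cook's distance are illustrative assumptions, not something the assignment prescribes.

```r
# First twelve rows of the assignment's table, for illustration only.
occ <- data.frame(
  title = c("GOV_ADMINISTRATORS", "GENERAL_MANAGERS", "ACCOUNTANTS",
            "PURCHASING_OFFICERS", "CHEMISTS", "PHYSICISTS", "BIOLOGISTS",
            "ARCHITECTS", "CIVIL_ENGINEERS", "MINING_ENGINEERS",
            "SURVEYORS", "DRAUGHTSMEN"),
  education = c(13.11, 12.26, 12.77, 11.42, 14.62, 15.64,
                15.09, 15.44, 14.52, 14.64, 12.39, 12.30),
  income = c(12351, 25879, 9271, 8865, 8403, 11030,
             8258, 14163, 11377, 11023, 5902, 7059),
  women = c(11.16, 4.02, 15.70, 9.11, 11.68, 5.13,
            25.65, 2.69, 1.03, 0.94, 1.91, 7.83),
  prestige = c(68.8, 69.1, 63.4, 56.8, 73.5, 77.6,
               72.6, 78.1, 73.1, 68.8, 62.0, 60.0)
)

# Multiple regression of prestige on the three predictors.
fit <- lm(prestige ~ education + income + women, data = occ)
s <- summary(fit)

# Overall (global) F-test for the set of predictors.
f <- s$fstatistic
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Per-coefficient t-tests and 95% confidence intervals.
print(s$coefficients)
ci95 <- confint(fit, level = 0.95)

# Residuals against fitted values, with a reference line at zero.
plot(fitted(fit), resid(fit),
     xlab = "Fitted prestige", ylab = "Residual", main = "Residual plot")
abline(h = 0, lty = 2)

# Cook's distance; the 4/n cutoff is only a common rule of thumb.
cooks <- cooks.distance(fit)
flagged <- occ$title[cooks > 4 / nrow(occ)]
print(flagged)
```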

 

 

 

 

 

 

Occupational Title Education Level (years) Income ($) Percent of Workforce that are Women Prestige Score
GOV_ADMINISTRATORS 13.11 12351 11.16 68.8
GENERAL_MANAGERS 12.26 25879 4.02 69.1
ACCOUNTANTS 12.77 9271 15.7 63.4
PURCHASING_OFFICERS 11.42 8865 9.11 56.8
CHEMISTS 14.62 8403 11.68 73.5
PHYSICISTS 15.64 11030 5.13 77.6
BIOLOGISTS 15.09 8258 25.65 72.6
ARCHITECTS 15.44 14163 2.69 78.1
CIVIL_ENGINEERS 14.52 11377 1.03 73.1
MINING_ENGINEERS 14.64 11023 0.94 68.8
SURVEYORS 12.39 5902 1.91 62
DRAUGHTSMEN 12.3 7059 7.83 60
COMPUTER_PROGRAMERS 13.83 8425 15.33 53.8
ECONOMISTS 14.44 8049 57.31 62.2
PSYCHOLOGISTS 14.36 7405 48.28 74.9
SOCIAL_WORKERS 14.21 6336 54.77 55.1
LAWYERS 15.77 19263 5.13 82.3
LIBRARIANS 14.15 6112 77.1 58.1
VOCATIONAL_COUNSELLORS 15.22 9593 34.89 58.3
MINISTERS 14.5 4686 4.14 72.8
UNIVERSITY_TEACHERS 15.97 12480 19.59 84.6
PRIMARY_SCHOOL_TEACHERS 13.62 5648 83.78 59.6
SECONDARY_SCHOOL_TEACHERS 15.08 8034 46.8 66.1
PHYSICIANS 15.96 25308 10.56 87.2
VETERINARIANS 15.94 14558 4.32 66.7
OSTEOPATHS_CHIROPRACTORS 14.71 17498 6.91 68.4
NURSES 12.46 4614 96.12 64.7
NURSING_AIDES 9.45 3485 76.14 34.9
PHYSIO_THERAPSTS 13.62 5092 82.66 72.1
PHARMACISTS 15.21 10432 24.71 69.3
MEDICAL_TECHNICIANS 12.79 5180 76.04 67.5
COMMERCIAL_ARTISTS 11.09 6197 21.03 57.2
RADIO_TV_ANNOUNCERS 12.71 7562 11.15 57.6
ATHLETES 11.44 8206 8.13 54.1
SECRETARIES 11.59 4036 97.51 46
TYPISTS 11.49 3148 95.97 41.9
BOOKKEEPERS 11.32 4348 68.24 49.4
TELLERS_CASHIERS 10.64 2448 91.76 42.3
COMPUTER_OPERATORS 11.36 4330 75.92 47.7
SHIPPING_CLERKS 9.17 4761 11.37 30.9
FILE_CLERKS 12.09 3016 83.19 32.7
RECEPTIONSTS 11.04 2901 92.86 38.7
MAIL_CARRIERS 9.22 5511 7.62 36.1
POSTAL_CLERKS 10.07 3739 52.27 37.2
TELEPHONE_OPERATORS 10.51 3161 96.14 38.1
COLLECTORS 11.2 4741 47.06 29.4
CLAIM_ADJUSTORS 11.13 5052 56.1 51.1
TRAVEL_CLERKS 11.43 6259 39.17 35.7
OFFICE_CLERKS 11 4075 63.23 35.6
SALES_SUPERVISORS 9.84 7482 17.04 41.5
COMMERCIAL_TRAVELLERS 11.13 8780 3.16 40.2
SALES_CLERKS 10.05 2594 67.82 26.5
NEWSBOYS 9.62 918 7 14.8
SERVICE_STATION_ATTENDANT 9.93 2370 3.69 23.3
INSURANCE__AGENTS 11.6 8131 13.09 47.3
REAL_ESTATE_SALESMEN 11.09 6992 24.44 47.1
BUYERS 11.03 7956 23.88 51.1
FIREFIGHTERS 9.47 8895 0 43.5
POLICEMEN 10.93 8891 1.65 51.6
COOKS 7.74 3116 52 29.7
BARTENDERS 8.5 3930 15.51 20.2
FUNERAL_DIRECTORS 10.57 7869 6.01 54.9
BABYSITTERS 9.46 611 96.53 25.9
LAUNDERERS 7.33 3000 69.31 20.8
JANITORS 7.11 3472 33.57 17.3
ELEVATOR_OPERATORS 7.58 3582 30.08 20.1
FARMERS 6.84 3643 3.6 44.1
FARM_WORKERS 8.6 1656 27.75 21.5
ROTARY_WELL_DRILLERS 8.88 6860 0 35.3
BAKERS 7.54 4199 33.3 38.9
SLAUGHTERERS_1 7.64 5134 17.26 25.2
SLAUGHTERERS_2 7.64 5134 17.26 34.8
CANNERS 7.42 1890 72.24 23.2
TEXTILE_WEAVERS 6.69 4443 31.36 33.3
TEXTILE_LABOURERS 6.74 3485 39.48 28.8
TOOL_DIE_MAKERS 10.09 8043 1.5 42.5
MACHINISTS 8.81 6686 4.28 44.2
SHEET_METAL_WORKERS 8.4 6565 2.3 35.9
WELDERS 7.92 6477 5.17 41.8
AUTO_WORKERS 8.43 5811 13.62 35.9
AIRCRAFT_WORKERS 8.78 6573 5.78 43.7
ELECTRONIC_WORKERS 8.76 3942 74.54 50.8
RADIO_TV_REPAIRMEN 10.29 5449 2.92 37.2
SEWING_MACH_OPERATORS 6.38 2847 90.67 28.2
AUTO_REPAIRMEN 8.1 5795 0.81 38.1
AIRCRAFT_REPAIRMEN 10.1 7716 0.78 50.3
RAILWAY_SECTIONMEN 6.67 4696 0 27.3
ELECTRICAL_LINEMEN 9.05 8316 1.34 40.9
ELECTRICIANS 9.93 7147 0.99 50.2
CONSTRUCTION_FOREMEN 8.24 8880 0.65 51.1
CARPENTERS 6.92 5299 0.56 38.9
MASONS 6.6 5959 0.52 36.2
HOUSE_PAINTERS 7.81 4549 2.46 29.9
PLUMBERS 8.33 6928 0.61 42.9
CONSTRUCTION_LABOURERS 7.52 3910 1.09 26.5
PILOTS 12.27 14032 0.58 66.1
TRAIN_ENGINEERS 8.49 8845 0 48.9
BUS_DRIVERS 7.58 5562 9.47 35.9
TAXI_DRIVERS 7.93 4224 3.59 25.1
LONGSHOREMEN 8.37 4753 0 26.1
TYPESETTERS 10 6462 13.58 42.2
BOOKBINDERS 8.55 3617 70.87 35.2

MET CS 555 Assignment 5

The data in this document are from 3 groups of students (math, chemistry, and physics) on an IQ-related test. Save the data to a CSV or Excel file and read the data into R. Use the data to address the following questions:

(1) How many students are in each group? Summarize the data relating to both test score and age by student group (separately). Use appropriate numerical and/or graphical summaries. (3 points)

(2) Do the test scores vary by student group? Perform a one-way ANOVA using the aov or Anova function in R to assess. Summarize the results using the 5-step procedure. If the results of the overall model are significant, perform the appropriate pairwise comparisons using Tukey’s procedure to adjust for multiple comparisons and summarize these results. (7 points)

(3) Create an appropriate number of dummy variables for student group and re-run the one-way ANOVA using the lm function with the newly created dummy variables. Set chemistry students as the reference group. Confirm whether the results are the same. What is the interpretation of the beta estimates from the regression model? (4 points)

(4) Re-do the one-way ANOVA adjusting for age. Focus on the output relating to the comparisons of test score by student type. Explain how this analysis differs from the analysis in step 2 above (not the results, but how the questions it answers differ from those answered above). Did you obtain different results? Summarize briefly (no need to go through the 5-step procedure here). Present the least squares means and interpret them. (6 points)
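The sequence of analyses above can be sketched in base R as follows. The example uses only the first five students of each group from the table in this assignment, so its output is illustrative rather than the answer for the full data. The dummy variable names are hypothetical. The least squares means are approximated here as model predictions for each group at the overall mean age; packages such as emmeans offer a dedicated interface, but that is not used in this sketch.

```r
# First five students per group from the assignment's table, for illustration.
students <- data.frame(
  group = factor(rep(c("Physics student", "Math student", "Chemistry student"),
                     each = 5)),
  iq  = c(34, 33, 32, 25, 36,  36, 38, 37, 35, 41,  52, 46, 51, 52, 45),
  age = c(15, 17, 15, 14, 19,  20, 28, 22, 18, 19,  46, 38, 41, 39, 44)
)

# Group sizes and group means of test score and age.
print(table(students$group))
print(aggregate(cbind(iq, age) ~ group, data = students, FUN = mean))

# One-way ANOVA and Tukey pairwise comparisons.
fit_aov <- aov(iq ~ group, data = students)
print(summary(fit_aov))
tukey <- TukeyHSD(fit_aov)
print(tukey)

# Dummy variables with chemistry as the reference group (both dummies = 0).
students$g_math <- as.numeric(students$group == "Math student")
students$g_physics <- as.numeric(students$group == "Physics student")
fit_lm <- lm(iq ~ g_math + g_physics, data = students)
print(summary(fit_lm)$coefficients)

# Same model adjusted for age (one-way ANCOVA).
fit_adj <- lm(iq ~ g_math + g_physics + age, data = students)

# Age-adjusted group means: predictions at the overall mean age.
newdata <- data.frame(g_math = c(0, 1, 0), g_physics = c(0, 0, 1),
                      age = mean(students$age))
ls_means <- predict(fit_adj, newdata)
names(ls_means) <- c("Chemistry", "Math", "Physics")
print(ls_means)
```

In the dummy-variable model the intercept is the mean score of the reference group, and each slope is the difference between that group's mean and the reference mean.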

 

group iq age
Physics student 34 15
Physics student 33 17
Physics student 32 15
Physics student 25 14
Physics student 36 19
Physics student 30 18
Physics student 31 16
Physics student 34 17
Physics student 29 16
Physics student 34 17
Physics student 39 16
Physics student 33 18
Physics student 39 19
Physics student 42 20
Physics student 41 20
Math student 36 20
Math student 38 28
Math student 37 22
Math student 35 18
Math student 41 19
Math student 40 23
Math student 36 19
Math student 38 16
Math student 24 18
Math student 39 20
Math student 29 19
Math student 38 20
Math student 45 23
Math student 44 24
Math student 44 22
Chemistry student 52 46
Chemistry student 46 38
Chemistry student 51 41
Chemistry student 52 39
Chemistry student 45 44
Chemistry student 49 33
Chemistry student 47 41
Chemistry student 46 36
Chemistry student 41 40
Chemistry student 47 44
Chemistry student 46 46
Chemistry student 42 38
Chemistry student 43 32
Chemistry student 47 41
Chemistry student 40 42

MET CS 555 Assignment 6

The data in this document consist of body temperature measurements and heart rate measurements for 65 men and 65 women. Save the data to an Excel file and read the data into R. Use the data to address the following questions.

(1) We are interested in whether the proportion of men and women with body temperatures greater than or equal to 98.6 degrees Fahrenheit are equal. Therefore, we need to dichotomize the body temperature variable. Create a new variable, called “temp_level” in which temp_level = 1 if body temperature >= 98.6 and temp_level=0 if body temperature < 98.6. (1 point)

(2) Summarize the data relating to body temperature level by sex. (2 points)

(3) Calculate the risk difference.  Formally test (at the α=.05 level) whether the proportion of people with higher body temperatures (greater than or equal to 98.6) is the same across men and women, based on this effect measure.  Do females have higher body temperatures than males? (4.5 points)

(4) Perform a logistic regression with sex as the only explanatory variable. Formally test (at the α=.05 level) whether the odds of having a temperature greater than or equal to 98.6 are the same for males and females. Include the odds ratio for sex and the associated 95% confidence interval based on the model in your summary and interpret this value. What is the c-statistic for this model? (5.5 points)

(5) Perform a multiple logistic regression predicting body temperature level from sex and heart rate.  Summarize briefly the output from this model.  Give the odds ratio for sex and heart rate (for a 10 beat increase).  What is the c-statistic of this model?  (5 points)

(6) Which model fits the data better? Support your response with evidence from your output. Present the ROC curve for the model you choose. (2 points)
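The steps in questions (1) to (6) could be approached along the lines of the base R sketch below. It uses a hand-picked subset of 20 rows (10 per sex) from the data in this assignment, so the numbers it prints are illustrative only. The helper c_statistic() is a hypothetical function written for this sketch; it computes the area under the ROC curve from ranks (the Mann-Whitney form). The confidence interval is a Wald interval from confint.default(), which may differ from the interval the course uses. The ROC curve itself is not drawn here.

```r
# Hand-picked subset of the assignment's data (10 males, 10 females).
temps <- data.frame(
  temp = c(96.3, 97.1, 97.4, 97.8, 98.0, 98.2, 98.4, 98.6, 98.8, 99.2,
           96.4, 97.4, 97.8, 98.2, 98.4, 98.6, 98.7, 98.8, 99.1, 100.0),
  sex  = rep(c(1, 2), each = 10),            # 1 = male, 2 = female
  hr   = c(70, 73, 78, 58, 71, 66, 82, 78, 81, 83,
           69, 57, 77, 73, 84, 82, 72, 64, 80, 78)
)

# (1) Dichotomize body temperature; also code sex as a 0/1 indicator.
temps$temp_level <- as.numeric(temps$temp >= 98.6)
temps$female <- as.numeric(temps$sex == 2)

# (2) Cross-tabulation of temperature level by sex.
print(table(sex = temps$sex, temp_level = temps$temp_level))

# (3) Risk difference (female minus male) and two-sample test of proportions.
events <- c(female = sum(temps$temp_level[temps$female == 1]),
            male   = sum(temps$temp_level[temps$female == 0]))
totals <- c(female = sum(temps$female == 1), male = sum(temps$female == 0))
risk_diff <- unname(events["female"] / totals["female"] -
                    events["male"] / totals["male"])
rd_test <- prop.test(events, totals, correct = FALSE)

# (4) Logistic regression on sex: odds ratio with a Wald 95% interval.
fit1 <- glm(temp_level ~ female, family = binomial, data = temps)
or_sex <- exp(coef(fit1)[["female"]])
or_ci <- exp(confint.default(fit1)["female", ])

# (5) Add heart rate; odds ratio for a 10-beat increase.
fit2 <- glm(temp_level ~ female + hr, family = binomial, data = temps)
or_hr10 <- exp(10 * coef(fit2)[["hr"]])

# c-statistic (area under the ROC curve) from ranks; ties count as 1/2.
c_statistic <- function(y, p) {
  r <- rank(round(p, 8))  # rounding guards against floating-point near-ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
c1 <- c_statistic(temps$temp_level, fitted(fit1))
c2 <- c_statistic(temps$temp_level, fitted(fit2))

# (6) One possible piece of evidence for comparing the models.
print(AIC(fit1, fit2))
print(c(c_sex_only = c1, c_sex_and_hr = c2))
```

With so few rows, prop.test() may warn that the chi-squared approximation is unreliable; with the full 130 observations this is less of a concern.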

 

 

 

 

 

 

 

 

Data (1 = male, 2 = female)

temp sex Heart rate
96.3 1 70
96.7 1 71
96.9 1 74
97 1 80
97.1 1 73
97.1 1 75
97.1 1 82
97.2 1 64
97.3 1 69
97.4 1 70
97.4 1 68
97.4 1 72
97.4 1 78
97.5 1 70
97.5 1 75
97.6 1 74
97.6 1 69
97.6 1 73
97.7 1 77
97.8 1 58
97.8 1 73
97.8 1 65
97.8 1 74
97.9 1 76
97.9 1 72
98 1 78
98 1 71
98 1 74
98 1 67
98 1 64
98 1 78
98.1 1 73
98.1 1 67
98.2 1 66
98.2 1 64
98.2 1 71
98.2 1 72
98.3 1 86
98.3 1 72
98.4 1 68
98.4 1 70
98.4 1 82
98.4 1 84
98.5 1 68
98.5 1 71
98.6 2 77
98.6 1 78
98.6 1 83
98.6 2 66
98.6 1 70
98.6 1 82
98.7 2 73
98.7 1 78
98.8 1 78
98.8 1 81
98.8 2 78
98.9 1 80
99 2 75
99 2 79
99 1 81
99.1 1 71
99.2 1 83
99.3 1 63
99.4 1 70
99.5 1 75
96.4 2 69
96.7 2 62
96.8 1 75
97.2 1 66
97.2 2 68
97.4 2 57
97.6 1 61
97.7 2 84
97.7 1 61
97.8 2 77
97.8 2 62
97.8 2 71
97.9 1 68
97.9 2 69
97.9 2 79
98 2 76
98 1 87
98 2 78
98 2 73
98 2 89
98.1 2 81
98.2 2 73
98.2 2 64
98.2 2 65
98.2 2 73
98.2 2 69
98.2 2 57
98.3 2 79
98.3 2 78
98.3 2 80
98.4 2 79
98.4 2 81
98.4 2 73
98.4 2 74
98.4 2 84
98.5 2 83
98.6 2 82
98.6 2 85
98.6 2 86
98.6 2 77
98.7 2 72
98.7 2 79
98.7 2 59
98.7 2 64
98.7 2 65
98.7 2 82
98.8 2 64
98.8 2 70
98.8 2 83
98.8 2 89
98.8 2 69
98.8 2 73
98.8 2 84
98.9 2 76
99 2 79
99 2 81
99.1 2 80
99.1 2 74
99.2 2 77
99.2 2 66
99.3 2 68
99.4 2 77
99.9 2 79
100 2 78
100.8 2 77