## Description

## 1. Comprehension Test

Children in a school class are given a test of comprehension of English, marked out of 100. The children are from three different ethnic groups, which is thought to be an important factor. The question of interest is whether there are sex differences after allowing for ethnicity.

The data follow:

Females Males

Ethnic group E1 67 66 75 76 71 70 72 63 72 62 61 69 64 71 68 56

E2 69 57 55 63 65 55 59 47 49

E3 30 47 39 33
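As a quick check on data entry, the group sizes, group means and overall mean can be computed directly from the listed scores. The sketch below pools the sexes within each ethnic group (it does not attempt the Females/Males split) and should reproduce the Comprehension Mean and the Type I sum of squares for Ethnicity reported in the SAS output; the variable names are illustrative only.

```python
# Scores as listed above, grouped by ethnic group only (sexes pooled).
scores = {
    "E1": [67, 66, 75, 76, 71, 70, 72, 63, 72, 62, 61, 69, 64, 71, 68, 56],
    "E2": [69, 57, 55, 63, 65, 55, 59, 47, 49],
    "E3": [30, 47, 39, 33],
}

n_total = sum(len(v) for v in scores.values())
grand_mean = sum(sum(v) for v in scores.values()) / n_total
group_means = {g: sum(v) / len(v) for g, v in scores.items()}

# Between-group sum of squares for Ethnicity (ignoring Sex).
ss_ethnicity = sum(
    len(v) * (group_means[g] - grand_mean) ** 2 for g, v in scores.items()
)

print(f"n = {n_total}, grand mean = {grand_mean:.5f}")
for g, m in group_means.items():
    print(f"{g}: n = {len(scores[g])}, mean = {m:.3f}")
print(f"SS(Ethnicity) = {ss_ethnicity:.6f}")
```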

(a) A two-way ANOVA was run on the data, with SAS output given on pages 3 to 6. Present the results from the ANOVA following the usual Assignment Guidelines, as given on page 1.

(b) If a one-way ANOVA is done with factor Sex, the resulting ANOVA table is:

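| Source | DF | Sum of Squares | Mean Square | F value | p-value |
|--------|----|----------------|-------------|---------|---------|
| Sex    | 1  | 144.166        | 144.166     | 0.99    | 0.3292  |
| Error  | 27 | 3942.662       | 146.025     |         |         |
| Total  | 28 | 4086.828       |             |         |         |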

Briefly discuss the outcomes of the separate tests for Sex presented in parts (a) and (b). Are the conclusions different? Give reasons to explain your answer.

SAS Output for Comprehension Test

Linear Models

The GLM Procedure

Class Level Information

| Class     | Levels | Values   |
|-----------|--------|----------|
| Ethnicity | 3      | E1 E2 E3 |
| Sex       | 2      | F M      |

Number of Observations Read 29

Number of Observations Used 29

Dependent Variable: Comprehension

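| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 5  | 3365.438697    | 673.087739  | 21.46   | <.0001 |
| Error           | 23 | 721.388889     | 31.364734   |         |        |
| Corrected Total | 28 | 4086.827586    |             |         |        |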

| R-Square | Coeff Var | Root MSE | Comprehension Mean |
|----------|-----------|----------|--------------------|
| 0.823484 | 9.275400  | 5.600423 | 60.37931           |

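| Source        | DF | Type I SS   | Mean Square | F Value | Pr > F |
|---------------|----|-------------|-------------|---------|--------|
| Ethnicity     | 2  | 3060.640086 | 1530.320043 | 48.79   | <.0001 |
| Sex           | 1  | 275.113176  | 275.113176  | 8.77    | 0.0070 |
| Ethnicity*Sex | 2  | 29.685435   | 14.842718   | 0.47    | 0.6289 |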
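The two F ratios for Sex can be recomputed from the mean squares in the tables. The sketch below only divides the quoted values; it is an arithmetic check rather than a re-analysis. Because the SAS table reports Type I (sequential) sums of squares with Ethnicity entered first, the Sex line there is adjusted for Ethnicity, whereas the one-way table in part (b) is not.

```python
# Mean squares quoted in the one-way table of part (b).
ms_sex_one_way = 144.166
ms_error_one_way = 146.025

# Mean squares quoted in the two-way SAS output (Type I SS, Ethnicity first).
ms_sex_two_way = 275.113176
ms_error_two_way = 31.364734

f_one_way = ms_sex_one_way / ms_error_one_way
f_two_way = ms_sex_two_way / ms_error_two_way

print(f"one-way F for Sex = {f_one_way:.2f}")
print(f"two-way F for Sex = {f_two_way:.2f}")

# The error mean square falls sharply once Ethnicity is in the model.
print(f"error MS ratio (one-way / two-way) = "
      f"{ms_error_one_way / ms_error_two_way:.2f}")
```

In the one-way analysis the variation due to ethnicity sits in the error term, which is one reason the two tests can lead to different conclusions.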

[Figure: interaction plot, one line per sex across the ethnic groups]

Lines nearly parallel, no significant interaction.

Vertical separation shows sex differences (males lower than females).

Non-zero slope shows ethnicity differences.

[Figure: interaction plot, one line per ethnic group across the sexes]

Lines nearly parallel, no significant interaction.

Vertical separation shows ethnicity differences.

Non-zero slope shows sex differences (males lower than females).

Note: E1 is the top line, E2 the middle line and E3 the lowest. (The lines are different colours, but that doesn't show up if viewed or printed in black and white.)

## 2. Invertebrates in Mussel Clumps

The following data are from Peake and Quinn (1993), Temporal variation in species-area curves for invertebrates in clumps of an intertidal mussel, Ecography 16, 269-277. The two variables used in this question are:

x = log10(Area) of each of 25 mussel clumps (in dm²), and

Y = number of different species of macroinvertebrates in each clump.

Note: Using log(Area) gives a straighter regression line than Area, which is why it is used. This is a transformation of x, not Y; it has been done to improve linearity, not to stabilise variances.

The data follow. Decide if there is a useful linear relationship between x and Y, i.e. if x is a useful linear predictor of Y.

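| Clump | logArea | Species |
|-------|---------|---------|
| 1     | 2.71    | 3       |
| 2     | 2.67    | 7       |
| 3     | 2.66    | 6       |
| 4     | 2.97    | 8       |
| 5     | 3.13    | 10      |
| 6     | 3.25    | 9       |
| 7     | 3.23    | 10      |
| 8     | 3.25    | 11      |
| 9     | 3.49    | 16      |
| 10    | 3.60    | 9       |
| 11    | 3.65    | 13      |
| 12    | 3.65    | 14      |
| 13    | 3.70    | 12      |
| 14    | 3.65    | 14      |
| 15    | 3.74    | 20      |
| 16    | 3.87    | 22      |
| 17    | 3.85    | 15      |
| 18    | 3.96    | 20      |
| 19    | 4.01    | 22      |
| 20    | 3.97    | 21      |
| 21    | 4.14    | 15      |
| 22    | 4.31    | 24      |
| 23    | 4.39    | 25      |
| 24    | 4.43    | 25      |
| 25    | 4.42    | 24      |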
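The least-squares line for these data can be computed by hand from the corrected sums of squares and cross-products. The following is a minimal sketch in plain Python (no statistics package assumed); the names `s_xx`, `s_xy`, `slope` and `intercept` are illustrative, and the result should agree with the Parameter Estimates in the SAS output to rounding.

```python
# (logArea, Species) pairs for the 25 clumps, copied from the data listing.
data = [
    (2.71, 3), (2.67, 7), (2.66, 6), (2.97, 8), (3.13, 10),
    (3.25, 9), (3.23, 10), (3.25, 11), (3.49, 16), (3.60, 9),
    (3.65, 13), (3.65, 14), (3.70, 12), (3.65, 14), (3.74, 20),
    (3.87, 22), (3.85, 15), (3.96, 20), (4.01, 22), (3.97, 21),
    (4.14, 15), (4.31, 24), (4.39, 25), (4.43, 25), (4.42, 24),
]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Corrected sum of squares of x and corrected cross-product of x and Y.
s_xx = sum((x - x_bar) ** 2 for x, _ in data)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"n = {n}, mean logArea = {x_bar:.3f}, mean Species = {y_bar:.1f}")
print(f"fitted line: Species = {intercept:.5f} + {slope:.5f} * logArea")
```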

(a) A scatterplot of the data is given on page 8. Give comments on whether you think the plot shows (i) linearity, (ii) constant variance.

(b) Output from a simple linear regression using logArea to predict the number of species is given on pages 9 and 10. Present a report on this analysis that includes (as usual) the model equation, hypotheses, assumptions, comments on whether the analysis is valid, plus statistical conclusions and interpretation.

SAS Output for Mussel Clumps

[Figure: scatter plot of Species against logArea]

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: Species

Number of Observations Read 25

Number of Observations Used 25

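Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 1  | 868.50179      | 868.50179   | 117.85  | <.0001 |
| Error           | 23 | 169.49821      | 7.36949     |         |        |
| Corrected Total | 24 | 1038.00000     |             |         |        |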

| Statistic      | Value    |
|----------------|----------|
| Root MSE       | 2.71468  |
| Dependent Mean | 15.00000 |
| Coeff Var      | 18.09787 |
| R-Square       | 0.8367   |
| Adj R-Sq       | 0.8296   |

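Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | -25.64136          | 3.78287        | -6.78   | <.0001     |
| logArea   | 1  | 11.20214           | 1.03189        | 10.86   | <.0001     |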
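Several quantities in this output are linked, which gives a way to check that the tables have been read correctly. The sketch below uses only numbers quoted in the output: R-Square is the Model sum of squares over the Corrected Total, Root MSE is the square root of the Error mean square, and in a simple linear regression the squared t value for the slope equals the F value.

```python
import math

# Values quoted in the Analysis of Variance table.
ss_model = 868.50179
ss_total = 1038.00000
ms_error = 7.36949

# Values quoted in the Parameter Estimates table for logArea.
slope_estimate = 11.20214
slope_std_error = 1.03189

r_square = ss_model / ss_total
root_mse = math.sqrt(ms_error)
f_value = ss_model / ms_error          # Model DF is 1, so MS = SS
t_value = slope_estimate / slope_std_error

print(f"R-Square = {r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"F = {f_value:.2f}, t = {t_value:.2f}, t squared = {t_value ** 2:.2f}")
```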


## 3. Coarse Woody Debris in Lakes

Christensen et al. (1996, Ecological Applications 6(4), 1143-1149) studied the relationships between coarse woody debris (CWD), shoreline vegetation and lake development in a sample of 16 lakes in North America. Coarse woody debris is useful in providing a habitat for various fish species. It is known to be related to the riparian (river-bank, lake-edge) tree density, irrespective of whether or not humans are present. The objective is to find out whether, after allowing for riparian tree density, human habitation is having an effect on the CWD.

The variables below were taken around the shoreline and near-shore water:

L10CABIN = log10 of 1 + density of cabins (number km⁻¹),

RIP.DENS = density of riparian trees (trees km⁻¹), and

CWD.BASA = basal area of coarse woody debris (m² km⁻¹).

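| LAKE       | AREA | RIP.DENS | CWD.BASA | L10CABIN  |
|------------|------|----------|----------|-----------|
| Bay        | 69   | 1270     | 121      | 0         |
| Bergner    | 9    | 1210     | 41       | 0         |
| Crampton   | 24   | 1800     | 183      | 0         |
| Long       | 8    | 1875     | 130      | 0         |
| Roach      | 20   | 1300     | 127      | 0         |
| Tenderfoot | 175  | 2150     | 134      | 0.20412   |
| Palmer     | 254  | 1330     | 65       | 0.462398  |
| Street     | 22   | 964      | 52       | 0.6627578 |
| Laura      | 240  | 961      | 12       | 0.7075702 |
| Annabelle  | 85   | 1400     | 46       | 0.763428  |
| Joyce      | 12   | 1280     | 54       | 0.845098  |
| Lake hills | 25   | 976      | 97       | 0.8864907 |
| Towanda    | 58   | 771      | 1        | 1.10721   |
| Black oak  | 234  | 833      | 4        | 1.1238516 |
| Johnson    | 31   | 883      | 1        | 1.2552725 |
| Arrowhead  | 40   | 956      | 4        | 1.40824   |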
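The pairwise relationships among the three variables can be summarised numerically with Pearson correlations, alongside the plots. The sketch below is one possible check in plain Python; the helper `pearson` is a hypothetical name introduced here, not part of any SAS output.

```python
import math

# Columns copied from the data table, in lake order (Bay ... Arrowhead).
rip_dens = [1270, 1210, 1800, 1875, 1300, 2150, 1330, 964,
            961, 1400, 1280, 976, 771, 833, 883, 956]
cwd_basa = [121, 41, 183, 130, 127, 134, 65, 52,
            12, 46, 54, 97, 1, 4, 1, 4]
l10cabin = [0, 0, 0, 0, 0, 0.20412, 0.462398, 0.6627578,
            0.7075702, 0.763428, 0.845098, 0.8864907,
            1.10721, 1.1238516, 1.2552725, 1.40824]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


mean_cwd = sum(cwd_basa) / len(cwd_basa)
r_y_x1 = pearson(cwd_basa, rip_dens)    # Y vs X1
r_y_x2 = pearson(cwd_basa, l10cabin)    # Y vs X2
r_x1_x2 = pearson(rip_dens, l10cabin)   # X1 vs X2

print(f"mean CWD.BASA = {mean_cwd:.1f}")
print(f"r(Y, X1) = {r_y_x1:.3f}, r(Y, X2) = {r_y_x2:.3f}, "
      f"r(X1, X2) = {r_x1_x2:.3f}")
```

A sizeable correlation between X1 and X2 means the two predictors share information about Y, which matters when comparing the single-predictor and two-predictor models.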

(a) Let Y = CWD.BASA, X1 = RIP.DENS and X2 = L10CABIN. Plots of Y vs. X1, Y vs. X2 and X1 vs. X2 are given on page 12. Comment on any relationships you see.

(b) SAS output for the following models is presented on pages 13 to 16. Diagnostic graphs are shown for the last model.

i. Regression of Y on the predictor X1

ii. Regression of Y on the predictor X2

iii. Regression of Y on the two predictors X1 and X2

For each analysis above, present the model equation, hypotheses and conclusions. For the third analysis, comment on whether or not the model assumptions are satisfied.

(c) Which of the hypothesis tests from the three presented models gives the answer to the question of interest in this situation? Explain the answer.

[Figure: three scatterplots. Scatterplots: CWD by L10CABIN, CWD by RIP.DENS, RIP.DENS by L10CABIN]

Coarse Woody Debris SAS Output

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read 16

Number of Observations Used 16

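Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 1  | 32054          | 32054       | 24.30   | 0.0002 |
| Error           | 14 | 18466          | 1318.96866  |         |        |
| Corrected Total | 15 | 50520          |             |         |        |

| Statistic      | Value    |
|----------------|----------|
| Root MSE       | 36.31761 |
| Dependent Mean | 67.00000 |
| Coeff Var      | 54.20539 |
| R-Square       | 0.6345   |
| Adj R-Sq       | 0.6084   |

Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | -77.09908          | 30.60801       | -2.52   | 0.0246     |
| RIP.DENS  | 1  | 0.11552            | 0.02343        | 4.93    | 0.0002     |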

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read 16

Number of Observations Used 16

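Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 1  | 32840          | 32840       | 26.00   | 0.0002 |
| Error           | 14 | 17680          | 1262.86950  |         |        |
| Corrected Total | 15 | 50520          |             |         |        |

| Statistic      | Value    |
|----------------|----------|
| Root MSE       | 35.53688 |
| Dependent Mean | 67.00000 |
| Coeff Var      | 53.04011 |
| R-Square       | 0.6500   |
| Adj R-Sq       | 0.6250   |

Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | 121.96875          | 13.96871       | 8.73    | <.0001     |
| L10CABIN  | 1  | -93.30142          | 18.29646       | -5.10   | 0.0002     |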

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read 16

Number of Observations Used 16

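Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 2  | 38041          | 19020       | 19.81   | 0.0001 |
| Error           | 13 | 12479          | 959.93185   |         |        |
| Corrected Total | 15 | 50520          |             |         |        |

| Statistic      | Value    |
|----------------|----------|
| Root MSE       | 30.98277 |
| Dependent Mean | 67.00000 |
| Coeff Var      | 46.24294 |
| R-Square       | 0.7530   |
| Adj R-Sq       | 0.7150   |

Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | 18.16485           | 46.22822       | 0.39    | 0.7007     |
| RIP.DENS  | 1  | 0.06572            | 0.02823        | 2.33    | 0.0367     |
| L10CABIN  | 1  | -56.26481          | 22.53059       | -2.50   | 0.0267     |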
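The three fitted models are connected through their Error sums of squares. As a sketch, the extra sum of squares explained by adding one predictor to a model that already contains the other can be turned into a partial F statistic, which should match the square of the corresponding t value in the two-predictor model up to rounding (the sums of squares are quoted to the nearest integer). All numbers are taken from the output; the variable names are illustrative.

```python
# Error sums of squares quoted in the three Analysis of Variance tables.
sse_x1_only = 18466        # Y on RIP.DENS
sse_x2_only = 17680        # Y on L10CABIN
sse_both = 12479           # Y on RIP.DENS and L10CABIN
mse_both = 959.93185       # Error mean square of the two-predictor model

# Extra SS for each predictor, given that the other is already in the model.
extra_ss_x2_given_x1 = sse_x1_only - sse_both
extra_ss_x1_given_x2 = sse_x2_only - sse_both

partial_f_x2 = extra_ss_x2_given_x1 / mse_both
partial_f_x1 = extra_ss_x1_given_x2 / mse_both

# t values recomputed from the two-predictor Parameter Estimates table.
t_x1 = 0.06572 / 0.02823
t_x2 = -56.26481 / 22.53059

print(f"L10CABIN given RIP.DENS: partial F = {partial_f_x2:.2f}, "
      f"t squared = {t_x2 ** 2:.2f}")
print(f"RIP.DENS given L10CABIN: partial F = {partial_f_x1:.2f}, "
      f"t squared = {t_x1 ** 2:.2f}")
```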

## 4. Age of Teeth

In forensic work, scientists estimate the age of a skeleton by counting tooth cementum annulations (i.e. growth rings). Two tooth preparation methods, A and B, are compared by estimating the ages (Y) of twenty teeth of known age (X). The teeth are randomly allocated to the two methods, ten to each, as follows.

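| Method   | Variable          | Values (ten teeth)            |
|----------|-------------------|-------------------------------|
| Method A | X = true age      | 49 13 38 55 44 56 7 66 18 39  |
| Method A | Y = estimated age | 50 14 38 57 44 55 7 63 20 38  |
| Method B | X = true age      | 51 59 32 37 12 38 4 28 58 24  |
| Method B | Y = estimated age | 51 59 29 34 10 35 5 25 57 22  |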

A confirmatory analysis using a model with terms True Age (i.e. X), Method and True Age×Method is required.

(a) Give the model equation for the required confirmatory analysis.

(b) SAS output from a fitted model is given on pages 18 to 20. Present a report on this analysis that includes any necessary assumptions, comments on their validity, hypotheses, statistical conclusions at a 5% significance level, and interpretation plus discussion.

Linear Models

The GLM Procedure

Class Level Information

| Class  | Levels | Values |
|--------|--------|--------|
| Method | 2      | A B    |

Number of Observations Read 20

Number of Observations Used 20

Dependent Variable: Y

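| Source          | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|-----------------|----|----------------|-------------|---------|--------|
| Model           | 3  | 6543.664660    | 2181.221553 | 946.16  | <.0001 |
| Error           | 16 | 36.885340      | 2.305334    |         |        |
| Corrected Total | 19 | 6580.550000    |             |         |        |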

| R-Square | Coeff Var | Root MSE | Y Mean   |
|----------|-----------|----------|----------|
| 0.994395 | 4.258997  | 1.518333 | 35.65000 |

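| Source   | DF | Type I SS   | Mean Square | F Value | Pr > F |
|----------|----|-------------|-------------|---------|--------|
| X        | 1  | 6525.535206 | 6525.535206 | 2830.62 | <.0001 |
| Method   | 1  | 15.413619   | 15.413619   | 6.69    | 0.0199 |
| X*Method | 1  | 2.715836    | 2.715836    | 1.18    | 0.2938 |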

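| Source   | DF | Type III SS | Mean Square | F Value | Pr > F |
|----------|----|-------------|-------------|---------|--------|
| X        | 1  | 6350.006837 | 6350.006837 | 2754.48 | <.0001 |
| Method   | 1  | 10.463729   | 10.463729   | 4.54    | 0.0490 |
| X*Method | 1  | 2.715836    | 2.715836    | 1.18    | 0.2938 |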
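A model containing X, Method and X*Method allows each method its own intercept and slope, so its fitted values coincide with those from a separate least-squares line for each method. Under that assumption, the sketch below fits the two lines from the data and pools their residual sums of squares, which should reproduce the Error line of the SAS output. The helper `fit_line` is a hypothetical name introduced for this illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope, residual SS) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


# True age (X) and estimated age (Y), copied from the data listing.
x_a = [49, 13, 38, 55, 44, 56, 7, 66, 18, 39]
y_a = [50, 14, 38, 57, 44, 55, 7, 63, 20, 38]
x_b = [51, 59, 32, 37, 12, 38, 4, 28, 58, 24]
y_b = [51, 59, 29, 34, 10, 35, 5, 25, 57, 22]

int_a, slope_a, sse_a = fit_line(x_a, y_a)
int_b, slope_b, sse_b = fit_line(x_b, y_b)

# Pooled error: 20 observations minus 4 fitted parameters.
sse_total = sse_a + sse_b
error_df = len(x_a) + len(x_b) - 4
mse = sse_total / error_df

print(f"Method A: Y = {int_a:.4f} + {slope_a:.4f} X")
print(f"Method B: Y = {int_b:.4f} + {slope_b:.4f} X")
print(f"Error SS = {sse_total:.6f} on {error_df} df, MSE = {mse:.6f}")
```

The X*Method test in the output asks whether the two slopes differ; the Method test concerns the vertical separation of the lines and depends on which type of sum of squares is used.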

[Figure: data and fitted lines, estimated age against true age, with points labelled A or B by method. The Method A line (dashed) is above the Method B line (solid).]