# STAT 292 Assignment 5 solution

## Description

1. Table 1 presents a subset of data collected by Väisänen and Järvinen (1977) on bird species in the Krunnit Islands archipelago of Finland. In particular, they reported on the bird species found on each of the islands in 1949 and how many of those bird species were extinct by 1970.

It is of interest to understand whether the area of the island (in km²) is associated with species' survival. The data corresponding to Table 1 are available in the Excel file Extinction.xlsx.

| Island | Area (X), km² | Extinct: Yes | Extinct: No |
|---|---|---|---|
| Ulkokrunni | 185.80 | 5 | 70 |
| Maakrunni | 105.80 | 3 | 64 |
| Ristikari | 30.70 | 10 | 56 |
| Isonkivenletto | 8.50 | 6 | 45 |
| Hietakraasukka | 4.80 | 3 | 25 |
| Kraasukka | 4.50 | 4 | 16 |
| Länsiletto | 4.30 | 8 | 35 |

Table 1: Extinction of bird species from 1949 to 1970 on seven islands in the Krunnit Islands archipelago, Finland.

Fit the logistic regression model

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X,$$

where X denotes island area and p(X) denotes the probability of extinction.
Figure 1 shows relevant SAS output for the logistic regression model.
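
For readers without SAS, the fit can be reproduced approximately with a few lines of standard-library Python. The following is a minimal sketch, assuming a plain Newton-Raphson iteration on the grouped binomial log-likelihood; the function names (`inv_logit`, `inverse_information`, `fit_logistic`) are illustrative and not part of the assignment, and the printed values should be compared against Figure 1 rather than treated as authoritative.

```python
import math

# Table 1: island area (km^2), species extinct by 1970, species not extinct.
area = [185.80, 105.80, 30.70, 8.50, 4.80, 4.50, 4.30]
extinct = [5, 3, 10, 6, 3, 4, 8]
not_extinct = [70, 64, 56, 45, 25, 16, 35]
total = [e + s for e, s in zip(extinct, not_extinct)]


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_information(x, n, b0, b1):
    """Inverse of the 2x2 Fisher information matrix evaluated at (b0, b1)."""
    p = [inv_logit(b0 + b1 * xi) for xi in x]
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01
    return [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]


def fit_logistic(x, y, n, tol=1e-10, max_iter=50):
    """Newton-Raphson for logit(p) = b0 + b1 * x with grouped binomial counts."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        u0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        u1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
        cov = inverse_information(x, n, b0, b1)
        d0 = cov[0][0] * u0 + cov[0][1] * u1
        d1 = cov[1][0] * u0 + cov[1][1] * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Covariance matrix of the estimates, evaluated at the final iterate.
    return b0, b1, inverse_information(x, n, b0, b1)


b0, b1, cov = fit_logistic(area, extinct, total)
se0, se1 = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
print(f"b0 = {b0:.5f} (SE {se0:.5f}), b1 = {b1:.5f} (SE {se1:.5f})")
```

The standard errors are the square roots of the diagonal of the inverse information matrix, which is what the Wald test and the confidence interval in the later parts rely on.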

(a) Carry out an appropriate goodness-of-fit test to determine whether the model
provides a good fit to the data. State the hypotheses, and give the test statistic
and the p-value of the test. What do you conclude at the α = 0.05 significance
level?
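
The goodness-of-fit test compares the fitted model (H0) with the saturated model (H1) through the deviance or Pearson statistic, each referred to a chi-square distribution with 7 - 2 = 5 degrees of freedom for this model. A minimal sketch of the arithmetic, assuming fitted probabilities are supplied by the reader; `chi2_sf` and `deviance_and_pearson` are illustrative helper names. So that the block runs without any estimates from Figure 1, the demonstration uses the intercept-only model, whose fitted probability is simply the pooled proportion.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = math.exp(-x / 2.0)
        tail = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            tail += term
        return tail
    # Odd df: start from df = 1 and add the increment for each step of 2.
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for k in range(1, df, 2):
        tail += term
        term *= x / (k + 2)
    return tail


def deviance_and_pearson(y, n, p_hat):
    """Deviance G^2 and Pearson X^2 for grouped binomial data."""
    g2 = 0.0
    x2 = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        fitted_yes, fitted_no = ni * pi, ni * (1.0 - pi)
        for obs, fit in ((yi, fitted_yes), (ni - yi, fitted_no)):
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / fit)
            x2 += (obs - fit) ** 2 / fit
    return g2, x2


# Table 1 counts.
extinct = [5, 3, 10, 6, 3, 4, 8]
total = [75, 67, 66, 51, 28, 20, 43]

# Demonstration only: intercept-only model (1 parameter, so df = 7 - 1 = 6).
pooled = sum(extinct) / sum(total)
g2_null, x2_null = deviance_and_pearson(extinct, total, [pooled] * len(total))
p_value_null = chi2_sf(g2_null, len(total) - 1)
print(f"G2 = {g2_null:.4f}, X2 = {x2_null:.4f}, p = {p_value_null:.4f}")
```

For the model in the question, `p_hat` would instead hold the fitted probabilities at each island's area, and the degrees of freedom would be 5.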

(b) Give estimates of β0 and β1 (up to 5dp).

(c) Interpret the association between island area and extinction using the odds ratio. Demonstrate how the odds ratio is calculated from Figure 1. Additionally,
provide a 95% confidence interval for the odds ratio.
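
The odds ratio for a one-unit (1 km²) increase in area is exp(β1), and a Wald interval is obtained by exponentiating the endpoints of the interval for β1. A small sketch of that calculation, assuming the interval in Figure 1 is Wald-based; `odds_ratio_ci` is an illustrative name, and the inputs in the example call are placeholders rather than the Figure 1 estimates.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def odds_ratio_ci(beta, se, units=1.0, z=Z_975):
    """Odds ratio exp(units * beta) with Wald limits exp(units * (beta -/+ z * se))."""
    estimate = math.exp(units * beta)
    a = math.exp(units * (beta - z * se))
    b = math.exp(units * (beta + z * se))
    return estimate, min(a, b), max(a, b)


# Placeholder inputs (hypothetical, NOT taken from Figure 1).
beta1_hat, se_beta1 = -0.01, 0.004
print(odds_ratio_ci(beta1_hat, se_beta1))            # per 1 km^2
print(odds_ratio_ci(beta1_hat, se_beta1, units=10))  # per 10 km^2
```

The `units` argument is a convenience for reporting the odds ratio per larger change in area, which is often easier to interpret when the per-unit odds ratio is close to 1.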

Figure 1: Summary output for the logistic regression model $\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$.

(d) Find the predicted probability of extinction for an island with an area of 50 km² (to 4dp).

(e) Find the fitted count of extinct bird species on the island of Ulkokrunni (to
2dp). Also find the fitted count of non-extinct bird species on Ulkokrunni (to
2dp).
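
Parts (d) and (e) both invert the logit: the predicted probability is exp(β0 + β1 x) / (1 + exp(β0 + β1 x)), and the fitted counts on an island are the number of species present in 1949 multiplied by that probability and by its complement. A minimal sketch, with illustrative function names and placeholder coefficients that should be replaced by the estimates from part (b):

```python
import math


def predicted_probability(b0, b1, x):
    """Inverse logit of the linear predictor b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fitted_counts(b0, b1, x, n):
    """Fitted (extinct, not extinct) counts for an island with n species."""
    p = predicted_probability(b0, b1, x)
    return n * p, n * (1.0 - p)


# Placeholder coefficients (hypothetical, NOT the Figure 1 estimates).
b0_hat, b1_hat = -1.0, -0.01

# Part (d): an island of 50 km^2.
print(round(predicted_probability(b0_hat, b1_hat, 50.0), 4))

# Part (e): Ulkokrunni has area 185.80 km^2 and 5 + 70 = 75 species in Table 1.
fit_yes, fit_no = fitted_counts(b0_hat, b1_hat, 185.80, 75)
print(round(fit_yes, 2), round(fit_no, 2))
```

The two fitted counts always sum to the island's total number of species, which is a quick check on the arithmetic.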

(f) Test

$$H_0 : \beta_1 = 0 \quad \text{versus} \quad H_1 : \beta_1 \neq 0$$

using the Wald statistic. Give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?
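
The Wald statistic is the estimate divided by its standard error; its square is referred to a chi-square distribution with 1 degree of freedom, which gives the same p-value as the two-sided normal test. A short sketch, with `wald_test` as an illustrative name and placeholder inputs standing in for the values reported in Figure 1:

```python
import math


def wald_test(beta, se):
    """Return (z, chi-square, two-sided p-value) for H0: beta = 0."""
    z = beta / se
    chi_sq = z * z
    # P(|Z| > |z|) for a standard normal Z, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi_sq, p_value


# Placeholder inputs (hypothetical, NOT taken from Figure 1).
z, chi_sq, p_value = wald_test(-0.01, 0.004)
print(f"z = {z:.4f}, chi-square = {chi_sq:.4f}, p = {p_value:.4f}")
```

H0 is rejected at the 0.05 level when the p-value is below 0.05, equivalently when the chi-square statistic exceeds about 3.84.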

2. Consider data reported by Gilbert (1981) on the relationship between pre-marital
sex (i.e., sexual intercourse before marriage), extra-marital sex (i.e., sexual intercourse with someone other than a spouse whilst married), and whether the person
had been divorced for a random sample of heterosexual men and women who had
been married at least once. These data are presented in Table 2 and are available
in the Excel file Divorce.xlsx.

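| Gender (W) | Pre-marital sex (X) | Extra-marital sex (Y) | Divorced (Z): No | Divorced (Z): Yes |
|---|---|---|---|---|
| Woman | Yes | Yes | 4 | 17 |
| Woman | Yes | No | 25 | 54 |
| Woman | No | Yes | 4 | 36 |
| Woman | No | No | 322 | 214 |
| Man | Yes | Yes | 11 | 28 |
| Man | Yes | No | 42 | 60 |
| Man | No | Yes | 4 | 17 |
| Man | No | No | 130 | 68 |

Table 2: Data on reported pre-marital sex, extra-marital sex, and divorce for a random sample of heterosexual men and women.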

First, use the backward model selection method to find the simplest model that
provides a good fit to the data. Start from the following model, which we will
denote by M2,

$$\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{WX}_{ij} + \beta^{WY}_{ik} + \beta^{XY}_{jk} + \beta^{WXY}_{ijk},$$

where $p_{ijk}$ is the probability of divorce when the gender (W) is at level i, pre-marital sex status (X) is at level j, and extra-marital sex status (Y) is at level k.
Figure 2 shows relevant summary output from SAS.

(a) Is model M2 a saturated model? Why or why not?
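
A model is saturated when it has as many free parameters as there are distinct covariate patterns (binomial observations). One way to check this is to count both. The sketch below assumes identifiability constraints such as reference-level coding, so that a term contributes the product of (number of levels - 1) over its factors; the names `levels`, `terms` and `free_parameters` are illustrative.

```python
from itertools import product

# Factor levels from Table 2.
levels = {"W": ["Woman", "Man"], "X": ["Yes", "No"], "Y": ["Yes", "No"]}

# Each combination of levels is one binomial observation (divorced yes/no).
cells = list(product(*levels.values()))

# Terms in M2: intercept, main effects, two-way and three-way interactions.
terms = ["1", "W", "X", "Y", "W*X", "W*Y", "X*Y", "W*X*Y"]


def free_parameters(term):
    """Free parameters for one term under reference-level constraints."""
    if term == "1":
        return 1
    count = 1
    for factor in term.split("*"):
        count *= len(levels[factor]) - 1
    return count


n_params = sum(free_parameters(t) for t in terms)
print(len(cells), n_params)  # prints: 8 8
```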

(b) What information does Step 1 provide in the SAS output? Write down the
test hypotheses. What do you conclude?

Figure 2: Summary output for the backward selection method applied to the logit model $\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{WX}_{ij} + \beta^{WY}_{ik} + \beta^{XY}_{jk} + \beta^{WXY}_{ijk}$.

(c) What is the final model?

Now consider the logit model, which we will denote by M1,

$$\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk},$$

which uses a reference level parametrisation for all factors.
Figure 3 shows relevant summary output from SAS.

(d) Carry out an appropriate goodness-of-fit test to determine whether model M1
provides a good fit to the data. State the hypotheses, and give the test statistic
and the p-value of the test. What do you conclude at the α = 0.05 significance
level?
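
The deviance test for M1 compares it with the saturated model on 8 - 5 = 3 degrees of freedom. The following is a rough sketch of how the fit and the test could be reproduced outside SAS, assuming indicator coding with "Woman" and "No" as reference levels (the coding in Figure 3 may differ, which changes the coefficients but not the deviance) and a hand-written Newton-Raphson solver; all function names are illustrative.

```python
import math

# Table 2 rows: (gender, pre-marital, extra-marital, divorced no, divorced yes).
rows = [
    ("Woman", "Yes", "Yes", 4, 17), ("Woman", "Yes", "No", 25, 54),
    ("Woman", "No", "Yes", 4, 36), ("Woman", "No", "No", 322, 214),
    ("Man", "Yes", "Yes", 11, 28), ("Man", "Yes", "No", 42, 60),
    ("Man", "No", "Yes", 4, 17), ("Man", "No", "No", 130, 68),
]


def design_row(w, x, y):
    """Columns: intercept, W, X, Y, X*Y (assumed coding: 1 for Man / Yes)."""
    iw, ix, iy = float(w == "Man"), float(x == "Yes"), float(y == "Yes")
    return [1.0, iw, ix, iy, ix * iy]


X = [design_row(w, x, y) for w, x, y, _, _ in rows]
yes = [r[4] for r in rows]
n = [r[3] + r[4] for r in rows]


def solve(a, b):
    """Solve the linear system a v = b by Gaussian elimination with pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    v = [0.0] * m
    for r in reversed(range(m)):
        v[r] = (aug[r][m] - sum(aug[r][k] * v[k] for k in range(r + 1, m))) / aug[r][r]
    return v


def probabilities(X, beta):
    return [1.0 / (1.0 + math.exp(-sum(b * xj for b, xj in zip(beta, xi)))) for xi in X]


def fit_logit(X, y, n, tol=1e-10, max_iter=50):
    """Newton-Raphson for a grouped-binomial logit model."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = probabilities(X, beta)
        score = [sum(xi[j] * (yi - ni * pi) for xi, yi, ni, pi in zip(X, y, n, p))
                 for j in range(k)]
        info = [[sum(ni * pi * (1 - pi) * xi[a] * xi[b] for xi, ni, pi in zip(X, n, p))
                 for b in range(k)] for a in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


beta = fit_logit(X, yes, n)
fitted_yes = [ni * pi for ni, pi in zip(n, probabilities(X, beta))]
deviance = 2.0 * sum(
    yi * math.log(yi / mi) + (ni - yi) * math.log((ni - yi) / (ni - mi))
    for yi, ni, mi in zip(yes, n, fitted_yes)
)
df = len(rows) - len(beta)  # 8 covariate patterns minus 5 parameters
# Chi-square upper tail for 3 degrees of freedom (closed form).
p_value = math.erfc(math.sqrt(deviance / 2.0)) + math.sqrt(
    2.0 * deviance / math.pi) * math.exp(-deviance / 2.0)
print(f"deviance = {deviance:.4f}, df = {df}, p = {p_value:.4f}")
```

The closed-form tail probability used at the end is specific to 3 degrees of freedom; a different model would need the corresponding chi-square tail.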

(e) Compare the odds of divorce for men with the odds of divorce for women using
an odds ratio, and interpret this odds ratio. Give a 95% confidence interval for
the odds ratio.
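
Because W enters M1 only through its main effect, the odds ratio comparing men with women is the same at every combination of X and Y. A short derivation, assuming "Woman" is the reference level of W (so that its parameter is zero); the reference level actually used in Figure 3 should be checked, since the sign of the estimate flips if "Man" is the reference:

```latex
\begin{aligned}
\frac{\operatorname{odds}(\text{man}, j, k)}{\operatorname{odds}(\text{woman}, j, k)}
  &= \frac{\exp\left(\beta_0 + \beta^{W}_{\text{man}} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk}\right)}
          {\exp\left(\beta_0 + \beta^{W}_{\text{woman}} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk}\right)} \\
  &= \exp\left(\beta^{W}_{\text{man}} - \beta^{W}_{\text{woman}}\right)
   = \exp\left(\beta^{W}_{\text{man}}\right), \\
\text{95\% Wald interval:}\quad
  &\exp\left(\hat\beta^{W}_{\text{man}} \pm 1.96 \, \widehat{\operatorname{SE}}\left(\hat\beta^{W}_{\text{man}}\right)\right).
\end{aligned}
```

An odds ratio below 1 would mean that, holding pre-marital and extra-marital sex status fixed, the estimated odds of divorce are lower for men than for women; above 1 would mean they are higher.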

Figure 3: Summary output for the logit model $\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \beta_0 + \beta^{W}_{i} + \beta^{X}_{j} + \beta^{Y}_{k} + \beta^{XY}_{jk}$.