Description
Question 1:
Continuing our investigations with the MACS data, the MACS-VL.RData dataset on the course
website has longitudinal information on CD4+ cell counts for K=225 MACS participants with
baseline viral load data. In this question we are going to consider the relationship between baseline
viral load and the rate of decline of CD4 count.
(a) Summarize the key variables using simple numerical and/or graphical summaries as relevant
to the scientific question of interest.
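A useful first summary for this question is each participant's own least-squares CD4 slope plotted against baseline viral load. The sketch below is illustrative only. It assumes the data have been exported to long-format rows of (id, months since seroconversion, CD4 count, baseline viral load), and it assumes a log10 transform of viral load. The function names are hypothetical, and the toy rows are made up for demonstration; they are not MACS values.

```python
import math
from collections import defaultdict


def ols_slope(times, values):
    """Least-squares slope of values on times, or None if times do not vary."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        return None  # single visit or repeated times: slope not estimable
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def subject_slopes(rows):
    """rows: iterable of (id, months, cd4, baseline_viral_load) tuples (assumed layout).

    Returns {id: (log10 baseline viral load, CD4 slope per month)}.
    """
    visits = defaultdict(list)
    viral_load = {}
    for pid, months, cd4, vl in rows:
        visits[pid].append((months, cd4))
        viral_load[pid] = vl  # baseline value, constant within participant
    out = {}
    for pid, obs in visits.items():
        slope = ols_slope([m for m, _ in obs], [c for _, c in obs])
        if slope is not None:
            out[pid] = (math.log10(viral_load[pid]), slope)
    return out


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative rows, NOT MACS data.
toy = [
    ("a", 0, 800, 1e3), ("a", 12, 760, 1e3),
    ("b", 0, 700, 1e5), ("b", 12, 580, 1e5),
    ("c", 0, 750, 1e4), ("c", 6, 700, 1e4), ("c", 12, 650, 1e4),
]
slopes = subject_slopes(toy)
log_vl = [v[0] for v in slopes.values()]
rate = [v[1] for v in slopes.values()]
print(slopes)
print(pearson(log_vl, rate))
```

A scatterplot of these per-participant slopes against log10 viral load, alongside histograms of baseline viral load and of visit counts, gives a quick visual check on whether the rate of decline appears to vary with viral load.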
(b) Use appropriate exploratory methods to characterize the covariance structure of the data.
What structured covariance model(s) appear plausible/reasonable?
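MACS visit times are irregular, so a sample variogram is one convenient way to explore the covariance structure. Under compound symmetry, the within-subject semivariance is roughly flat in the time lag u and sits below the total variance. A curve that rises with u instead points toward serial correlation, such as an exponential or AR(1)-type structure. The sketch below is a minimal, hypothetical implementation. It assumes that residuals from a preliminary ordinary least squares fit of the mean model are already available for each participant, and that lags are grouped into bins of fixed width.

```python
import itertools
from collections import defaultdict


def sample_variogram(data, bin_width):
    """Empirical semivariogram of residuals.

    data: {id: [(time, residual), ...]}; residuals from a preliminary OLS fit
    of the mean model (an assumption of this sketch).
    Returns (curve, total_variance), where curve is a list of
    (bin midpoint, mean within-subject semivariance, number of pairs).
    """
    bins = defaultdict(list)
    for obs in data.values():
        for (t1, r1), (t2, r2) in itertools.combinations(obs, 2):
            lag_bin = int(abs(t1 - t2) // bin_width)
            bins[lag_bin].append(0.5 * (r1 - r2) ** 2)
    curve = [((b + 0.5) * bin_width, sum(v) / len(v), len(v))
             for b, v in sorted(bins.items())]

    # Pairs from different participants estimate the total residual variance.
    between = [0.5 * (r1 - r2) ** 2
               for a, b in itertools.combinations(data, 2)
               for _, r1 in data[a]
               for _, r2 in data[b]]
    total_variance = sum(between) / len(between)
    return curve, total_variance


def implied_correlation(curve, total_variance):
    """rho(u) = 1 - semivariance(u) / total variance, for each lag bin."""
    return [(u, 1 - g / total_variance) for u, g, _ in curve]


# Toy residuals with a pure random-intercept pattern (made up, not MACS data).
toy_resid = {
    "a": [(0, 1.0), (6, 1.0), (12, 1.0)],
    "b": [(0, -1.0), (6, -1.0), (12, -1.0)],
}
curve, total = sample_variogram(toy_resid, bin_width=6)
print(curve, total)
print(implied_correlation(curve, total))
```

In this toy case the variogram is flat at zero, and the implied within-subject correlation is 1 at every lag, which is the extreme form of an exchangeable (compound symmetric) pattern.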
(c) Use the gls() command in the nlme library to fit the model:
$$E[Y_{ki}] = \beta_0 + \beta_1 t_{ki} + \beta_2 x_k + \beta_3 t_{ki} x_k$$
where $x_k$ is the (possibly transformed) baseline viral load and $t_{ki}$ is time since seroconversion
in months. Use compound symmetric correlation but consider both maximum likelihood and
restricted maximum likelihood for estimation. Present your results in a concise manner that
would be suitable for a journal and provide a precise interpretation of the estimates for the
mean model. Comment on whether there is a significant association between baseline viral
load and the rate of decline in CD4+ based on the estimates from this model.
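Under this model, the monthly rate of change in CD4 for a participant with (transformed) baseline viral load x is γ1(x) = β1 + β3x. A Wald test of H0: β3 = 0 is therefore the test of whether baseline viral load is associated with the rate of decline. The sketch below is a hypothetical helper, not part of nlme. It computes such a linear combination together with its standard error and Wald 95% CI from a coefficient vector and its estimated covariance matrix. In R, these inputs would come from coef() and vcov() applied to the gls fit. For example, the fit could use correlation = corCompSymm(form = ~ 1 | id) with method = "ML" or method = "REML", where id is a hypothetical participant identifier column. The sketch assumes the coefficients are ordered (β0, β1, β2, β3), as they would be for a formula such as cd4 ~ time * logvl (hypothetical variable names).

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5% quantile


def linear_combination(coef, cov, weights, z=Z_975):
    """Estimate, standard error and Wald 95% CI for sum_j weights[j] * coef[j]."""
    p = len(coef)
    est = sum(w * b for w, b in zip(weights, coef))
    var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(p) for j in range(p))
    se = math.sqrt(var)
    return est, se, (est - z * se, est + z * se)


def slope_at_viral_load(coef, cov, x):
    """gamma_1(x) = beta1 + beta3 * x, with coef ordered (beta0, beta1, beta2, beta3)."""
    return linear_combination(coef, cov, [0.0, 1.0, 0.0, x])


# Placeholder numbers chosen only to exercise the code; these are not fitted values.
coef = [0.0, 1.0, 0.0, 2.0]
cov = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(slope_at_viral_load(coef, cov, 3.0))
```

With weights [0, 0, 0, 1], the ratio est/se is the Wald statistic for β3. Evaluating slope_at_viral_load at a few representative viral load values gives concrete rates of decline to report and interpret.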
(d) The model in part (c) restricts the analysis in that it is estimating a linear relationship
between (possibly-transformed) baseline viral load and CD4 count over time. As a way of
relaxing this restriction, consider categorizing baseline viral load. Given a categorization with
J levels, one alternative to the model in part (c) is
$$E[Y_{ki}] = \beta_0 + \beta_1 t_{ki} + \sum_j \beta_{2j} x_{k(j)} + \sum_j \beta_{3j} t_{ki} x_{k(j)},$$
where $x_{k(j)}$ is the indicator that participant $k$'s baseline viral load falls in category $j$.
Fit the model and present your results in a concise manner that would be suitable for a journal.
Provide a precise interpretation of the estimates in this regression model, and comment on
whether there is a significant association between baseline viral load and the rate of decline
in CD4+ based on the estimates from this model.
(e) (optional) The models in parts (c) and (d) can be viewed as special cases of a 'varying
coefficient' model:
$$E[Y_{ki}] = \gamma_0(x_k) + \gamma_1(x_k)\, t_{ki},$$
in which the slope for time depends on the value of the covariate. Specifically, while
the model in (c) assumes that $\gamma_1(x_k) = \beta_1 + \beta_3 x_k$, the model in (d) assumes a discrete
function where $\gamma_1(x_k) = \beta_1 + \sum_j \beta_{3j} x_{k(j)}$. Hence, whereas the model in (c) uses viral load
in its continuous form but restricts the relationship to be linear, the model in (d) uses a
categorical version of viral load but makes no assumption about the functional form of how
the rate of decline differs across the viral load categories.
Beyond these two special cases, allowing $\gamma_0(x_k)$ and $\gamma_1(x_k)$ to take richer functional forms
than the linear form used in the model in (c) provides a more flexible description of how the
rate of decline differs for different values of baseline viral load. With this in mind, use a
varying coefficient model for the rate of decline in CD4+ that characterizes how the rate
of decline depends on baseline viral load. I recommend that you use natural or restricted
cubic splines for the coefficient functions and simply choose two knots. Plot the estimated
coefficient function $\hat\gamma_1(x_k)$ with pointwise 95% confidence bands, and interpret specific values.
What does this plot suggest about the adequacy of the model in (c)?
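One way to build the varying coefficient model is to expand the transformed viral load in a natural cubic spline basis B(x). The gls() mean model would then include the columns 1, t, B(x) and t·B(x), so that γ0 and γ1 are both splines in x. The sketch below uses the truncated-power construction of a natural cubic spline with two interior knots, plus boundary knots at the extremes of x. It should span the same function space as R's splines::ns() with the same interior and boundary knots, up to a change of basis. The knot positions and function names here are assumptions. The second function turns the estimated slope coefficients (the time main effect and the time-by-basis interactions) and their covariance block into a pointwise Wald 95% band for γ̂1(x).

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5% quantile


def natural_spline_basis(x, knots):
    """Truncated-power natural cubic spline basis without intercept.

    knots: sorted list including both boundary knots, e.g. [lo, k1, k2, hi].
    Returns K - 1 values: x followed by K - 2 nonlinear terms that are zero
    below the first knot and make the fitted function linear beyond the last knot.
    """
    last = knots[-1]

    def d(k):
        xi = knots[k]
        return (max(x - xi, 0.0) ** 3 - max(x - last, 0.0) ** 3) / (last - xi)

    d_penultimate = d(len(knots) - 2)
    return [x] + [d(k) - d_penultimate for k in range(len(knots) - 2)]


def gamma1_band(x, knots, slope_coef, slope_cov, z=Z_975):
    """Pointwise estimate and Wald 95% band for gamma_1(x) = b1 + sum_j c_j * B_j(x).

    slope_coef: [b1, c_1, ..., c_{K-1}] (time main effect, then the
    time-by-basis interaction coefficients); slope_cov: their covariance block.
    """
    w = [1.0] + natural_spline_basis(x, knots)
    p = len(w)
    est = sum(wi * c for wi, c in zip(w, slope_coef))
    var = sum(w[i] * slope_cov[i][j] * w[j] for i in range(p) for j in range(p))
    se = math.sqrt(var)
    return est, est - z * se, est + z * se


# Illustrative knots and placeholder coefficients (not fitted values).
knots = [0.0, 1.0, 2.0, 3.0]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(natural_spline_basis(-1.0, knots))
print(gamma1_band(-1.0, knots, [1.0, 2.0, 0.0, 0.0], identity))
```

To draw the plot, evaluate gamma1_band over a fine grid of viral load values and plot the estimate with its lower and upper limits. A band that comfortably contains a straight line is consistent with the linear form assumed in (c).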
Question 2 (Optional):
Consider the one-way analysis of variance model:
$$Y_{ki} = \mu + \gamma_k + \epsilon_{ki},$$
with $i = 1, \ldots, n$ replicates on $k = 1, \ldots, K$ units and
$$\gamma_k \sim \text{Normal}(0, \tau^2), \qquad \epsilon_{ki} \sim \text{Normal}(0, \sigma^2), \qquad \gamma_k \perp \epsilon_{ki}.$$
The following may be useful: Let $I_m$ denote the $m \times m$ identity matrix and $1_m$ denote the $m \times 1$
vector of 1's. Then:
$$(aI_m + b1_m1_m^\top)^{-1} = \frac{1}{a}\left(I_m - \frac{b}{a + mb}1_m1_m^\top\right)$$
for $a \neq 0$ and $a \neq -mb$, and:
$$|aI_m + b1_m1_m^\top| = a^{m-1}(a + mb).$$
(a) Derive the likelihood and log-likelihood as a function of $(\mu, \sigma^2, \tau^2)$.
(b) Show that the MLEs for $\mu$, $\sigma^2$, and $\tau^2$ are given by:
$$\hat\mu = \bar{Y}_{\cdot\cdot}, \qquad \hat\sigma^2 = \text{MSE}, \qquad \hat\tau^2 = \frac{(1 - 1/K)\,\text{MSA} - \text{MSE}}{n},$$
where $\text{MSA} = n\sum_k (\bar{Y}_{k\cdot} - \bar{Y}_{\cdot\cdot})^2/(K-1)$ and $\text{MSE} = \sum_k\sum_i (Y_{ki} - \bar{Y}_{k\cdot})^2/[K(n-1)]$. Hint: It
may be helpful to write $\lambda = \sigma^2 + n\tau^2$.
(c) Obtain the form for $\text{Var}[\hat\mu]$ and hence an estimate of this quantity.
(d) Find the REML estimators for $\sigma^2$ and $\tau^2$ by integrating $\mu$ out of the likelihood in part (a).
(e) In the one-way random effects model with balanced data, it can be shown that:
$$\frac{\text{MSA}/(\sigma^2 + n\tau^2)}{\text{MSE}/\sigma^2} \sim F_{K-1,\,K(n-1)},$$
where $F_{K-1,\,K(n-1)}$ denotes the F distribution with $K - 1$ and $K(n - 1)$ degrees of freedom. Hence explain why
$F^\star = \text{MSA}/\text{MSE}$ may be compared to an $F_{K-1,\,K(n-1)}$ distribution to test the
hypothesis $H: \tau^2 = 0$.
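The closed-form quantities in this question are simple to compute directly, which can help when checking a derivation numerically. The sketch below is a hypothetical helper for balanced data only. It returns the grand mean, MSA, MSE, the stationary-point ML estimates σ̂² = MSE and τ̂² = [(1 − 1/K)MSA − MSE]/n (τ̂² is not truncated at zero), and F* = MSA/MSE. A p-value could then be obtained from an F distribution with K − 1 and K(n − 1) degrees of freedom, for example with scipy.stats.f.sf(F_star, K - 1, K * (n - 1)) if SciPy is available. The toy data are made up.

```python
def one_way_anova_ml(y):
    """Balanced one-way random effects summaries.

    y: list of K lists, each holding the n replicates for one unit.
    Returns the grand mean, MSA, MSE, the stationary-point ML estimates of
    sigma^2 and tau^2, and F* = MSA / MSE.
    """
    K = len(y)
    n = len(y[0])
    if K < 2 or n < 2 or any(len(row) != n for row in y):
        raise ValueError("need balanced data with K >= 2 units and n >= 2 replicates")
    unit_means = [sum(row) / n for row in y]
    grand_mean = sum(unit_means) / K
    msa = n * sum((m - grand_mean) ** 2 for m in unit_means) / (K - 1)
    mse = sum((v - m) ** 2 for row, m in zip(y, unit_means) for v in row) / (K * (n - 1))
    # If tau2_ml is negative, the constrained MLE lies on the boundary tau^2 = 0.
    tau2_ml = ((1 - 1 / K) * msa - mse) / n
    return {
        "mu_hat": grand_mean,
        "MSA": msa,
        "MSE": mse,
        "sigma2_ml": mse,
        "tau2_ml": tau2_ml,
        "F_star": msa / mse,
    }


# Made-up balanced example: K = 3 units, n = 2 replicates each.
summary = one_way_anova_ml([[1, 3], [5, 7], [9, 11]])
print(summary)
```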