Description
Consider the data set from homework 2, problem 3 on the incidence of faults in the manufacturing
of rolls of fabric:
https://www.stat.columbia.edu/~gelman/book/data/fabric.asc
where the first column contains the length of each roll, which is the covariate with values $x_i$,
and the second column contains the number of faults, which is the response with values $y_i$ and
means $\mu_i$.
(a) Fit a Bayesian Poisson GLM to these data, using a logarithmic link, $\log(\mu_i) = \beta_1 + \beta_2 x_i$.
Obtain the posterior distributions for $\beta_1$ and $\beta_2$ (under a flat prior for $(\beta_1, \beta_2)$), as well as point
and interval estimates for the response mean as a function of the covariate (over a grid of covariate values). Obtain the distributions of the posterior predictive residuals, and use them for
model checking.
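One way part (a) could be approached is a random-walk Metropolis sampler, sketched below in plain Python. The sketch makes several assumptions that are not part of the assignment: it uses synthetic stand-in data rather than fabric.asc, it standardizes the roll length internally to improve mixing (a linear reparameterization, so the flat prior on $(\beta_1, \beta_2)$ is unchanged), and the step size, chain length, and helper names (`rpois`, `fit_poisson_glm`, `predictive_residuals`) are illustrative choices.

```python
import math
import random

def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for moderate mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def log_post(a1, a2, z, y):
    """Log posterior of (a1, a2) under a flat prior, up to an additive constant."""
    return sum(yi * (a1 + a2 * zi) - math.exp(a1 + a2 * zi) for zi, yi in zip(z, y))

def fit_poisson_glm(x, y, n_iter=6000, burn=1000, step=0.1, seed=1):
    """Random-walk Metropolis on the standardized covariate; returns (beta1, beta2) draws."""
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    z = [(xi - xbar) / sx for xi in x]
    a1, a2 = math.log(sum(y) / n), 0.0          # start at the intercept-only fit
    cur = log_post(a1, a2, z, y)
    draws, accepted = [], 0
    for it in range(n_iter):
        p1, p2 = a1 + rng.gauss(0.0, step), a2 + rng.gauss(0.0, step)
        new = log_post(p1, p2, z, y)
        if math.log(1.0 - rng.random()) < new - cur:   # Metropolis accept/reject
            a1, a2, cur = p1, p2, new
            accepted += 1
        if it >= burn:
            # Map back to the original scale: log(mu) = beta1 + beta2 * x.
            draws.append((a1 - a2 * xbar / sx, a2 / sx))
    return draws, accepted / n_iter

def predictive_residuals(draws, x, y, seed=2):
    """One replicate data set per posterior draw; returns rows of y_i - y_i_rep."""
    rng = random.Random(seed)
    return [[yi - rpois(math.exp(b1 + b2 * xi), rng) for xi, yi in zip(x, y)]
            for b1, b2 in draws]

# Synthetic stand-in data (NOT the fabric data): true beta1 = 1.0, beta2 = 0.002.
sim = random.Random(0)
x = [100 + 25 * i for i in range(32)]
y = [rpois(math.exp(1.0 + 0.002 * xi), sim) for xi in x]

draws, acc_rate = fit_poisson_glm(x, y)
beta2 = sorted(b2 for _, b2 in draws)
print("acceptance rate:", round(acc_rate, 2))
print("beta2 posterior median:", beta2[len(beta2) // 2])
print("beta2 95% interval:", beta2[int(0.025 * len(beta2))], beta2[int(0.975 * len(beta2))])
```

Point and interval estimates for the response mean over a grid follow the same pattern: evaluate `math.exp(b1 + b2 * x_grid)` for every draw and take quantiles at each grid value. The rows returned by `predictive_residuals` can be summarized per observation, for example by checking whether their central intervals cover zero.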
(b) Develop a hierarchical extension of the Poisson GLM from part (a), using a gamma distribution for the response means across roll lengths. Specifically, for the second stage of the
hierarchical model, assume that
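$$
\mu_i \mid \gamma_i, \lambda \overset{\text{ind.}}{\sim} \frac{1}{\Gamma(\lambda)} \left(\frac{\lambda}{\gamma_i}\right)^{\lambda} \mu_i^{\lambda-1} \exp\left(-\frac{\lambda}{\gamma_i}\,\mu_i\right), \qquad \mu_i > 0;\ \lambda > 0,\ \gamma_i > 0,
$$
where $\log(\gamma_i) = \beta_1 + \beta_2 x_i$. (Here, $\Gamma(u) = \int_0^\infty t^{u-1} \exp(-t)\,dt$ is the Gamma function.)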
Derive the expressions for $E(Y_i \mid \beta_1, \beta_2, \lambda)$ and $\mathrm{Var}(Y_i \mid \beta_1, \beta_2, \lambda)$, and compare them with the
corresponding expressions under the non-hierarchical model from part (a). Develop an MCMC
method for posterior simulation, providing details for all its steps. Derive the expression for the
posterior predictive distribution of a new (unobserved) response $y_0$ corresponding to a specified
covariate value $x_0$, which is not included in the observed $x_i$. Implement the MCMC algorithm
to obtain the posterior distributions for $\beta_1$, $\beta_2$ and $\lambda$, as well as point and interval estimates for
the response mean as a function of the covariate (over a grid of covariate values). Discuss model
checking results based on posterior predictive residuals.
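The moment and predictive expressions requested in part (b) can be sketched with the standard iterated-expectation argument. The second-stage density is a gamma distribution with shape $\lambda$ and rate $\lambda/\gamma_i$, so $E(\mu_i \mid \gamma_i, \lambda) = \gamma_i$ and $\mathrm{Var}(\mu_i \mid \gamma_i, \lambda) = \gamma_i^2/\lambda$. The shorthand $\gamma_0 = \exp(\beta_1 + \beta_2 x_0)$ is introduced here for convenience and does not appear in the problem statement.

```latex
\begin{align*}
E(Y_i \mid \beta_1, \beta_2, \lambda)
  &= E\{E(Y_i \mid \mu_i)\} = E(\mu_i \mid \gamma_i, \lambda) = \gamma_i, \\
\mathrm{Var}(Y_i \mid \beta_1, \beta_2, \lambda)
  &= E\{\mathrm{Var}(Y_i \mid \mu_i)\} + \mathrm{Var}\{E(Y_i \mid \mu_i)\}
   = \gamma_i + \frac{\gamma_i^2}{\lambda}, \\
p(y_i \mid \beta_1, \beta_2, \lambda)
  &= \int_0^\infty \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}\, p(\mu_i \mid \gamma_i, \lambda)\, d\mu_i
   = \frac{\Gamma(y_i + \lambda)}{\Gamma(\lambda)\, y_i!}
     \left(\frac{\lambda}{\lambda + \gamma_i}\right)^{\lambda}
     \left(\frac{\gamma_i}{\lambda + \gamma_i}\right)^{y_i}, \\
p(y_0 \mid \text{data})
  &= \int p(y_0 \mid \gamma_0, \lambda)\, p(\beta_1, \beta_2, \lambda \mid \text{data})\,
     d\beta_1\, d\beta_2\, d\lambda,
  \qquad \gamma_0 = \exp(\beta_1 + \beta_2 x_0).
\end{align*}
```

Under the model in part (a) the mean and variance are both equal to $\exp(\beta_1 + \beta_2 x_i)$, so the extra term $\gamma_i^2/\lambda$ is what allows for overdispersion, and it vanishes as $\lambda \to \infty$. The marginal distribution in the third line is negative binomial, and the last integral can be approximated by drawing $y_0$ from it once for each posterior draw of $(\beta_1, \beta_2, \lambda)$.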
Regarding the priors, you can again use the flat prior for $(\beta_1, \beta_2)$, but perform a prior sensitivity analysis for $\lambda$, considering different proper priors, including $p(\lambda) = (\lambda + 1)^{-2}$.
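A minimal sketch of one possible sampler for part (b) is given below. It is only one design among several: it integrates out the $\mu_i$, which leaves a negative binomial likelihood, and runs random-walk Metropolis on $(a_1, a_2, \log\lambda)$. An alternative would keep the $\mu_i$ as latent variables and update them in Gibbs steps. The covariate `z` is assumed to be an already centered and scaled roll length, the data are simulated rather than read from fabric.asc, and the function names, step sizes, and the exponential alternative prior are all illustrative assumptions.

```python
import math
import random

def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for moderate mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def nb_logpmf(y, gam, lam):
    """log p(y | gamma, lambda) after integrating mu ~ Gamma(shape=lam, rate=lam/gam)."""
    return (math.lgamma(y + lam) - math.lgamma(lam) - math.lgamma(y + 1)
            + lam * math.log(lam / (lam + gam)) + y * math.log(gam / (lam + gam)))

def log_prior_lam(lam):
    """log of p(lambda) = (lambda + 1)^(-2), the prior named in the text."""
    return -2.0 * math.log1p(lam)

def log_post(theta, z, y, log_prior):
    """Log posterior of theta = (a1, a2, log lambda); flat prior on (a1, a2)."""
    a1, a2, ell = theta
    lam = math.exp(ell)
    loglik = sum(nb_logpmf(yi, math.exp(a1 + a2 * zi), lam) for zi, yi in zip(z, y))
    return loglik + log_prior(lam) + ell   # "+ ell" is the log-Jacobian of lambda = exp(ell)

def sample(z, y, n_iter=4000, step=(0.1, 0.1, 0.4), seed=1, log_prior=log_prior_lam):
    """Random-walk Metropolis; returns draws of (a1, a2, lambda) and the acceptance rate."""
    rng = random.Random(seed)
    theta = [math.log(sum(y) / len(y)), 0.0, 0.0]
    cur = log_post(theta, z, y, log_prior)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        new = log_post(prop, z, y, log_prior)
        if math.log(1.0 - rng.random()) < new - cur:
            theta, cur = prop, new
            accepted += 1
        chain.append((theta[0], theta[1], math.exp(theta[2])))
    return chain, accepted / n_iter

# Simulated stand-in data (NOT the fabric data): lambda = 5, log(gamma) = 2 + 0.5 z.
sim = random.Random(0)
z = [-1.5 + 3.0 * i / 31 for i in range(32)]
gam = [math.exp(2.0 + 0.5 * zi) for zi in z]
mu = [sim.gammavariate(5.0, g / 5.0) for g in gam]    # shape 5, scale gamma_i / 5
y = [rpois(m, sim) for m in mu]

chain, acc_rate = sample(z, y)
alt_chain, _ = sample(z, y, log_prior=lambda lam: -lam)   # hypothetical Exp(1) prior
for name, ch in (("(lambda+1)^-2", chain), ("Exp(1)", alt_chain)):
    lam_draws = sorted(l for _, _, l in ch[1000:])        # drop 1000 burn-in draws
    print(name, "posterior median of lambda:", lam_draws[len(lam_draws) // 2])
```

Passing the log-prior as a function keeps the sensitivity analysis to a one-line change per prior. Working on $\log\lambda$ removes the positivity constraint from the proposal, at the cost of the Jacobian term noted in `log_post`.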
(c) Based on your results from parts (a) and (b), discuss how the two models compare empirically. Moreover, use the quadratic loss L measure for a formal comparison
of the two models, in particular to check whether the hierarchical Poisson GLM improves on the fit of the non-hierarchical GLM. (Provide details on the required expressions for
computing the value of the model comparison criterion.)
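The criterion in part (c) could be estimated from posterior predictive replicates as sketched below. This assumes the L measure takes the commonly used quadratic-loss form $\sum_i \mathrm{Var}(y_i^{rep} \mid \text{data}) + \nu \sum_i \{E(y_i^{rep} \mid \text{data}) - y_i\}^2$, with a weight $\nu$ between 0 and 1. The exact version and weight intended by the assignment should be taken from the course material. The function name `l_measure` and the input layout are illustrative.

```python
from statistics import mean, variance

def l_measure(y_rep, y, nu=1.0):
    """Monte Carlo estimate of the quadratic-loss criterion.

    y_rep: list of replicate data sets, one per posterior draw, each of length len(y).
    y:     observed responses.
    nu:    weight on the goodness-of-fit term (an assumed tuning constant).
    """
    cols = list(zip(*y_rep))                    # replicates grouped by observation
    penalty = sum(variance(c) for c in cols)    # sum_i Var(y_i_rep | data)
    fit = sum((mean(c) - yi) ** 2 for c, yi in zip(cols, y))   # squared predictive bias
    return penalty + nu * fit

# Tiny hand-checkable example with made-up numbers: two draws, two observations.
print(l_measure([[1, 2], [3, 4]], [2, 2]))          # variance term 4 plus fit term 1
print(l_measure([[1, 2], [3, 4]], [2, 2], nu=0.5))
```

Computing the value once with replicates from the Poisson GLM and once with replicates from the hierarchical model gives the formal comparison, with the smaller value preferred. Reporting the two terms separately shows whether any difference comes from predictive variance or from fit.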