STAT 578 – Advanced Bayesian Modeling Assignment 4 solution

$25.00

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (5 votes)

The groundbreaking 1929 paper1 by Edwin Hubble offered evidence for expansion of the universe.
Astronomical observations showed that “extra-galactic nebulae” (other galaxies) tended to be
moving away at a rate roughly proportional to their distance:
v ≈ H0D
where v is the radial velocity of the galaxy (away from us) in km/s, D is its proper distance in
megaparsecs (Mpc), and H0 is called the Hubble constant. The relationship is not exact – each
galaxy also has its own “peculiar velocity” that is unrelated to the expansion.
File hubbledata.txt contains Hubble’s original data on 24 astronomical objects, with their
assumed distance and radial velocity.
(a) [2 pts] Plot the data points: radial velocity versus distance.
(b) Consider a normal-theory simple linear regression model of radial velocity on distance of the
form
vi
| β, σ2
, Di ∼ indep. N

β1 + β2Di
, σ2

i = 1, . . . , 24
Of course, the theory predicts that the intercept β1 will be exactly zero, but your initial
model will not assume this. Also, according to theory, the slope β2 should be H0. Use
independent priors
β1, β2 ∼ iid N
0, 100002

σ
2 ∼ Inv-gamma(0.0001, 0.0001)
Do not standardize or center any variables.
(i) [2 pts] List an appropriate JAGS model.
Now run your model. Make sure to use multiple chains with overdispersed starting
points, check convergence, and monitor β1, β2, and σ
2
for at least 2000 iterations (per
chain) after burn-in.
(ii) [2 pts] List the coda summary of your results for β1, β2, and σ
2
.
(iii) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
slope. (Does H0 appear to be positive?)
(iv) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
intercept. (Does your interval contain zero?)
(c) Consider the model of the previous part, but without the intercept (i.e., assuming the
intercept is zero, as theory predicts). This is sometimes called regression through the origin.
Use the same priors as before for the remaining parameters.
1Edwin Hubble, A Relation between Distance and Radial Velocity among Extra-Galactic Nebulae, Proceedings of
the National Academy of Sciences, vol. 15, no. 3, pp. 168–173, March 1929
(i) [2 pts] List your modified JAGS model.
Now run your model. Make sure to use multiple chains with overdispersed starting
points, check convergence, and monitor parameters for at least 2000 iterations (per
chain) after burn-in.
(ii) [2 pts] List the coda summary of your results for all parameters.
(iii) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
slope.
(iv) [2 pts] Compare the change in the posterior mean of the slope (versus part (b)) to its
posterior standard deviation. (Has it changed very much relative to the standard
deviation?) Also, is its credible interval wider or narrower than before?
(d) One way to check for evidence against the assumption that the intercept is zero is to produce
a posterior predictive p-value based on the no-intercept model. Consider test quantity
T(y, X, θ) = |cor( c ε, xD)|
where cor( c ε, xD) is sample correlation between the error vector ε (not standardized) and the
vector xD of distances D in the data. The larger this quantity is for the no-intercept model,
the less well that model fits the data (since, if a regression model actually fits, the errors
should ideally be uncorrelated with the predictor).
Use your JAGS simulations from the previous part. (Suggestion: Apply as.matrix to the
output of coda.samples to obtain a matrix of simulated parameter values.)
(i) [2 pts] Show R code for computing the simulated error vectors ε (as rows of a matrix).
(ii) [2 pts] Show R code for computing simulated replicate error vectors ε
rep (as rows of a
matrix), which are the error vectors for the replicate response vectors y
rep
.
(iii) [2 pts] Show R code for computing the simulated values of T(y, X, θ) and the simulated
values of T(y
rep, X, θ).
(iv) [2 pts] Plot the simulated values of T(y
rep, X, θ) versus those of T(y, X, θ), with a
reference line indicating where T(y
rep, X, θ) = T(y, X, θ).
(v) [2 pts] Compute the approximate posterior predictive p-value, and make an appropriate
conclusion based on it. (Does it provide evidence that the no-intercept model does not
fit?)
Remark: Modern determinations of H0 vary around 70 (km/s)/Mpc, which is probably much
different than what you obtained. Hubble’s distance data was systematically in error because he
had no accurate way to measure extra-galactic distances.
Total: 28 pts
2