STAT 578 Assignments 1 to 6 solutions


STAT 578 – Advanced Bayesian Modeling Assignment 1

1. The film review aggregator website rottentomatoes.com publishes ranked lists of movies
based on the number of positive critical reviews out of a total number counted for each
movie. See, for example, https://www.rottentomatoes.com/top/bestofrt/?year=2019.
Because the site uses an “Adjusted Score,” a movie with a higher approval percentage
sometimes ranks lower on the list.
Consider the following hypothetical scenario:
Movie 1: 425 positive reviews out of 500 (85%)
Movie 2: 9 positive reviews out of 10 (90%)
Assume that reviews of Movie i are independent with a common probability pi of being
positive (depending on the movie). Assume a U(0, 1) prior on each pi.
(a) [4 pts] Determine the posterior distribution of p1 and of p2 (separately). (Name the
type of distribution and give the values of its defining constants.)
(b) [3 pts] Which movie ranks higher according to posterior mean? According to posterior
median? According to posterior mode? Show your computations. (For median, use R
function qbeta. For mean and mode, use formulas in BDA3, Table A.1. Do not use
simulation, as it may not be sufficiently accurate.)
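A Beta(a, b) posterior's three point summaries can all be computed in a few lines of base R. A hedged sketch (the placeholder shape values below are not the answer to part (a); substitute the constants you derive there):

```r
# Point summaries of a generic Beta(a, b) posterior.
# The shapes passed below are placeholders, NOT the part (a) answer.
beta_summaries <- function(a, b) {
  c(mean   = a / (a + b),               # BDA3, Table A.1
    median = qbeta(0.5, a, b),          # exact quantile; no simulation needed
    mode   = (a - 1) / (a + b - 2))     # valid when a > 1 and b > 1
}
beta_summaries(2, 3)   # placeholder shapes for illustration
```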
2. File randomwikipedia.txt contains the ID number and number of bytes in length for 20
randomly selected English Wikipedia articles.
(a) (i) [2 pts] Draw a histogram of article length, and describe the distribution.
(ii) [2 pts] Transform article length to the (natural) log scale. Then re-draw the
histogram and describe the distribution.
(iii) [1 pt] Based on your histograms, explain why the log scale would be better to use
for the remainder of the analysis. (Read below.)
(b) [2 pts] Let yi be the length of article i on the log scale (i.e., the natural logarithm of the
number of bytes). Compute the sample mean and sample variance of y1, . . . , y20.
In the remaining parts, assume the yi have a normal sampling distribution with
mean µ and variance σ².
(c) Assume σ² is known to equal the sample variance. Consider a flat prior for µ. Use it to:
(i) [3 pts] Compute the posterior mean, posterior variance, and posterior precision
of µ.
(ii) [2 pts] Plot the prior density and the posterior density of µ together in a single
plot. Label which is which.
(iii) [2 pts] Compute a 95% central posterior interval for µ.
(d) Now let µ and σ² have prior
p(µ, σ²) ∝ (σ²)⁻¹    σ² > 0
Use it to:
(i) [3 pts] Compute the posterior mean, posterior variance, and posterior precision
of µ. (If you cannot compute explicitly, use a good computational approximation.)
(ii) [2 pts] Approximate a 95% central posterior interval for µ.
(iii) [2 pts] Approximate a 95% central posterior interval for σ².
(e) Assume the prior of the previous part. Use simulation in R to answer the following,
based on 1,000,000 draws from the posterior.
(i) [2 pts] Approximate a 95% central posterior predictive interval for the length (in
bytes) of a single (new) randomly selected article. (Note that this is on the
original scale, not the log scale.)
(ii) [2 pts] Approximate the posterior predictive probability that the length of a single
(new) randomly selected article will exceed the maximum article length in the
data.
(iii) [2 pts] Approximate the posterior predictive probability that the maximum length
of 20 (new) randomly selected articles will exceed the maximum article length in
the data. (Be careful! All 20 randomly selected articles have the same value for µ
and for σ².)
Reminder: Show the R code you used and also a summary of the approximate inference results
that you used to answer the preceding parts.
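Under the prior p(µ, σ²) ∝ (σ²)⁻¹, posterior draws can be generated directly from the standard conjugate results: σ² | y is scaled inverse-χ², and µ | σ², y is normal. A hedged simulation sketch (ybar and s2 are placeholders; replace them with the sample mean and variance you computed in part (b)):

```r
# Posterior and predictive simulation under p(mu, sigma^2) proportional to 1/sigma^2.
# ybar and s2 below are placeholders -- use your own part (b) summaries.
n <- 20; ybar <- 8; s2 <- 2; S <- 1e6
sigma2 <- (n - 1) * s2 / rchisq(S, df = n - 1)  # sigma^2 | y (scaled inverse chi-square)
mu     <- rnorm(S, ybar, sqrt(sigma2 / n))      # mu | sigma^2, y
ypred  <- rnorm(S, mu, sqrt(sigma2))            # one new article length, log scale
quantile(exp(ypred), c(0.025, 0.975))           # 95% predictive interval in bytes
```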
Total: 34 pts

STAT 578 – Advanced Bayesian Modeling Assignment 2

1. Consider the two different hyperprior formulations for the binomial hierarchical model of
Lesson 3.2: Hierarchical Modeling Fundamentals. This exercise shows how different those
priors are.
Note: Consult help(distributions) in R for the random number generators you will need.
(You do not need JAGS.)
(a) The first prior formulation was
θj | α, β ∼ Beta(α, β)
α, β ∼ iid Expon(0.001)
(i) [2 pts] Independently simulate 1000 pairs (α, β) from their hyperprior, and
produce a scatterplot of log(β) versus log(α).
(ii) [2 pts] Using the simulated pairs (α, β), forward-simulate θj , and produce a
histogram of the result (an approximation of its marginal prior).
(b) The second prior formulation was
θj | α, β ∼ Beta(α, β)
α = φ1/φ2²    β = (1 − φ1)/φ2²
φ1 ∼ U(0, 1)    φ2 ∼ U(0, 1000)
(i) [2 pts] Independently simulate 1000 pairs (α, β) from their hyperprior, and
produce a scatterplot of log(β) versus log(α).
(ii) [2 pts] Using the simulated pairs (α, β), forward-simulate θj , and produce a
histogram of the result (an approximation of its marginal prior).
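Both formulations need only base-R random number generators, as the note suggests. A sketch for formulation (a); the same pattern, with the φ transformation inserted, handles formulation (b):

```r
# Forward simulation from the first hyperprior formulation (1000 draws).
set.seed(1)                        # any seed; for reproducibility only
alpha <- rexp(1000, rate = 0.001)
beta  <- rexp(1000, rate = 0.001)
plot(log(alpha), log(beta))        # part (i): log(beta) versus log(alpha)
theta <- rbeta(1000, alpha, beta)  # part (ii): one theta_j per simulated pair
hist(theta)                        # approximate marginal prior of theta_j
```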
2. Twelve separate case-control studies were run to investigate the potential link between
presence of a certain genetic trait (the PlA2 polymorphism of the glycoprotein IIIa subunit
of the fibrinogen receptor) and risk of heart attack.1 For the jth study, an estimated
log-odds ratio, ψ̂j, and its (estimated) standard error, σj, were computed:

 j    ψ̂j      σj        j    ψ̂j      σj        j    ψ̂j      σj
 1    1.055   0.373     5    1.068   0.471     9    0.507   0.186
 2   -0.097   0.116     6   -0.025   0.120    10    0.000   0.328
 3    0.626   0.229     7   -0.117   0.220    11    0.385   0.206
 4    0.017   0.117     8   -0.381   0.239    12    0.405   0.254

1From Burr, et al. (2003), Statistics in Medicine, 22: 1741–1760.
Consider this Bayesian hierarchical model:
ψ̂j | ψj ∼ indep. N(ψj, σj²)    j = 1, …, 12
ψj | ψ0, σ0 ∼ iid N(ψ0, σ0²)    j = 1, …, 12
ψ0 ∼ N(0, 1000²)
σ0 ∼ U(0, 1000)
with ψ0 and σ0 independent, and the values σj, j = 1, . . . , 12, regarded as fixed and known.
(a) [2 pts] Specify improper densities that the proper hyperpriors given above are
apparently intended to approximate. (Which parameters are the hyperparameters?)
(b) [5 pts] Draw a directed acyclic graph (DAG) appropriate for this model. (Use the
notation introduced in lecture, including “plates.”) You may draw it neatly by hand or
use software.
(c) [5 pts] Using the template asgn2template.bug provided on the course website, form a
JAGS model statement (consistent with your graph). Also, set up any R (rjags)
statements appropriate for creating a JAGS model. Be careful to name your data
variables correctly. [Also remember: JAGS “dnorm” uses precisions, not variances!]
(d) [5 pts] Run at least 10,000 iterations of burn-in, then 100,000 iterations to use for
inference. For both ψ0 and σ0² (not σ0), produce a posterior numerical summary and
also graphical estimates of the posterior densities. Explicitly give the approximations of
the posterior expected values, posterior standard deviations, and 95% central posterior
intervals. (Just showing R output is not enough!)
(e) Suppose a new case-control study is to be performed, and assume that its log-odds
standard error (new σ) will be 0.125. Assume the ψ for the new study is exchangeable
with those for the previous studies.
(i) [2 pts] Re-draw your DAG, adding new nodes to represent the new ψ̂ and new ψ.
(ii) [2 pts] Correspondingly modify your JAGS model to answer the following parts.
Show the modified JAGS and R code and output that you used.
(iii) [3 pts] Estimate the posterior mean and posterior standard deviation, and form a
95% central posterior predictive interval for the estimated log-odds ratio that the
new study will obtain. (Remember, this new estimated log-odds ratio will be the
new ψ̂, not the new ψ.)
(iv) [1 pt] Estimate the posterior predictive probability that the new estimated
log-odds ratio will be at least twice its standard error, i.e., at least two standard
errors (2σ) greater than zero. (This is roughly the posterior probability that the
new study will find a statistically significant result, and in the positive direction.)
Suggestion: Add an indicator variable to your JAGS model – one that equals 1
when the event occurs, and 0 otherwise. (What is its mean?)
Use at least 10,000 iterations of burn-in, and 100,000 for inference as before.
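The indicator suggested in (iv) can be defined inside the JAGS model with the step function, which equals 1 when its argument is nonnegative. A possible fragment in JAGS model language (the node names psihatnew and psinew are assumptions about how you named the new-study nodes):

```
psihatnew ~ dnorm(psinew, 1 / 0.125^2)   # predictive draw (dnorm takes a precision)
ind <- step(psihatnew - 2 * 0.125)       # 1 if the new psi-hat >= 2 sigma, else 0
```

Monitoring ind and taking its posterior mean then approximates the required probability.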
Total: 33 pts

STAT 578 – Advanced Bayesian Modeling Assignment 3

1. (a) [2 pts] R script FlintGibbs.R implements a Gibbs sampler for the partially conjugate
Flint data model. It is very similar to the example R code in Lesson 7.2: Gibbs
Sampling. Use the script to simulate from the posterior for µ and σ². Then use R
function acf to produce an autocorrelation plot for the successive µ variates and an
autocorrelation plot for the successive σ² variates.
(b) R script FlintMetropolis.R implements a Metropolis sampler for the same Flint data
model. It is very similar to the example R code in Lesson 7.3: Metropolis and
Metropolis-Hastings.
(i) [1 pt] Experiment with different settings for the proposal variance rho (by
uncommenting its line and changing the value). Find a value of rho that gives an
overall (average) acceptance rate of about 0.35.1 What value of rho did you find?
[Warning: Using a value of rho that is too large may cause the R code to produce
an error, due to numerical problems. Start with smaller values of rho.]
(ii) [2 pts] With the value of rho that you found, use the script to simulate from the
posterior for µ and σ². Then use R function acf to produce an autocorrelation
plot for the successive µ variates and an autocorrelation plot for the successive
σ² variates.
(c) [1 pt] Compare the autocorrelation plots from the previous two parts. Which method
exhibited faster mixing: the Gibbs sampler or the Metropolis sampler?
2. Use the 2016 US presidential polls data in polls2016.txt to answer the following, running
all parts using JAGS and R (rjags). Remember that you will have to create a variable
sigma in R to represent the standard deviation of the polls, defined to be half of the margin
of error. Refer to Lesson 4.2: Normal Hierarchical Model in R/JAGS.
(a) Use the model in polls20161.bug for the following:
(i) [2 pts] Create an initialization list (in R) supporting 4 chains, with a different
initialization for each chain. Set initial values for mu to ±100 and values for tau to
100 or 0.01. Then use jags.model to create the JAGS model R object with these
initializations. List all of the R code you used.
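One way to build the four-chain initialization list described in (i); the variable names mu and tau are taken from polls20161.bug as described in the problem statement:

```r
# Four overdispersed initializations, crossing mu = +/-100 with tau = 100 or 0.01
inits <- list(list(mu =  100, tau = 100),
              list(mu =  100, tau = 0.01),
              list(mu = -100, tau = 100),
              list(mu = -100, tau = 0.01))
length(inits)   # one initialization per chain
```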
(ii) [2 pts] Perform a burn-in of 2500 iterations, then monitor the mu and tau nodes
for 5000 iterations (for each chain). List all of the R code you used.
(iii) [3 pts] For the iterations you monitored, produce trace plots of mu and tau. Do
there appear to be any convergence problems? Display the plots and R code that
produced them.
(iv) [2 pts] For the iterations you monitored, compute Gelman-Rubin statistics
(potential scale reduction factors) for mu and tau. Do there appear to be any
convergence problems? Show your R code and its output.
1This is approximately the “optimal” acceptance rate for a two-dimensional parameter. See Roberts, G. O., &
Rosenthal, J. S. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16, 351–367.
(v) [2 pts] For the iterations you monitored, display autocorrelation plots for mu and
tau for one of the chains. (Hint: For example, to reference the first chain of an
mcmc.list object named x, use x[[1]].) Comment on the apparent speed of
mixing.
(vi) [2 pts] For the iterations you monitored, compute effective sample sizes (over all
chains) for mu and tau. Would they be considered adequate? Show your R code
and its output.
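Parts (iv)-(vi) all operate on the mcmc.list returned by coda.samples (called x here). A usage sketch, not runnable on its own since it assumes the coda package is loaded and the chains have already been monitored:

```r
# Convergence and mixing diagnostics for an mcmc.list object x
gelman.diag(x, autoburnin = FALSE)  # Gelman-Rubin PSRFs for mu and tau
autocorr.plot(x[[1]])               # autocorrelation plots for the first chain
effectiveSize(x)                    # effective sample sizes over all chains
```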
(b) Now consider a new model that uses an almost flat prior for tau on the log scale, as
follows: Create a new JAGS model by modifying polls20161.bug to eliminate the
current prior for tau, create a new parameter logtau with a U(−100, 100) prior
distribution, and define tau to be equal to exp(logtau).
(i) [2 pts] Display all of the code for your new JAGS model.
(ii) [2 pts] Create an initialization list (in R) supporting 4 chains, with a different
initialization for each chain. Set initial values for mu to ±100 and values for
logtau to log 100 or log 0.01. Then use jags.model to create the JAGS model R
object with these initializations. List all of the R code you used.
(iii) [2 pts] Perform a burn-in of 2500 iterations, then monitor the mu and tau nodes
for 5000 iterations (for each chain). List all of the R code you used.
(iv) [3 pts] For the iterations you monitored, produce trace plots of mu and tau. Do
there appear to be any convergence problems? Display the plots and R code that
produced them.
(v) [2 pts] For the iterations you monitored, compute Gelman-Rubin statistics
(potential scale reduction factors) for mu and tau. Do there appear to be any
convergence problems? Show your R code and its output.
(vi) [2 pts] For the iterations you monitored, display autocorrelation plots for mu and
tau for one of the chains.2 Comment on the apparent speed of mixing.
(vii) [2 pts] What is wrong with this model that could explain any problems you
noted? (Hint: What would happen if you used an improper flat prior on log τ?)
Total: 34 pts
2Ordinarily, autocorrelation plots are not used for chains that have not converged, but you are asked to produce
them here even if there was no convergence.

STAT 578 – Advanced Bayesian Modeling Assignment 4

The groundbreaking 1929 paper1 by Edwin Hubble offered evidence for expansion of the universe.
Astronomical observations showed that “extra-galactic nebulae” (other galaxies) tended to be
moving away at a rate roughly proportional to their distance:
v ≈ H0D
where v is the radial velocity of the galaxy (away from us) in km/s, D is its proper distance in
megaparsecs (Mpc), and H0 is called the Hubble constant. The relationship is not exact – each
galaxy also has its own “peculiar velocity” that is unrelated to the expansion.
File hubbledata.txt contains Hubble’s original data on 24 astronomical objects, with their
assumed distance and radial velocity.
(a) [2 pts] Plot the data points: radial velocity versus distance.
(b) Consider a normal-theory simple linear regression model of radial velocity on distance of the
form
vi | β, σ², Di ∼ indep. N(β1 + β2 Di, σ²)    i = 1, . . . , 24
Of course, the theory predicts that the intercept β1 will be exactly zero, but your initial
model will not assume this. Also, according to theory, the slope β2 should be H0. Use
independent priors
β1, β2 ∼ iid N(0, 10000²)
σ² ∼ Inv-gamma(0.0001, 0.0001)
Do not standardize or center any variables.
(i) [2 pts] List an appropriate JAGS model.
Now run your model. Make sure to use multiple chains with overdispersed starting
points, check convergence, and monitor β1, β2, and σ² for at least 2000 iterations (per
chain) after burn-in.
(ii) [2 pts] List the coda summary of your results for β1, β2, and σ².
(iii) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
slope. (Does H0 appear to be positive?)
(iv) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
intercept. (Does your interval contain zero?)
(c) Consider the model of the previous part, but without the intercept (i.e., assuming the
intercept is zero, as theory predicts). This is sometimes called regression through the origin.
Use the same priors as before for the remaining parameters.
1Edwin Hubble, A Relation between Distance and Radial Velocity among Extra-Galactic Nebulae, Proceedings of
the National Academy of Sciences, vol. 15, no. 3, pp. 168–173, March 1929
(i) [2 pts] List your modified JAGS model.
Now run your model. Make sure to use multiple chains with overdispersed starting
points, check convergence, and monitor parameters for at least 2000 iterations (per
chain) after burn-in.
(ii) [2 pts] List the coda summary of your results for all parameters.
(iii) [2 pts] Give the approximate posterior mean and 95% posterior credible interval for the
slope.
(iv) [2 pts] Compare the change in the posterior mean of the slope (versus part (b)) to its
posterior standard deviation. (Has it changed very much relative to the standard
deviation?) Also, is its credible interval wider or narrower than before?
(d) One way to check for evidence against the assumption that the intercept is zero is to produce
a posterior predictive p-value based on the no-intercept model. Consider test quantity
T(y, X, θ) = |cor(ε, xD)|
where cor(ε, xD) is the sample correlation between the error vector ε (not standardized) and
the vector xD of distances D in the data. The larger this quantity is for the no-intercept model,
the less well that model fits the data (since, if a regression model actually fits, the errors
should ideally be uncorrelated with the predictor).
Use your JAGS simulations from the previous part. (Suggestion: Apply as.matrix to the
output of coda.samples to obtain a matrix of simulated parameter values.)
(i) [2 pts] Show R code for computing the simulated error vectors ε (as rows of a matrix).
(ii) [2 pts] Show R code for computing simulated replicate error vectors ε^rep (as rows of a
matrix), which are the error vectors for the replicate response vectors y^rep.
(iii) [2 pts] Show R code for computing the simulated values of T(y, X, θ) and the simulated
values of T(y^rep, X, θ).
(iv) [2 pts] Plot the simulated values of T(y^rep, X, θ) versus those of T(y, X, θ), with a
reference line indicating where T(y^rep, X, θ) = T(y, X, θ).
(v) [2 pts] Compute the approximate posterior predictive p-value, and make an appropriate
conclusion based on it. (Does it provide evidence that the no-intercept model does not
fit?)
Remark: Modern determinations of H0 vary around 70 (km/s)/Mpc, which is probably much
different than what you obtained. Hubble’s distance data was systematically in error because he
had no accurate way to measure extra-galactic distances.
Total: 28 pts

STAT 578 – Advanced Bayesian Modeling Assignment 5

File usparkvisits.csv contains annual numbers of recreational visits to 30 different parks
managed by the US National Park Service for the years 2006 through 2018.1 You will build and
compare two different varying-coefficient hierarchical normal regression models for the log-scale
numbers, using JAGS and rjags.
(a) (i) [2 pts] On the same set of axes, plot segmented lines, one for each park, representing
the number of visits versus the year (2006 through 2018). Distinguish the lines for
different parks by using different colors or line types. (You should label the axes, but no
legend is needed.)
(ii) [2 pts] Repeat the previous part using the natural logarithm of the number of visits.
Let yij be the natural logarithm of the number of visits to park j in year i (i = 1, . . . , 13,
j = 1, . . . , 30). For each park, model the log-number as a simple linear regression on the
centered year number:
yij | β^(j), σy², X ∼ indep. N(β1^(j) + β2^(j) (xi − x̄), σy²)
where
β^(j) = (β1^(j), β2^(j))ᵀ    j = 1, . . . , 30    xi = i    i = 1, . . . , 13
Note that the coefficients are allowed to depend on the park, but the variance is not.
(b) Let β̂1^(j) and β̂2^(j) be the ordinary least squares estimates of β1^(j) and β2^(j), estimated for
park j. You may use the lm function in R to compute these estimates. (For this part, the
coefficient pairs are estimated completely separately for each park.)
(i) [1 pt] Produce a scatterplot of the pairs (β̂1^(j), β̂2^(j)), j = 1, . . . , 30.
(ii) [1 pt] Give the average (sample mean) of the β̂1^(j) and also of the β̂2^(j).
(iii) [1 pt] Give the sample variance of the β̂1^(j) and also of the β̂2^(j).
(iv) [1 pt] Give the sample correlation between the β̂1^(j) and the β̂2^(j).
(c) Consider the bivariate prior
β^(j) | µβ, Σβ ∼ iid N(µβ, Σβ)
µβ = (µβ1, µβ2)ᵀ
Σβ = [ σβ1²         ρ σβ1 σβ2 ]
     [ ρ σβ1 σβ2    σβ2²      ]
with hyperpriors
µβ ∼ N(0, 1000² I)
Σβ⁻¹ ∼ Wishart2(Σ0⁻¹/2)
in the notation used in the lecture videos. For your analysis, use
Σ0 = [ 10    0    ]
     [ 0     0.01 ]
based on preliminary analyses. Let the prior on σy² be
σy² ∼ Inv-gamma(0.0001, 0.0001)
1Data from https://irma.nps.gov/STAT
(i) [2 pts] List an appropriate JAGS model. Make sure to create nodes for Σβ, ρ, and σy².
Remember that the numbers of visits are to be analyzed on the log scale.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor µβ, Σβ, σy², and ρ (after convergence)
long enough to obtain effective sample sizes of at least 4000 for each parameter.
(ii) [2 pts] Display the coda summary of the results for the monitored parameters.
(iii) [2 pts] Give an approximate 95% central posterior interval for the correlation
parameter ρ, and also produce a graph of its (estimated) posterior density.
(iv) [2 pts] Approximate the posterior probability that ρ < 0. Also, compute the Bayes
factor favoring ρ < 0 versus ρ > 0. (You may use the fact that ρ < 0 and ρ > 0 have
equal prior probability.) Describe the level of data evidence for ρ < 0.
(v) [1 pt] Your model implies that, over the 12 years from 2006 to 2018, the (population)
median number of visits should have changed by a factor of e^(12µβ2). Form an
approximate 95% central posterior interval for this quantity.
(vi) [2 pts] Use the rjags function dic.samples to compute the effective number of
parameters (“penalty”) and Plummer’s DIC (“Penalized deviance”). Use at least
100,000 iterations.
(d) Now consider a different model with “univariate” hyperpriors for the model coefficients, which
do not allow for a coefficient correlation parameter:
β1^(j) | µβ1, σβ1 ∼ iid N(µβ1, σβ1²)
β2^(j) | µβ2, σβ2 ∼ iid N(µβ2, σβ2²)
with hyperpriors
µβ1, µβ2 ∼ iid N(0, 1000²)
σβ1, σβ2 ∼ iid U(0, 1000)
Let the prior on σy² be the same as in the previous model.
(i) [4 pts] Draw a complete DAG for this new model.
(ii) [2 pts] List an appropriate JAGS model. Make sure that there are nodes for σβ1², σβ2²,
and σy².
Remember that the numbers of visits are to be analyzed on the log scale.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor µβ1, µβ2, σβ1², σβ2², and σy² (after
convergence) long enough to obtain effective sample sizes of at least 4000 for each
parameter.
(iii) [2 pts] Display the coda summary of the results for the monitored parameters.
(iv) [2 pts] Recall the (population) median change factor for the number of visits, e^(12µβ2),
as considered in the previous analysis. Form an approximate 95% central posterior
interval for this quantity, and compare it with the previous results.
(v) [2 pts] Use the rjags function dic.samples to compute the effective number of
parameters (“penalty”) and Plummer’s DIC (“Penalized deviance”). Use at least
100,000 iterations.
(vi) [1 pt] Compare the (Plummer’s) DIC values for this model and the previous one.
Which is preferred?
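Parts (c)(vi) and (d)(v)-(vi) use dic.samples, and the two results can be compared directly. A usage sketch, not runnable on its own since m1 and m2 stand for the two compiled jags.model objects:

```r
# DIC computation and comparison sketch for the two fitted models
dic1 <- dic.samples(m1, n.iter = 100000)   # prints mean deviance, penalty,
dic2 <- dic.samples(m2, n.iter = 100000)   # and penalized deviance
diffdic(dic1, dic2)                        # negative difference favors the first model
```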
Total: 32 pts

STAT 578 – Advanced Bayesian Modeling Assignment 6

File illinimensbb.csv, in comma-separated values (CSV) format, contains 2018–2019 season
statistics and roster information for fifteen Illini men’s basketball players. The first column
contains the jersey number, and the remaining columns contain player name, height (Ht, inches),
position (Pos, C=center, F=forward, G=guard), minutes of playing time (MIN), field goals made
(FGM), field goals attempted (FGA), and number of shots blocked (BLK).
You will build a logistic regression model for field goals, and a Poisson loglinear regression model
for shots blocked, using JAGS and rjags.
1. [2 pts] Using plot(Ht ~ Pos, data= · · · ), display box plots of height by position. Is there
a relationship between height and position? (Such a relationship might cause substantial
posterior correlations between regression coefficients if both height and position are used as
explanatory variables.)
2. Let yi be the number of field goals made by player i out of ni attempts (i = 1, . . . , 15).
Consider the following logistic regression (with implicit intercept) on player position and
height:
yi | pi ∼ indep. Bin(ni, pi)
logit(pi) = βPos(i) + βHt Hi
where
Pos(i) = player i position (C, F, G)
Hi = player i height after centering and scaling to sample standard dev. 0.5
Consider the prior
βC, βF, βG ∼ iid t1(0, 10²)
βHt ∼ t1(0, 2.5²)
(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of binomial
probabilities pi and a vector y^rep of replicate responses.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor the regression coefficients,
probabilities, and replicate responses (after convergence) long enough to obtain
effective sample sizes of at least 4000 for each regression coefficient.
(b) [2 pts] Display the coda summary of the results for the monitored regression
coefficients.
(c) [2 pts] With your posterior samples, display scatterplots of (i) βC versus βHt, (ii) βF
versus βHt, and (iii) βG versus βHt. Do you see (posterior) correlations?
(d) [2 pts] Consider the modeled probability that Ayo Dosunmu (No. 11) successfully
makes an attempted field goal. Plot the (approximate) posterior density of this
probability.
(e) [2 pts] Approximate the posterior probability that βF > βG (i.e., that forwards have a
higher probability of successfully making an attempted field goal than guards, after
adjusting for height). Also, approximate the Bayes factor favoring βF > βG versus
βF < βG. (Note that, by symmetry, βF > βG and βF < βG have equal prior
probability.) What can you say about the data evidence that βF > βG?
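Once the coefficients are monitored, part (e) reduces to counting draws. A sketch with toy stand-in draws; the real betaF and betaG come from your coda output:

```r
# Posterior probability and Bayes factor for betaF > betaG, from MCMC draws.
set.seed(1)
betaF <- rnorm(4000, 0.1, 0.3); betaG <- rnorm(4000, 0, 0.3)  # toy stand-ins
p  <- mean(betaF > betaG)   # approximate Pr(betaF > betaG | y)
BF <- p / (1 - p)           # posterior odds = Bayes factor, since prior odds are 1
c(p = p, BF = BF)
```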
(f) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive
p-value. Does it indicate any evidence of problems (such as overdispersion)?
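For binomial data the chi-square discrepancy is T(y, p) = Σi (yi − ni pi)² / (ni pi (1 − pi)), compared between the data and the replicates. A sketch with toy placeholder values; in practice p and yrep are monitored in JAGS, and y and n come from FGM and FGA:

```r
# Posterior predictive p-value via the chi-square discrepancy (toy placeholder data).
set.seed(1)
S <- 1000; n <- c(30, 50, 80); y <- c(12, 20, 40)   # toy attempts and makes
p <- matrix(rbeta(S * 3, 2, 3), S, 3)               # toy posterior draws of the p_i
chisq <- function(yv, pv, nv) sum((yv - nv * pv)^2 / (nv * pv * (1 - pv)))
Tobs <- Trep <- numeric(S)
for (s in 1:S) {
  yrep_s  <- rbinom(3, n, p[s, ])   # replicate responses for draw s
  Tobs[s] <- chisq(y,      p[s, ], n)
  Trep[s] <- chisq(yrep_s, p[s, ], n)
}
mean(Trep >= Tobs)   # approximate posterior predictive p-value
```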
(g) Now consider expanding the model to allow for overdispersion, as follows:
logit(pi) = βPos(i) + βHt Hi + εi
with
εi | σε ∼ iid N(0, σε²)
σε ∼ U(0, 10)
and everything else the same as before.
(i) [3 pts] List an appropriately modified JAGS model.
Then run it using rjags, with all of the usual steps.
(ii) [1 pt] Plot the (approximate) posterior density of σε.
(iii) [2 pts] Repeat part (e) (not part (f)) under this expanded model. Does your
conclusion change?
3. Let yi be the number of shots blocked by player i (i = 1, . . . , 15). Consider the following
Poisson loglinear regression (with implicit intercept) on player position and height, using
minutes of playing time as a rate (exposure) variable:
yi | ri, ti ∼ indep. Poisson(ti ri)
log(ri) = βPos(i) + βHt Hi*
where
ti = player i total minutes of playing time
Pos(i) = player i position (C, F, G)
Hi* = player i height after standardizing
(centering and scaling to sample standard dev. 1)
(Note that the scaling of Hi* is different than that of Hi in the previous part.)
Consider the prior
βC, βF, βG, βHt ∼ iid N(0, 100²)
(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of Poisson
means λi = ti ri and a vector y^rep of replicate responses.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor the regression coefficients, Poisson
means, and replicate responses (after convergence) long enough to obtain effective
sample sizes of at least 4000 for each regression coefficient.
(b) [2 pts] Display the coda summary of the results for the monitored regression
coefficients.
(c) [2 pts] The sampling model implies that e^βHt represents the factor by which the mean
rate of blocking shots changes for each increase in height of one standard deviation
(here, about 3.5 inches). (Under the model, this factor is the same for all positions.)
Form an approximate 95% central posterior credible interval for this factor. According
to your interval, does it seem that greater height is associated with a higher rate of
blocking shots?
(d) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive
p-value. Does it indicate any evidence of problems?
(e) For each player i, approximate Pr(yi^rep ≥ yi | y), which is a kind of marginal posterior
predictive p-value.
(i) [2 pts] Show your R code, and display a table with the player names and their
values of this probability.
(ii) [1 pt] Name any players for whom this probability is less than 0.05. (Any such
player blocked notably more shots than the model would suggest, for his position
and height.)
(iii) [1 pt] Notice that the probability equals 1 for some players. Why is that actually
not surprising? (Hint: How many shots were actually blocked by those players?
How much playing time did they have?)
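The probabilities in part (e) are column means of an indicator matrix. A sketch with toy draws; the real yrep is the monitored replicate matrix, one row per draw and one column per player:

```r
# Marginal posterior predictive p-values Pr(yrep_i >= y_i | y), one per player.
set.seed(1)
y <- c(2, 0, 15)                                             # toy blocked-shot counts
yrep <- matrix(rpois(3000, rep(c(2, 0.5, 15), each = 1000)), nrow = 1000)
pp <- colMeans(sweep(yrep, 2, y, ">="))                      # one probability per column
pp   # note: a player with y_i = 0 always gets probability 1 (cf. part (iii))
```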
Total: 32 pts