# STAT 578 – Advanced Bayesian Modeling Assignment 6 solution

## Description

File illinimensbb.csv, in comma-separated values (CSV) format, contains 2018–2019 season
statistics and roster information for fifteen Illini men’s basketball players. The first column
contains the jersey number, and the remaining columns contain player name, height (Ht, inches),
position (Pos, C=center, F=forward, G=guard), minutes of playing time (MIN), field goals made
(FGM), field goals attempted (FGA), and number of shots blocked (BLK).
You will build a logistic regression model for field goals, and a Poisson loglinear regression model
for shots blocked, using JAGS and rjags.
1. [2 pts] Using `plot(Ht ~ Pos, data = ...)`, display box plots of height by position. Is there
a relationship between height and position? (Such a relationship might cause substantial
posterior correlations between regression coefficients if both height and position are used as
explanatory variables.)
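
As a rough illustration of the call in Problem 1, the sketch below builds a small made-up roster and draws the box plots. The data frame `roster` and every height and position in it are hypothetical, not the values in illinimensbb.csv. Since R 4.0, `read.csv()` leaves character columns as strings by default, so the sketch converts `Pos` to a factor explicitly, which is a safe way to make sure the formula call yields box plots.

```r
# Hypothetical roster: these heights and positions are invented for
# illustration and are NOT the values in illinimensbb.csv.
roster <- data.frame(
  Ht  = c(83, 81, 80, 79, 78, 77, 76, 75, 74, 72),
  Pos = c("C", "C", "F", "F", "F", "G", "G", "G", "G", "G")
)

# Make sure Pos is a factor so that the formula call draws box plots.
roster$Pos <- factor(roster$Pos, levels = c("C", "F", "G"))

# Box plots of height by position, as requested in Problem 1.
plot(Ht ~ Pos, data = roster, xlab = "Position", ylab = "Height (inches)")

# Numeric companion to the plot: median height within each position.
med_ht <- tapply(roster$Ht, roster$Pos, median)
print(med_ht)
```

With the real file, `roster` would instead come from something like `read.csv("illinimensbb.csv")`; the column names `Ht` and `Pos` are taken from the description above.
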
2. Let $y_i$ be the number of field goals made by player $i$ out of $n_i$ attempts ($i = 1, \ldots, 15$).
Consider the following logistic regression (with implicit intercept) on player position and
height:

$$y_i \mid p_i \sim \text{indep. } \mathrm{Bin}(n_i, p_i)$$

$$\operatorname{logit}(p_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i$$

where

$\mathrm{Pos}(i)$ = player $i$ position (C, F, G)

$H_i$ = player $i$ height after centering and scaling to sample standard dev. 0.5

Consider the prior

$$\beta_C, \beta_F, \beta_G \sim \text{iid } t_1(0, 10^2)$$

$$\beta_{\mathrm{Ht}} \sim t_1(0, 2.5^2)$$

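
The two preprocessing details in this setup, rescaling height and translating the prior scales, can be sketched in base R as follows. The heights are hypothetical stand-ins, and the sketch assumes that the second argument of $t_1(\cdot, \cdot)$ is a squared scale, as the squared notation suggests. JAGS parameterizes its t distribution as `dt(mu, tau, k)` with precision `tau`, so each squared scale is inverted.

```r
# Hypothetical heights in inches (NOT the values in illinimensbb.csv).
Ht <- c(83, 81, 80, 79, 78, 77, 76, 75, 74, 72)

# Center, then scale so that the sample standard deviation is 0.5.
H <- 0.5 * (Ht - mean(Ht)) / sd(Ht)

# JAGS writes the t distribution as dt(mu, tau, k), where tau is a
# precision: tau = 1 / scale^2. Convert the squared scales in the prior.
tau_pos <- 1 / 10^2    # for beta_C, beta_F, beta_G: dt(0, 0.01, 1)
tau_ht  <- 1 / 2.5^2   # for beta_Ht:               dt(0, 0.16, 1)

print(c(mean = mean(H), sd = sd(H)))
print(c(tau_pos = tau_pos, tau_ht = tau_ht))
```

Passing the scale or the variance directly to `dt()` instead of the precision is an easy mistake to make, which is why the conversion is written out here.
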
(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of binomial
probabilities $p_i$ and a vector $y^{\mathrm{rep}}$ of replicate responses.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor the regression coefficients,
probabilities, and replicate responses (after convergence) long enough to obtain
effective sample sizes of at least 4000 for each regression coefficient.
(b) [2 pts] Display the coda summary of the results for the monitored regression
coefficients.
(c) [2 pts] With your posterior samples, display scatterplots of (i) $\beta_C$ versus $\beta_{\mathrm{Ht}}$, (ii) $\beta_F$
versus $\beta_{\mathrm{Ht}}$, and (iii) $\beta_G$ versus $\beta_{\mathrm{Ht}}$. Do you see (posterior) correlations?
(d) [2 pts] Consider the modeled probability that Ayo Dosunmu (No. 11) successfully
makes an attempted field goal. Plot the (approximate) posterior density of this
probability.
(e) [2 pts] Approximate the posterior probability that $\beta_F > \beta_G$ (i.e., that forwards have a
higher probability of successfully making an attempted field goal than guards, after
adjusting for height). Also, approximate the Bayes factor favoring $\beta_F > \beta_G$ versus
$\beta_F < \beta_G$. (Note that, by symmetry, $\beta_F > \beta_G$ and $\beta_F < \beta_G$ have equal prior
probability.) What can you say about the data evidence that $\beta_F > \beta_G$?
(f) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive
p-value. Does it indicate any evidence of problems (such as overdispersion)?
(g) Now consider expanding the model to allow for overdispersion, as follows:
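
$$\operatorname{logit}(p_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i + \varepsilon_i$$

with

$$\varepsilon_i \mid \sigma_\varepsilon \sim \text{iid } \mathrm{N}(0, \sigma_\varepsilon^2)$$

$$\sigma_\varepsilon \sim \mathrm{U}(0, 10)$$

and everything else the same as before.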
(i) [3 pts] List an appropriately modified JAGS model.
Then run it using rjags, with all of the usual steps.
(ii) [1 pt] Plot the (approximate) posterior density of $\sigma_\varepsilon$.
(iii) [2 pts] Repeat part (e) (not part (f)) under this expanded model. Does your
conclusion change?
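
The Monte Carlo summaries asked for in parts (e) and (f) of Problem 2 can be sketched in base R as below. Everything here is a stand-in: the draws of `beta_F`, `beta_G`, and `p`, the counts `y` and `n`, and the helper `chisq_disc()` are hypothetical and only mimic the shape of real sampler output (one row per posterior draw, one column per player). The discrepancy used is the usual binomial chi-square form, $\sum_i (y_i - n_i p_i)^2 / \{n_i p_i (1 - p_i)\}$.

```r
set.seed(578)
n_draws <- 5000

# --- Part (e): posterior probability and Bayes factor -------------------
# Stand-in posterior draws (hypothetical); real ones would come from the
# monitored coefficients, e.g. via as.matrix() on the coda output.
beta_F <- rnorm(n_draws, mean = 0.2, sd = 0.3)
beta_G <- rnorm(n_draws, mean = -0.1, sd = 0.3)

post_prob <- mean(beta_F > beta_G)           # Pr(beta_F > beta_G | y)
# Prior odds are 1 by symmetry, so the Bayes factor equals the posterior odds.
bayes_factor <- post_prob / (1 - post_prob)

# --- Part (f): chi-square discrepancy -----------------------------------
n <- c(120, 85, 40, 200, 10)   # hypothetical attempts for 5 players
y <- c(55, 40, 15, 90, 6)      # hypothetical field goals made
K <- length(n)

# Stand-in draws of p (n_draws x K); column k belongs to player k.
p <- matrix(rbeta(n_draws * K,
                  rep(y + 1, each = n_draws),
                  rep(n - y + 1, each = n_draws)),
            nrow = n_draws)
# Replicate responses, one binomial draw per posterior draw and player.
yrep <- matrix(rbinom(n_draws * K, size = rep(n, each = n_draws), prob = p),
               nrow = n_draws)

# Chi-square discrepancy for each posterior draw (one value per row).
chisq_disc <- function(y_mat, n, p) {
  n_mat <- matrix(n, nrow = nrow(p), ncol = ncol(p), byrow = TRUE)
  rowSums((y_mat - n_mat * p)^2 / (n_mat * p * (1 - p)))
}

y_obs_mat <- matrix(y, nrow = n_draws, ncol = K, byrow = TRUE)
T_obs <- chisq_disc(y_obs_mat, n, p)   # discrepancy of the observed data
T_rep <- chisq_disc(yrep, n, p)        # discrepancy of the replicates
ppp <- mean(T_rep >= T_obs)            # posterior predictive p-value

print(c(post_prob = post_prob, bayes_factor = bayes_factor, ppp = ppp))
```

A posterior predictive p-value near 0 would mean the observed data are more dispersed than the replicates, which is the pattern the overdispersed model in part (g) is meant to address.
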
3. Let $y_i$ be the number of shots blocked by player $i$ ($i = 1, \ldots, 15$). Consider the following
Poisson loglinear regression (with implicit intercept) on player position and height, using
minutes of playing time as a rate (exposure) variable:

$$y_i \mid r_i, t_i \sim \text{indep. } \mathrm{Poisson}(t_i r_i)$$

$$\log(r_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i^*$$

where

$t_i$ = player $i$ total minutes of playing time

$\mathrm{Pos}(i)$ = player $i$ position (C, F, G)

$H_i^*$ = player $i$ height after standardizing
(centering and scaling to sample standard dev. 1)

(Note that the scaling of $H_i^*$ is different than that of $H_i$ in the previous part.)

Consider the prior

$$\beta_C, \beta_F, \beta_G, \beta_{\mathrm{Ht}} \sim \text{iid } \mathrm{N}(0, 100^2)$$

(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of Poisson
means $\lambda_i = t_i r_i$ and a vector $y^{\mathrm{rep}}$ of replicate responses.
Now run your model using rjags. Make sure to use multiple chains with overdispersed
starting points, check convergence, and monitor the regression coefficients, Poisson
means, and replicate responses (after convergence) long enough to obtain effective
sample sizes of at least 4000 for each regression coefficient.
(b) [2 pts] Display the coda summary of the results for the monitored regression
coefficients.
(c) [2 pts] The sampling model implies that $e^{\beta_{\mathrm{Ht}}}$
represents the factor by which the mean rate of blocking shots changes for each
increase in height of one standard deviation (here, about 3.5 inches). (Under the
model, this factor is the same for all positions.) Form an approximate 95% central
posterior credible interval for this factor. According to your interval, does it seem that
greater height is associated with a higher rate of blocking shots?
(d) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive
p-value. Does it indicate any evidence of problems?
(e) For each player ($i$), approximate $\Pr\left(y_i^{\mathrm{rep}} \ge y_i \mid y\right)$, which is a kind of marginal posterior
predictive p-value.
(i) [2 pts] Show your R code, and display a table with the player names and their
values of this probability.
(ii) [1 pt] Name any players for whom this probability is less than 0.05. (Any such
player blocked notably more shots than the model would suggest, for his position
and height.)
(iii) [1 pt] Notice that the probability equals 1 for some players. Why is that actually
not surprising? (Hint: How many shots were actually blocked by those players?
How much playing time did they have?)
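
For Problem 3, the interval in part (c) and the per-player probabilities in part (e) are both simple summaries of posterior draws. The sketch below uses base R with hypothetical stand-ins throughout: the draws of `beta_Ht` and `lambda`, the three block counts in `y`, and the gamma distributions used to fake the Poisson means are all invented for illustration.

```r
set.seed(578)
n_draws <- 5000

# --- Part (c): central 95% interval for exp(beta_Ht) --------------------
beta_Ht <- rnorm(n_draws, mean = 0.8, sd = 0.2)  # hypothetical draws

# Transform each draw first, then take the 2.5% and 97.5% quantiles.
ci <- quantile(exp(beta_Ht), probs = c(0.025, 0.975))

# --- Part (e): marginal posterior predictive p-values -------------------
y <- c(0, 3, 12)   # hypothetical observed block counts for 3 players
K <- length(y)

# Stand-in draws of the Poisson means lambda_i (n_draws x K).
lambda <- matrix(rgamma(n_draws * K,
                        shape = rep(c(0.5, 2, 6), each = n_draws),
                        rate = 1),
                 nrow = n_draws)
# One replicate response per posterior draw and player.
yrep <- matrix(rpois(n_draws * K, lambda), nrow = n_draws)

# Pr(yrep_i >= y_i | y): compare each column of yrep with that player's
# observed count and average the indicator over the draws.
p_marg <- colMeans(sweep(yrep, 2, y, ">="))

print(ci)
print(data.frame(player = paste("Player", seq_len(K)), prob = p_marg))
```

In this toy example the first player has an observed count of 0, so every replicate satisfies the inequality and the estimated probability is exactly 1.
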
Total: 32 pts