## Description

File illinimensbb.csv, in comma-separated values (CSV) format, contains 2018–2019 season statistics and roster information for fifteen Illini men’s basketball players. The first column contains the jersey number, and the remaining columns contain player name, height (Ht, inches), position (Pos, C=center, F=forward, G=guard), minutes of playing time (MIN), field goals made (FGM), field goals attempted (FGA), and number of shots blocked (BLK).

You will build a logistic regression model for field goals, and a Poisson loglinear regression model for shots blocked, using JAGS and rjags.

1. [2 pts] Using `plot(Ht ~ Pos, data = ...)`, display box plots of height by position. Is there a relationship between height and position? (Such a relationship might cause substantial posterior correlations between regression coefficients if both height and position are used as explanatory variables.)

2. Let $y_i$ be the number of field goals made by player $i$ out of $n_i$ attempts ($i = 1, \ldots, 15$). Consider the following logistic regression (with implicit intercept) on player position and height:

$$y_i \mid p_i \sim \text{indep. } \mathrm{Bin}(n_i, p_i)$$

$$\operatorname{logit}(p_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i$$

where

$\mathrm{Pos}(i)$ = player $i$ position (C, F, G)

$H_i$ = player $i$ height after centering and scaling to sample standard deviation 0.5

Consider the prior

$$\beta_C, \beta_F, \beta_G \sim \text{iid } t_1(0, 10^2)$$

$$\beta_{\mathrm{Ht}} \sim t_1(0, 2.5^2)$$

(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of binomial probabilities $p_i$ and a vector $y^{\mathrm{rep}}$ of replicate responses.

Now run your model using rjags. Make sure to use multiple chains with overdispersed starting points, check convergence, and monitor the regression coefficients, probabilities, and replicate responses (after convergence) long enough to obtain effective sample sizes of at least 4000 for each regression coefficient.

(b) [2 pts] Display the coda summary of the results for the monitored regression coefficients.

(c) [2 pts] With your posterior samples, display scatterplots of (i) $\beta_C$ versus $\beta_{\mathrm{Ht}}$, (ii) $\beta_F$ versus $\beta_{\mathrm{Ht}}$, and (iii) $\beta_G$ versus $\beta_{\mathrm{Ht}}$. Do you see (posterior) correlations?

(d) [2 pts] Consider the modeled probability that Ayo Dosunmu (No. 11) successfully makes an attempted field goal. Plot the (approximate) posterior density of this probability.

(e) [2 pts] Approximate the posterior probability that $\beta_F > \beta_G$ (i.e., that forwards have a higher probability of successfully making an attempted field goal than guards, after adjusting for height). Also, approximate the Bayes factor favoring $\beta_F > \beta_G$ versus $\beta_F < \beta_G$. (Note that, by symmetry, $\beta_F > \beta_G$ and $\beta_F < \beta_G$ have equal prior probability.) What can you say about the data evidence that $\beta_F > \beta_G$?

(f) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive p-value. Does it indicate any evidence of problems (such as overdispersion)?

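(g) Now consider expanding the model to allow for overdispersion, as follows:

$$\operatorname{logit}(p_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i + \varepsilon_i$$

with

$$\varepsilon_i \mid \sigma_\varepsilon \sim \text{iid } \mathrm{N}(0, \sigma_\varepsilon^2)$$

$$\sigma_\varepsilon \sim \mathrm{U}(0, 10)$$

and everything else the same as before.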

(i) [3 pts] List an appropriately modified JAGS model. Then run it using rjags, with all of the usual steps.

(ii) [1 pt] Plot the (approximate) posterior density of $\sigma_\varepsilon$.

(iii) [2 pts] Repeat part (e) (not part (f)) under this expanded model. Does your conclusion change?
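The height rescaling and the arithmetic behind part (e) can be sketched in plain R. This is a minimal sketch under stated assumptions, not a reference solution. The helper names `scale_to_sd`, `scale_to_precision`, and `prob_and_bayes_factor` are hypothetical, and the heights and coefficient draws are made-up stand-ins for the Ht column and for pooled MCMC output. It also assumes that $t_1(0, s^2)$ denotes a t distribution with 1 degree of freedom, center 0, and scale $s$. JAGS parameterizes `dt` and `dnorm` by precision, so under that assumption the precision would be $1/s^2$.

```r
# Center a numeric vector and rescale it to a chosen sample standard deviation.
scale_to_sd <- function(x, target_sd) {
  (x - mean(x)) / sd(x) * target_sd
}

# Convert a scale (standard-deviation-like) parameter to a precision,
# which is what JAGS expects in dt() and dnorm().
scale_to_precision <- function(s) 1 / s^2

# Posterior probability of the event {beta_f > beta_g} and the Bayes factor
# for that event versus its complement. Because the two events have equal
# prior probability (prior odds of 1), the Bayes factor equals the
# posterior odds. If every draw satisfies the event, the result is Inf.
prob_and_bayes_factor <- function(beta_f, beta_g) {
  p <- mean(beta_f > beta_g)
  c(post_prob = p, bayes_factor = p / (1 - p))
}

# Made-up heights in inches (NOT the values in illinimensbb.csv).
ht <- c(72, 74, 75, 77, 78, 80, 82, 84)
H <- scale_to_sd(ht, 0.5)

# Made-up draws standing in for posterior samples of the two coefficients.
set.seed(42)
beta_f_draws <- rnorm(4000, mean = 0.2, sd = 0.3)
beta_g_draws <- rnorm(4000, mean = 0.0, sd = 0.3)

res <- prob_and_bayes_factor(beta_f_draws, beta_g_draws)
print(sd(H))                      # sample SD after rescaling
print(scale_to_precision(10))     # precision for scale 10
print(res)
```

With real output, the two draw vectors would be the corresponding coefficient columns of the sampled chains, pooled across chains after convergence.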

3. Let $y_i$ be the number of shots blocked by player $i$ ($i = 1, \ldots, 15$). Consider the following Poisson loglinear regression (with implicit intercept) on player position and height, using minutes of playing time as a rate (exposure) variable:

$$y_i \mid r_i, t_i \sim \text{indep. } \mathrm{Poisson}(t_i r_i)$$

$$\log(r_i) = \beta_{\mathrm{Pos}(i)} + \beta_{\mathrm{Ht}} H_i^*$$

where

$t_i$ = player $i$ total minutes of playing time

$\mathrm{Pos}(i)$ = player $i$ position (C, F, G)

$H_i^*$ = player $i$ height after standardizing (centering and scaling to sample standard deviation 1)

(Note that the scaling of $H_i^*$ is different than that of $H_i$ in the previous part.)

Consider the prior

$$\beta_C, \beta_F, \beta_G, \beta_{\mathrm{Ht}} \sim \text{iid } \mathrm{N}(0, 100^2)$$

(a) [2 pts] List an appropriate JAGS model. Include nodes for the vector of Poisson means $\lambda_i = t_i r_i$ and a vector $y^{\mathrm{rep}}$ of replicate responses.

Now run your model using rjags. Make sure to use multiple chains with overdispersed starting points, check convergence, and monitor the regression coefficients, Poisson means, and replicate responses (after convergence) long enough to obtain effective sample sizes of at least 4000 for each regression coefficient.

(b) [2 pts] Display the coda summary of the results for the monitored regression coefficients.

(c) [2 pts] The sampling model implies that $e^{\beta_{\mathrm{Ht}}}$ represents the factor by which the mean rate of blocking shots changes for each increase in height of one standard deviation (here, about 3.5 inches). (Under the model, this factor is the same for all positions.) Form an approximate 95% central posterior credible interval for this factor. According to your interval, does it seem that greater height is associated with a higher rate of blocking shots?

(d) [2 pts] Use the chi-square discrepancy to compute an approximate posterior predictive p-value. Does it indicate any evidence of problems?

(e) For each player ($i$), approximate $\Pr(y_i^{\mathrm{rep}} \ge y_i \mid y)$, which is a kind of marginal posterior predictive p-value.

(i) [2 pts] Show your R code, and display a table with the player names and their values of this probability.

(ii) [1 pt] Name any players for whom this probability is less than 0.05. (Any such player blocked notably more shots than the model would suggest, for his position and height.)

(iii) [1 pt] Notice that the probability equals 1 for some players. Why is that actually not surprising? (Hint: How many shots were actually blocked by those players? How much playing time did they have?)
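The two posterior predictive checks in parts (d) and (e) reduce to simple operations on matrices of draws, which can be sketched in plain R. This is a sketch under stated assumptions, not a reference solution. It assumes the chi-square discrepancy takes the form $T(y, \lambda) = \sum_i (y_i - \lambda_i)^2 / \lambda_i$, using the fact that a Poisson variance equals its mean. The names `chisq_discrepancy`, `lambda_draws`, and `yrep_draws` are hypothetical, and the counts and draws are simulated stand-ins rather than the real data or real MCMC output.

```r
# Made-up observed counts for 5 hypothetical players (NOT the BLK column).
y <- c(0, 3, 12, 1, 40)
n <- length(y)
S <- 2000  # number of stand-in posterior draws

set.seed(1)
# Stand-in for monitored Poisson means: one row per draw, one column per player.
lambda_draws <- matrix(
  rgamma(S * n, shape = rep(y + 1, each = S), rate = 1),
  nrow = S, ncol = n
)
# Stand-in for monitored replicate responses, drawn given each row of means.
yrep_draws <- matrix(rpois(S * n, lambda_draws), nrow = S, ncol = n)

# Chi-square discrepancy evaluated draw by draw (returns one value per row).
chisq_discrepancy <- function(counts, means) {
  rowSums((counts - means)^2 / means)
}

# Repeat the observed counts on every row so they align with the draws.
y_mat <- matrix(y, nrow = S, ncol = n, byrow = TRUE)

T_obs <- chisq_discrepancy(y_mat, lambda_draws)       # observed data
T_rep <- chisq_discrepancy(yrep_draws, lambda_draws)  # replicate data

# Posterior predictive p-value: share of draws where the replicate
# discrepancy is at least as large as the observed one.
ppp <- mean(T_rep >= T_obs)

# Marginal posterior predictive p-values, one per player.
marginal_p <- colMeans(yrep_draws >= y_mat)

print(ppp)
print(round(marginal_p, 3))
```

With real rjags output, matrices of this shape could be obtained by applying `as.matrix()` to the sampled `mcmc.list` and selecting the columns for the Poisson means and for the replicate responses. The player names from the data file could then be attached to `marginal_p` to form the requested table.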

Total: 32 pts