18  How Empirical Bayes over-shrinks

Author

Miratrix

Published

September 26, 2023

Using our High School and Beyond dataset, we are going to fit a random-slope model and then examine how the Empirical Bayes estimates behave.

We first fit the model:

library( lme4 )
library( arm )   # for display()

M1 = lmer( mathach ~ 1 + ses + (1 + ses|id), data=dat )

display( M1 )
lmer(formula = mathach ~ 1 + ses + (1 + ses | id), data = dat)
            coef.est coef.se
(Intercept) 12.67     0.19  
ses          2.39     0.12  

Error terms:
 Groups   Name        Std.Dev. Corr  
 id       (Intercept) 2.20           
          ses         0.64     -0.11 
 Residual             6.07           
---
number of obs: 7185, groups: id, 160
AIC = 46652.4, DIC = 46632.5
deviance = 46636.5 

We then can get Empirical Bayes estimates for all the random intercepts and slopes:

res = coef( M1 )$id
names( res ) = c( "beta0", "beta1" )
res <- tibble::rownames_to_column( res, "id" )
head( res )
    id     beta0    beta1
1 1224 11.060022 2.504083
2 1288 13.073615 2.477930
3 1296  9.197291 2.352981
4 1308 14.384652 2.309294
5 1317 12.449142 2.237431
6 1358 11.520351 2.753447

Each row corresponds to a different school, and gives our estimated intercept and slope for that school. These estimates are shrunken toward the overall population averages. To illustrate, consider these three schools:

The dotted lines show what we would get if we just ran a separate regression on the data in each school. The thick black line is the overall population average line (averaged across all schools, from our MLM). The red line is the Empirical Bayes line: we shrink the dotted lines toward the thick black line, with the amount of shrinkage depending on how much data, and how informative the data is, in each school. For example, school 3377 has a lot of shrinkage of slope, and a bit of intercept. School 9397 is basically unchanged. Notice that the slopes are shrunk much more than the intercepts; this is because we are less certain about the slopes, and we shrink more for things we are uncertain about.

Remember our Radon and counties example: we shrunk small counties MORE than large counties when estimating intercepts. Here we are estimating the pair of intercept and slope, and how well we estimate the slope depends on the amount of data, but also on the spread of the data on the x-axis and a few other things. The intuition is the same, however: everything is pulled toward the grand average line.
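The three-school comparison figure could be re-created along these lines. This is a sketch, not the author's exact plotting code: it assumes `dat`, `M1`, and `res` from above, and picks the two schools named in the text plus one from the table (`1224`) as illustrative choices.

```r
library( lme4 )
library( tidyverse )

# Illustrative school IDs (3377 and 9397 are discussed in the text)
picks = c( "1224", "3377", "9397" )
sub = filter( dat, id %in% picks )

# Per-school OLS fits: the dotted lines
raw = sub %>%
  group_by( id ) %>%
  summarise( b0 = coef( lm( mathach ~ ses ) )[[1]],
             b1 = coef( lm( mathach ~ ses ) )[[2]] )

# Empirical Bayes fits for the same schools: the red lines
eb = filter( res, id %in% picks )

ggplot( sub, aes( ses, mathach ) ) +
  geom_point( alpha = 0.3 ) +
  geom_abline( data = raw, aes( intercept = b0, slope = b1 ),
               lty = "dotted" ) +
  geom_abline( data = eb, aes( intercept = beta0, slope = beta1 ),
               col = "red" ) +
  # Population average line from the MLM (thick black line)
  geom_abline( intercept = fixef( M1 )[1], slope = fixef( M1 )[2],
               lwd = 1 ) +
  facet_wrap( ~ id ) +
  theme_minimal()
```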

18.1 Comparing the model to the estimates

We can measure how much variation there is in the Empirical Bayes estimated intercepts and slopes, along with the correlation of these effects:

eb_ests = c( sd_int = sd( res$beta0 ),
             sd_slope = sd( res$beta1 ),
             cor = cor( res$beta0, res$beta1 ) )

We display these estimates alongside the model estimates:

parameter          model estimate   Emp. Bayes estimate
stdev intercept         2.20               2.01
stdev slope             0.64               0.28
correlation            -0.11              -0.23
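The model-based column of this comparison can be pulled from the fitted model with `VarCorr()`. A sketch, assuming `M1` and `eb_ests` from above:

```r
library( lme4 )

# Long-form table of variance components: columns grp, var1, var2, vcov, sdcor
vc = as.data.frame( VarCorr( M1 ) )

model_ests = c(
  # sdcor holds the SD when var2 is NA...
  sd_int   = vc$sdcor[ vc$grp == "id" & vc$var1 == "(Intercept)" & is.na( vc$var2 ) ],
  sd_slope = vc$sdcor[ vc$grp == "id" & vc$var1 == "ses" & is.na( vc$var2 ) ],
  # ...and the correlation when var2 is filled in
  cor      = vc$sdcor[ vc$grp == "id" & !is.na( vc$var2 ) ] )

# Side-by-side comparison of model vs. Empirical Bayes estimates
cbind( model = model_ests, EmpBayes = eb_ests )
```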

If we compare the variation in the Empirical Bayes estimates to the model estimates, we see that the standard deviations are smaller and the correlation is larger in magnitude. Importantly, our model does a good job, in general, of estimating how much variation in the random intercepts and slopes there is; it is the empirical estimates that are over-shrunk. Trust the model, not the spread of the empirical estimates.

In short, the Empirical Bayes estimates are good for predicting individual values, but the distribution of those estimates is generally too tidy and narrow compared to the truth. The model is what best estimates the population characteristics. That being said, the Empirical Bayes estimates are far better than the raw estimates (in the figure above, for example, trust the red lines more than the dotted lines).

18.2 Plotting the individual schools

When looking at individual schools we have this:

ggplot( data=res ) +
    scale_x_continuous(limits=range( dat$ses ) ) +
    scale_y_continuous(limits=range( dat$mathach ) ) +
  geom_abline( aes( intercept = beta0, slope=beta1 ), alpha=0.25) +
  labs( x="SES", y="Math Achievement" ) +
  theme_minimal()

Compare that to the messy (and incorrect) raw estimates, generated by running an interacted fixed-effect regression:

# id must be a factor so we get one intercept and one slope per school
M = lm( mathach ~ 0 + ses*id - ses, data=dat )
cc = coef(M)
head(cc)
   id1224    id1288    id1296    id1308    id1317    id1358 
10.805132 13.114937  8.093779 16.188959 12.737763 11.305904 
tail(cc)
ses:id9347 ses:id9359 ses:id9397 ses:id9508 ses:id9550 ses:id9586 
  2.685994  -0.833479   2.446444   3.953791   3.891938   1.672081 
length(cc)
[1] 320
schools = tibble( beta0 = cc[1:160],
                  beta1 = cc[161:320] )

ggplot( data=schools ) +
    scale_x_continuous(limits=range( dat$ses ) ) +
    scale_y_continuous(limits=range( dat$mathach ) ) +
    geom_abline( aes( intercept = beta0, slope=beta1 ), alpha=0.25) +
    labs( x="SES", y="Math Achievement" ) +
  theme_minimal()

The raw estimates are over-dispersed; the measurement error in each school's individual regression gives a misleadingly wide picture of the true spread.

18.3 Simulation to get a just-right picture

As discussed in class, Empirical Bayes is too smooth, and the raw estimates are too dispersed. If we want to see a picture of what the population of schools might look like, we can make a plot of 160 NEW schools generated from our model (which also shows how our partially pooled (Empirical Bayes) estimates are OVER-SHRUNK/OVER-SMOOTHED).

We simulate from our model; we are not using the Empirical Bayes estimates at all. See the slides and script for Packet 2.4 for how to do this simulation.
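One way such a simulation could go, as a sketch (this is not the Packet 2.4 script; it assumes `M1` and `dat` from above): draw 160 new (intercept, slope) pairs from the estimated bivariate normal distribution of the school lines, centered at the fixed effects.

```r
library( lme4 )
library( MASS )
library( tidyverse )

Sigma = VarCorr( M1 )$id   # 2x2 covariance matrix of the random effects
mu = fixef( M1 )           # population-average intercept and slope

# 160 brand-new schools drawn from the fitted distribution
fake = MASS::mvrnorm( 160, mu = mu, Sigma = Sigma )
fake = tibble( beta0 = fake[,1], beta1 = fake[,2] )

ggplot( data=fake ) +
  geom_abline( aes( intercept = beta0, slope = beta1 ), alpha=0.25 ) +
  scale_x_continuous( limits=range( dat$ses ) ) +
  scale_y_continuous( limits=range( dat$mathach ) ) +
  labs( x="SES", y="Math Achievement" ) +
  theme_minimal()
```

The spread of these simulated lines should sit between the too-narrow Empirical Bayes picture and the too-wide raw picture.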