37  MLM and Cluster-Robust Standard Errors

Author

Luke Miratrix

We have talked about multilevel modeling, and we have talked about cluster-robust standard errors. We can actually have both. In Stata, you just add “, robust” to your modeling command. In R, you have to use the clubSandwich package. Let’s illustrate with the High School and Beyond data.

37.1 Robust standard errors without multilevel modeling

As a reminder, here is how to get classic Huber-White / sandwich / heteroskedasticity-robust standard errors for vanilla OLS. For reference, a single school (school 8857) has only 64 students:

one.sch = filter( dat, id == "8857" )
nrow( one.sch )
[1] 64

First we fit our regression on the full dataset, as we normally would:

M0 <- lm( mathach ~ 1 + ses, data = dat )
arm::display( M0 )
lm(formula = mathach ~ 1 + ses, data = dat)
            coef.est coef.se
(Intercept) 12.75     0.08  
ses          3.18     0.10  
---
n = 7185, k = 2
residual sd = 6.42, R-Squared = 0.13

Then we use the sandwich and lmtest packages, passing a robust variance estimator to coeftest():

library( sandwich )
library( lmtest )
lmtest::coeftest( M0, vcov. = sandwich::vcovHC, type = "HC1" )

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 12.747396   0.075686 168.424 < 2.2e-16 ***
ses          3.183870   0.097121  32.782 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And here is how to get cluster-robust standard errors when you have clustered data but have not bothered with multilevel modeling:

M1 = lm( mathach ~ 1 + ses + sector, data = dat )
lmtest::coeftest( M1, vcov. = sandwich::vcovCL, cluster = dat$id )

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 11.793254   0.106102 111.150 < 2.2e-16 ***
ses          2.948558   0.097831  30.139 < 2.2e-16 ***
sector       1.935013   0.152493  12.689 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

37.2 CRSE + Multilevel Modeling

Ok, so we have seen how to get robust standard errors in the above; how do we combine them with multilevel modeling? First, let’s fit our multilevel model:

M2 = lmer( mathach ~ 1 + ses + sector + (1|id), data=dat )
display( M2 )
lmer(formula = mathach ~ 1 + ses + sector + (1 | id), data = dat)
            coef.est coef.se
(Intercept) 11.72     0.23  
ses          2.37     0.11  
sector       2.10     0.34  

Error terms:
 Groups   Name        Std.Dev.
 id       (Intercept) 1.92    
 Residual             6.09    
---
number of obs: 7185, groups: id, 160
AIC = 46621.2, DIC = 46601.7
deviance = 46606.4 

If we believe all our MLM assumptions, we can get our vanilla standard errors like so:

summary( M2 )$coef
             Estimate Std. Error   t value
(Intercept) 11.718908  0.2280585 51.385536
ses          2.374711  0.1054911 22.511017
sector       2.100837  0.3411243  6.158566

If we don’t believe them fully, we might want to make our inference more robust. Before we turn to this, first note that our point estimates can be different for MLM vs OLS:

coef( M1 )
(Intercept)         ses      sector 
  11.793254    2.948558    1.935013 
fixef( M2 )
(Intercept)         ses      sector 
  11.718908    2.374711    2.100837 

The assumption of random effects means we are not weighting all our data the same way. For example, if the clusters vary a lot in size, the multilevel model will weight the clusters more equally when estimating a cluster-level coefficient (e.g., sector), rather than letting the big clusters dominate as OLS does.
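
To make the weighting intuition concrete, here is a rough back-of-the-envelope sketch (not part of the analysis above; the variance numbers are just taken from the display(M2) output below). Under a random-intercept model, cluster j’s mean is estimated with precision roughly 1 / (tau00 + sigma2 / n_j), where tau00 is the random intercept variance and sigma2 the residual variance, while OLS effectively weights clusters by their size n_j:

tau00  <- 1.92^2                        # random intercept variance, from display(M2)
sigma2 <- 6.09^2                        # residual variance, from display(M2)
n_j    <- c( small = 10, large = 100 )  # a small and a large hypothetical cluster
w_mlm  <- 1 / ( tau00 + sigma2 / n_j )  # MLM-style precision weights
w_ols  <- n_j                           # OLS-style weights
round( w_mlm / w_mlm["small"], 2 )      # large cluster gets ~1.8x the weight...
round( w_ols / w_ols["small"], 2 )      # ...vs. 10x the weight under OLS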

Regardless, we might worry that complex dependencies within our clusters are messing up the standard errors of our MLM. Fixing that is easy:

library( clubSandwich )
club <- coef_test( M2,
           vcov = "CR1S",
           test = "Satterthwaite")
club
       Coef. Estimate    SE t-stat d.f. (Satt) p-val (Satt) Sig.
 (Intercept)    11.72 0.226  51.86        88.9       <0.001  ***
         ses     2.37 0.119  19.89       142.8       <0.001  ***
      sector     2.10 0.347   6.05       149.5       <0.001  ***

The clubSandwich package works for multilevel models fit with either lme4::lmer() or nlme::lme(). Note that clubSandwich::coef_test() is not the same as lmtest::coeftest(). The vcov = "CR1S" option replicates Stata’s standard errors (or so it has been speculated, assuming Stata uses the same correction as for panel data models).
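
If you want the full cluster-robust variance-covariance matrix itself (say, for testing custom contrasts), clubSandwich will also hand you that directly:

# Cluster-robust vcov of the fixed effects; for lmer fits, clubSandwich
# infers the clustering from the random effects structure.
vcovCR( M2, type = "CR1S" )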

We can compare the SEs like so:

rbind( club$SE, se.fixef(M2) )
     (Intercept)       ses    sector
[1,]   0.2259713 0.1193704 0.3474632
[2,]   0.2280585 0.1054911 0.3411243

We see that, in this circumstance, the SEs did not change much from the standard errors of the vanilla lmer call, which assume homoskedasticity and within-cluster independence (conditional on the random effects).

37.2.1 Misspecified how?

The sorts of misspecification that we might be worried about are things such as the following:

  1. Using a random intercept model when the real data-generating process has a random slope;
  2. Using a model that assumes homogeneous random effects when the real data-generating process involves heteroskedasticity (e.g., different random effects variances for treatment schools than for control schools);
  3. Using a model with a single level of random effects (e.g., school random effects) when the real data-generating process has multiple levels of structure (e.g., school and classroom random effects); or
  4. Assuming homoskedastic variance for the lowest-level errors when the real process is heteroskedastic or has some other structure.

Of the above, (3) could mean there is “secret clustering” within your clusters, which can give you radically incorrect standard errors. The other options are more violations of homoskedasticity, and are likely to be less serious concerns. You can also diagnose them with residual plots, checking whether some groups or individuals show more scatter than others; see the sketch below.
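
For instance, a quick diagnostic along these lines might look like the following sketch (assuming lmer dropped no rows of dat, so the residuals line up with the data):

# Do some sectors show more residual scatter than others? (types 2 and 4)
boxplot( resid( M2 ) ~ dat$sector,
         xlab = "sector", ylab = "level-1 residual" )

# Generic residuals-vs-fitted plot to eyeball non-constant variance:
plot( fitted( M2 ), resid( M2 ), xlab = "fitted", ylab = "residual" )
abline( h = 0, lty = 2 )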

Regardless, if you are worried about these things, then the above will give you improved standard errors.

37.3 Some technical notes

So what is this thing even doing? In the following I describe a rough approximation. The key idea is that a multilevel model specifies a parameterized \(n \times n\) variance-covariance matrix for the residuals of a generic linear model. Because we assume independent clusters, this matrix is block diagonal with blocks \(V_1, \ldots, V_J\), with block \(V_j\) corresponding to group \(j\). For a random intercept model, for example, block \(j\) would be an \(n_j \times n_j\) matrix with \(\tau_{00} + \sigma^2\) on the diagonal and \(\tau_{00}\) off the diagonal, where \(\tau_{00}\) is the variance of the random intercepts and \(\sigma^2\) is the within-block residual variance.
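
For concreteness, here is a tiny sketch (illustrative only, with the variance numbers from the fit above) of building one such block for a random-intercept model:

# V_j for one cluster: tau00 everywhere, plus sigma2 added on the diagonal
make_Vj <- function( n_j, tau00, sigma2 ) {
  matrix( tau00, n_j, n_j ) + diag( sigma2, n_j )
}
make_Vj( 3, tau00 = 1.92^2, sigma2 = 6.09^2 )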

If we write our multilevel model in reduced form, it becomes a mini-regression for each group \(j\): \[ Y_j = X_j \vec{\beta} + Z_j \vec{r}_j + e_j , \] where \(Y_j\) is the vector of outcomes; \(X_j\) and \(Z_j\) are mini design matrices of covariates (each normally including a column of 1s for the intercept), with \(X_j\) holding all the covariates and \(Z_j\) holding only those covariates that have corresponding random effects; \(\vec{\beta}\) is the vector of coefficients (the fixed effects); \(\vec{r}_j\) is the vector of random effects for group \(j\); and \(e_j\) is the vector of residuals.

Importantly, \(u_j := Z_j \vec{r}_j + e_j\) is all residual, and \(V_j = Var( u_j )\): the variance-covariance matrix of the residuals is determined by this structure and our assumptions that \(\vec{r}_j\) is multivariate normal and that \(e_j\) is a vector of independent residual draws (the \(\epsilon_{ij}\)).

Now, given this view of our multilevel model, we can estimate it with generalized least squares (GLS). GLS is a generic regression technique where, if you have a parameterized covariance matrix for the residuals, you can estimate your regression coefficients taking that correlation structure into account. Think of it as a three-step process: first fit the regression without taking the residual structure into account, then use the fitted model to estimate our big \(n \times n\) variance-covariance matrix, and finally plug this estimated matrix back into a weighted least squares estimation.

In particular, the estimator for \(\vec{\beta}\) is weighted least squares: \[ \hat{\beta} = (X'WX)^{-1}X'W Y , \] with \(W\) an \(n \times n\) block-diagonal weight matrix formed from the inverses of the estimated \(V_j\). Now if the random effects structure (the assumed distribution of the \(r_j\)) is misspecified, or the residual error structure (on the \(\epsilon_{ij}\)) is wrong, then the \(V_j\) will be wrong, but \(\hat{\beta}\) will still be asymptotically consistent (under some conditions).
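
In code, that final step might look like the following sketch; here X, Y, and the list V_hat of estimated per-cluster blocks are hypothetical objects (with rows sorted by cluster), not things created in the analysis above:

library( Matrix )  # for sparse block-diagonal matrices

# Block-diagonal weight matrix W = diag( V_1^-1, ..., V_J^-1 ):
W <- bdiag( lapply( V_hat, solve ) )

# Feasible GLS / weighted least squares estimate of the fixed effects:
beta_hat <- solve( t( X ) %*% W %*% X, t( X ) %*% W %*% Y )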

Cluster-robust methods use the empirical residuals (the \(\hat{u}_j\)) to assess the uncertainty in \(\hat{\beta}\) as an estimate of the \(\beta\) defined by the implied weights \(W\). Even if the random effects part of the model is wrong, the assumption of independent clusters means our inference on this estimand is still right. The key idea is that cluster-robust methods take a weighted average of \(J\) very badly estimated variance-covariance matrices to get a decent estimate of the overall population-level uncertainty.
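
Continuing the sketch above, with hypothetical per-cluster design matrices X_list, weights W_list (the inverses of the estimated \(V_j\)), and empirical residual vectors u_list, a basic (CR0-style) sandwich estimate is:

# Bread: (X'WX)^-1, accumulated cluster by cluster
bread <- solve( Reduce( `+`,
  Map( function( Xj, Wj ) t( Xj ) %*% Wj %*% Xj, X_list, W_list ) ) )

# Meat: sum over clusters of X_j' W_j u_j u_j' W_j X_j
meat <- Reduce( `+`,
  Map( function( Xj, Wj, uj ) {
    s <- t( Xj ) %*% Wj %*% uj   # k x 1 score contribution of cluster j
    s %*% t( s )
  }, X_list, W_list, u_list ) )

V_cr0 <- bread %*% meat %*% bread  # cluster-robust vcov of beta_hat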

The main advantage of the clubSandwich package is that it will take our multilevel model and do this cluster-robust standard error calculation for us. Even better, it will (using the “CR2” adjustment) try to improve the basic sandwich estimator by (a) adjusting the residuals (the \(\hat{u}_j\)) a bit so that the variance estimator is exactly unbiased if the working model is exactly correct, and (b) using Satterthwaite degrees of freedom (or generalizations thereof) for tests and confidence intervals, also derived under the assumption that the working model is exactly correct.
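
In practice, using this better adjustment just means swapping the vcov argument (and clubSandwich’s conf_int() gives the matching confidence intervals):

# CR2-adjusted tests with Satterthwaite degrees of freedom:
coef_test( M2, vcov = "CR2", test = "Satterthwaite" )

# Corresponding confidence intervals for the fixed effects:
conf_int( M2, vcov = "CR2" )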

37.4 Acknowledgements

Thanks to James Pustejovsky, the creator of the clubSandwich package, for the help in thinking this through. Much of these notes, in particular the reasons for misspecification and much of the technical discussion, is liberally stolen from emails with this fine colleague.