36  A tour of fixed effects and cluster-robust SEs

Author

Luke Miratrix

This brief handout will walk through fixed effects and cluster-robust standard errors, with a stop at aggregation and heteroskedasticity-robust standard errors. We run our regressions with lm_robust() from the estimatr package.

Consider the following research question as our motivating question:

RQ: Are there differences in math achievement for Catholic vs public schools, controlling for differences in SES?

36.1 Aggregation

One way forward is to aggregate our HS&B data and merge it into our school-level data, and then analyze the result.

We aggregate like so:

col.dat = dat %>% group_by( id ) %>% 
    summarize( per.fem = mean(female),
                 per.min = mean(minority),
                 mean.ses = mean(ses),
                 mean.ach = mean(mathach),
                 n.stud = n() )

# combine our school-level variables (ours and theirs) into one data.frame
sdat = merge( sdat, col.dat, by="id", all=TRUE )
head( sdat )
    id size sector pracad disclim himinty meanses   per.fem    per.min
1 1224  842      0   0.35   1.597       0  -0.428 0.5957447 0.08510638
2 1288 1855      0   0.27   0.174       0   0.128 0.4400000 0.12000000
3 1296 1719      0   0.32  -0.137       1  -0.420 0.6458333 0.97916667
4 1308  716      1   0.96  -0.622       0   0.534 0.0000000 0.40000000
5 1317  455      1   0.95  -1.694       1   0.351 1.0000000 0.72916667
6 1358 1430      0   0.25   1.535       0  -0.014 0.3666667 0.10000000
     mean.ses  mean.ach n.stud
1 -0.43438298  9.715447     47
2  0.12160000 13.510800     25
3 -0.42550000  7.635958     48
4  0.52800000 16.255500     20
5  0.34533333 13.177687     48
6 -0.01966667 11.206233     30

We can now answer our research question with a school-level regression using lm_robust(), which calculates heteroskedasticity-robust standard errors:

library( estimatr )
Magg = lm_robust( mean.ach ~ 1 + sector + mean.ses, data=sdat )

tidy( Magg )
         term  estimate std.error statistic       p.value   conf.low conf.high
1 (Intercept) 12.119496 0.1890070 64.121943 1.628760e-114 11.7461711 12.492820
2      sector  1.221944 0.3226998  3.786627  2.170049e-04  0.5845507  1.859337
3    mean.ses  5.387377 0.3423555 15.736205  4.289753e-34  4.7111599  6.063594
   df  outcome
1 157 mean.ach
2 157 mean.ach
3 157 mean.ach

The lm_robust() method gives heteroskedasticity-robust standard errors that account for possible heteroskedasticity due to, for example, some school outcomes being based on fewer students (and thus having more variation) than others.
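To see roughly what is happening under the hood, here is a minimal sketch of the "sandwich" idea behind these robust SEs, written in base R on simulated data (lm_robust() defaults to the HC2 variant, which rescales squared residuals by leverage, as below; this is an illustration, not estimatr's exact code):

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1 + abs(x))   # error variance grows with |x|

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- resid(fit)
h <- hatvalues(fit)                          # leverage of each observation

bread <- solve(crossprod(X))                 # (X'X)^{-1}
meat  <- crossprod(X * (e / sqrt(1 - h)))    # X' diag(e^2 / (1 - h)) X
V_hc2 <- bread %*% meat %*% bread            # the "sandwich"
sqrt(diag(V_hc2))                            # HC2-style robust SEs
```

The squared residuals stand in for the unknown, observation-specific error variances, which is what makes the SEs robust to heteroskedasticity.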

In this regression we are controlling for school mean SES, not student SES. If anything is going on within school between SES and math achievement, in a way that could be different for different sectors, we might be missing it.

36.2 Student-level regression and cluster-robust SEs

Instead of using our aggregated data, we can merge our school-level variables into the student data and run a student level regression:

The merge brings in level 2 variables, repeating them for each student in a school:

dat = merge( dat, sdat, by="id" )
head( dat )
    id minority female    ses mathach size sector pracad disclim himinty
1 1224        0      1 -1.528   5.876  842      0   0.35   1.597       0
2 1224        0      1 -0.588  19.708  842      0   0.35   1.597       0
3 1224        0      0 -0.528  20.349  842      0   0.35   1.597       0
4 1224        0      0 -0.668   8.781  842      0   0.35   1.597       0
5 1224        0      0 -0.158  17.898  842      0   0.35   1.597       0
6 1224        0      0  0.022   4.583  842      0   0.35   1.597       0
  meanses   per.fem    per.min  mean.ses mean.ach n.stud
1  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
2  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
3  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
4  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
5  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
6  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47

And now our regression (without handling our clustering, so this is giving us wrong standard errors):

Mstud = lm( mathach ~ 1 + sector + ses, data = dat )
summary( Mstud )

Call:
lm(formula = mathach ~ 1 + sector + ses, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.1339  -4.6785   0.1504   4.8894  16.5847 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.79325    0.10610  111.15   <2e-16 ***
sector       1.93501    0.15249   12.69   <2e-16 ***
ses          2.94856    0.09783   30.14   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.345 on 7182 degrees of freedom
Multiple R-squared:  0.1492,    Adjusted R-squared:  0.149 
F-statistic: 629.8 on 2 and 7182 DF,  p-value: < 2.2e-16

36.2.1 Getting the right standard errors

The standard errors for the above regression, however, are wrong: we are not taking the clustering into account. We can fix this with cluster-robust standard errors; the lm_robust() method comes to the rescue:

Mstud <- lm_robust( mathach ~ 1 + sector + ses, 
                       data = dat,
                       clusters = dat$id )
summary( Mstud )

Call:
lm_robust(formula = mathach ~ 1 + sector + ses, data = dat, clusters = dat$id)

Standard error type:  CR2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper     DF
(Intercept)   11.793     0.2038  57.854 1.608e-69   11.388   12.199  84.12
sector         1.935     0.3185   6.076 1.082e-08    1.305    2.565 141.46
ses            2.949     0.1285  22.951 4.478e-48    2.694    3.203 132.91

Multiple R-squared:  0.1492 ,   Adjusted R-squared:  0.149 
F-statistic: 351.6 on 2 and 159 DF,  p-value: < 2.2e-16

We simply specify the clustering and lm_robust() does the rest. Note that we would normally not even run the original lm() command: lm_robust() replaces it.
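As a peek under the hood, here is a sketch of the cluster-robust sandwich idea in base R, using the simple CR0 variant on simulated data (lm_robust() defaults to the small-sample CR2 correction, so its SEs will differ slightly):

```r
set.seed(7)
G <- 40; m <- 25
id <- rep(1:G, each = m)
u  <- rnorm(G)[id]                  # shared school-level shock
x  <- rnorm(G * m)
y  <- 1 + 2 * x + u + rnorm(G * m)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- resid(fit)

bread  <- solve(crossprod(X))
scores <- rowsum(X * e, id)         # each row is X_g' e_g for one cluster
meat   <- crossprod(scores)         # sum of outer products across clusters
V_cr0  <- bread %*% meat %*% bread
sqrt(diag(V_cr0))                   # cluster-robust SEs
```

The key difference from the heteroskedasticity-robust version is that residuals are summed within each cluster before being squared, so correlated errors within a school inflate the SEs.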

For our research question, we see that Catholic schools score about 2 points higher than public schools, on average, after adjusting for individual-level SES.

We can further control for school mean SES, like with aggregation:

Mstud2 = lm_robust( mathach ~ 1 + sector + ses + meanses, 
                    data = dat,
                    clusters = dat$id )
summary( Mstud2 )

Call:
lm_robust(formula = mathach ~ 1 + sector + ses + meanses, data = dat, 
    clusters = dat$id)

Standard error type:  CR2 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper     DF
(Intercept)   12.098     0.1712  70.654 6.309e-75  11.7573   12.439  81.55
sector         1.280     0.3022   4.237 5.240e-05   0.6804    1.880  95.16
ses            2.191     0.1299  16.874 1.246e-35   1.9344    2.448 140.83
meanses        2.973     0.3683   8.072 8.367e-12   2.2391    3.706  75.50

Multiple R-squared:   0.17 ,    Adjusted R-squared:  0.1696 
F-statistic: 246.2 on 3 and 159 DF,  p-value: < 2.2e-16

The contextual value of school mean SES explains some of the difference between Catholic and public schools here: note the reduction in the coefficient for sector. That being said, and still accounting for clustering, sector remains quite significant. The lm_robust() function also gives us confidence intervals, which is nice: the sector effect plausibly lies anywhere between 0.7 and 1.9.

Relative to the overall standard deviation of math achievement we have:

sd_math = sd( dat$mathach )
sd_math
[1] 6.878246
confint( Mstud2 ) / sd_math
                 2.5 %    97.5 %
(Intercept) 1.70934019 1.8083934
sector      0.09892687 0.2733738
ses         0.28124044 0.3558869
meanses     0.32553097 0.5388374

The difference between Catholic and public schools is somewhere between 0.10 and 0.27 standard deviations, beyond what can be explained by SES. In education, this is a fairly sizable effect.

36.3 And fixed effects?

We can combine fixed effects and cluster-robust standard errors quite easily, but we cannot combine fixed effects with level-2 covariates at all. We next look at this latter problem, and then see what combining these options looks like when asking questions that do not rely on level-2 variables for main effects.

36.3.1 The problem of fixed effects and dummy variables

Fixed effects cannot be used to take into account school differences if we are interested in level-2 variables, because the fixed effects and level-2 variables are collinear. Put another way, if we let each school have its own mean outcome (represented by the coefficient for a dummy variable for that school), then we can’t have a variable like sector measure how Catholic schools differ from public schools, conditioned on all the school mean outcomes. There is nothing left to explain: by construction, there are no differences in school mean outcomes once we “control for” the individual school mean outcomes via fixed effects!

When you give R collinear variables, it drops the extras. Here is a mini fake dataset of 4 schools with 3 students each:

fake = tibble( id = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
                   mathach = rnorm( 12 ),
                   ses = rnorm( 12 ),
                   sector = c( 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1 ) )
fake$id = as.factor( fake$id )
fake
# A tibble: 12 × 4
   id    mathach      ses sector
   <fct>   <dbl>    <dbl>  <dbl>
 1 1     -0.388   0.0882       0
 2 1      0.479  -0.0736       0
 3 1      0.0181 -0.0327       0
 4 2      0.881   1.34         1
 5 2      0.356  -0.00986      1
 6 2      0.699   0.679        1
 7 3     -1.71   -0.352        0
 8 3     -0.203  -0.377        0
 9 3      0.432   0.338        0
10 4     -0.480  -0.0240       1
11 4     -0.261  -1.44         1
12 4      1.06    0.722        1

And our regression model with fixed effects for school, plus ses and the school-level sector variable, gives this:

lm( mathach ~ 0 + id + ses + sector, data = fake )

Call:
lm(formula = mathach ~ 0 + id + ses + sector, data = fake)

Coefficients:
     id1       id2       id3       id4       ses    sector  
 0.04004   0.25226  -0.41593   0.25026   0.58576        NA  

Note the NA for sector! We cannot estimate it due to collinearity, so it got dropped.

36.3.2 Using fixed effects to handle clustering

That being said, fixed effects are an excellent way to control for school differences when looking at within-school relationships. For example, we can ask how math relates to SES within schools, controlling for systematic differences across schools.

Here is the no fixed effect regression, and the fixed effect regression:

Mstud3_noFE = lm( mathach ~ 1 + ses, data=dat )

dat$id = as.factor(dat$id)
Mstud3 = lm( mathach ~ 0 + ses + id, data=dat )
head( coef( Mstud3 ) )
      ses    id1224    id1288    id1296    id1308    id1317 
 2.191172 10.667255 13.244353  8.568302 15.098561 12.421003 

For our fixed effect model, we will have lots of coefficients because we have a fixed effect for each school; the head() command is just showing us the first few. We also had to explicitly make our id variable a factor (categorical variable), so R doesn’t think it is a continuous covariate.
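A side note: the dummy-variable fit is numerically equivalent to "demeaning," i.e., subtracting school means from both the outcome and the covariate (this is the Frisch-Waugh-Lovell theorem). A self-contained sketch on simulated data:

```r
set.seed(1)
id <- rep(1:20, each = 10)
x  <- rnorm(200)
y  <- rnorm(20)[id] + 1.5 * x + rnorm(200)   # school intercepts vary

b_dummy <- coef(lm(y ~ 0 + x + factor(id)))["x"]   # fixed effects via dummies

y_w <- y - ave(y, id)              # subtract each school's mean outcome
x_w <- x - ave(x, id)              # ... and mean covariate
b_within <- coef(lm(y_w ~ 0 + x_w))["x_w"]

all.equal(unname(b_dummy), unname(b_within))       # same slope either way
```

This is why fixed effects isolate the within-school relationship: all between-school variation is removed before the slope is estimated.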

For our standard errors, we can further account for clustering of our residuals above and beyond what our fixed effects explain (even after subtracting out each school's mean outcome, we might still have dependencies between students within a school). So we use cluster-robust standard errors like so:

Mstud3_rob <- lm_robust( mathach ~ 0 + ses + id, 
                         data=dat,
                         clusters = dat$id )
head( tidy( Mstud3_rob ) )
    term  estimate  std.error statistic       p.value  conf.low conf.high
1    ses  2.191172 0.12984948  16.87471  1.239339e-35  1.934466  2.447878
2 id1224 10.667255 0.05640441 189.12095 2.389336e-171 10.555746 10.778763
3 id1288 13.244353 0.01578970 838.79718 2.446556e-262 13.213138 13.275569
4 id1296  8.568302 0.05525096 155.07971 2.866668e-159  8.459073  8.677531
5 id1308 15.098561 0.06856053 220.22236 1.252769e-180 14.963020 15.234102
6 id1317 12.421003 0.04484136 276.99883 1.263605e-194 12.332354 12.509652
        df outcome
1 140.8263 mathach
2 140.8263 mathach
3 140.8263 mathach
4 140.8263 mathach
5 140.8263 mathach
6 140.8263 mathach

We have again used head() to just get the first lines. The whole printout would be one line per school, plus the ses coefficient!

Let’s compare our three models (note the way we omit coefficients with id to drop our fixed effects from the table):

library( texreg )
screenreg( list( `No FE`=Mstud3_noFE, `FE` = Mstud3, `FE + CRVE`=Mstud3_rob ), 
           omit.coef="id",
           include.ci = FALSE )

==================================================
             No FE        FE           FE + CRVE  
--------------------------------------------------
(Intercept)    12.75 ***                          
               (0.08)                             
ses             3.18 ***     2.19 ***     2.19 ***
               (0.10)       (0.11)       (0.13)   
--------------------------------------------------
R^2             0.13         0.83         0.83    
Adj. R^2        0.13         0.82         0.82    
Num. obs.    7185         7185         7185       
RMSE                                      6.08    
N Clusters                              160       
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

A few things to note:

  • Not having fixed effects means we estimate the math-SES relationship including school-level context; note the higher point estimate. Often we want to focus on within-school relationships, and fixed effects do exactly that.
  • The standard errors are larger once we include fixed effects; the fixed effects partially account for clustering.
  • The standard errors are larger still once we add CRVE, which more fully accounts for the clustering, including the fact that the clusters themselves can vary. In general, use CRVE in addition to fixed effects whenever you want to view the clusters as representative of a larger population (here, a larger population of schools).
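The last point shows up in a quick simulation: when the slope itself varies across schools, the classical SEs from a fixed-effects fit are too small, while a cluster-robust calculation (the simple CR0 variant here, on simulated data) inflates them appropriately:

```r
set.seed(3)
G <- 50; m <- 30; n <- G * m
id <- rep(1:G, each = m)
slope <- 2 + rnorm(G, sd = 0.5)          # the slope varies by school
x <- rnorm(n)
y <- rnorm(G)[id] + slope[id] * x + rnorm(n)

fit <- lm(y ~ 0 + x + factor(id))        # fixed effects via dummies
X <- model.matrix(fit)
e <- resid(fit)

bread <- solve(crossprod(X))
meat  <- crossprod(rowsum(X * e, id))    # cluster-summed score outer products
se_classic <- sqrt(vcov(fit)[1, 1])      # assumes independent residuals
se_cr0     <- sqrt((bread %*% meat %*% bread)[1, 1])
c(classic = se_classic, cluster_robust = se_cr0)
```

The fixed effects soak up the school mean differences, but not the school-to-school slope variation; only the cluster-robust SE reflects that extra uncertainty.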

36.3.3 Bonus: Interactions with level-2 variables are OK, even with fixed effects

If we want to see if the relationship of math and SES is different between schools, we can get tricky like so:

Mstud4 = lm_robust( mathach ~ 0 + ses + ses:sector + id, 
                    data=dat,
                    clusters = id )
head( coef( Mstud4 ) )
      ses    id1224    id1288    id1296    id1308    id1317 
 2.782105 10.923946 13.172496  8.819744 15.498595 12.682641 
tail( coef( Mstud4 ) )
    id9359     id9397     id9508     id9550     id9586 ses:sector 
 14.763044   9.982311  13.772485  10.941590  13.973252  -1.348572 

Note that R always pushes interaction terms to the end of the list of estimates, so we have to pull them out with tail().

Next we compare these SEs to what we would have obtained without cluster-robust SEs:

a <- lm( mathach ~ 0 + ses + ses:sector + id, 
                    data=dat )
screenreg( list( wrong=a, adjusted=Mstud4 ), 
           omit.coef="id", single.row = TRUE,
           include.ci=FALSE )

==================================================
            wrong               adjusted          
--------------------------------------------------
ses            2.78 (0.14) ***     2.78 (0.16) ***
ses:sector    -1.35 (0.22) ***    -1.35 (0.23) ***
--------------------------------------------------
R^2            0.83                0.83           
Adj. R^2       0.82                0.82           
Num. obs.   7185                7185              
RMSE                               6.07           
N Clusters                       160              
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

In the second column we are accounting for our clustering with cluster-robust SEs.