36 A tour of fixed effects and cluster-robust SEs

Author

Luke Miratrix

This brief handout will walk through fixed effects and cluster robust standard errors, with a stop at aggregation and heteroskedastic standard errors. We do regression using lm_robust() from the estimatr package.

Consider the following research question as our motiviating question:

RQ: Are there differences in math achievement for Catholic vs public schools, controlling for differences in SES?

36.1 Aggregation

One way forward is to aggregate our HS&B data and merge it into our school-level data, and then analyze the result.

We aggregate as so:

col.dat = dat %>% group_by( id ) %>% 
    summarize( per.fem = mean(female),
                 per.min = mean(minority),
                 mean.ses = mean(ses),
                 mean.ach = mean(mathach),
                 n.stud = n() )

# combine our school-level variables (ours and theirs) into one data.frame
sdat = merge( sdat, col.dat, by="id", all=TRUE )
head( sdat )

    id size sector pracad disclim himinty meanses   per.fem    per.min
1 1224  842      0   0.35   1.597       0  -0.428 0.5957447 0.08510638
2 1288 1855      0   0.27   0.174       0   0.128 0.4400000 0.12000000
3 1296 1719      0   0.32  -0.137       1  -0.420 0.6458333 0.97916667
4 1308  716      1   0.96  -0.622       0   0.534 0.0000000 0.40000000
5 1317  455      1   0.95  -1.694       1   0.351 1.0000000 0.72916667
6 1358 1430      0   0.25   1.535       0  -0.014 0.3666667 0.10000000
     mean.ses  mean.ach n.stud
1 -0.43438298  9.715447     47
2  0.12160000 13.510800     25
3 -0.42550000  7.635958     48
4  0.52800000 16.255500     20
5  0.34533333 13.177687     48
6 -0.01966667 11.206233     30

We can now answer our research question with a school-level regression with lm_robust(), that calculates heteroskedastic-robust standard errors:

library( estimatr )
Magg = lm_robust( mean.ach ~ 1 + sector + mean.ses, data=sdat )

tidy( Magg )

         term  estimate std.error statistic       p.value   conf.low conf.high
1 (Intercept) 12.119496 0.1890070 64.121943 1.628760e-114 11.7461711 12.492820
2      sector  1.221944 0.3226998  3.786627  2.170049e-04  0.5845507  1.859337
3    mean.ses  5.387377 0.3423555 15.736205  4.289753e-34  4.7111599  6.063594
   df  outcome
1 157 mean.ach
2 157 mean.ach
3 157 mean.ach

The lm_robust() method gives heteroskedastic robust standard errors that take into account possible heteroskedasticity due to, for example, some school outcomes being based on smaller numbers of students (and thus having more variation) than other school outcomes.

In this regression we are controlling for school mean SES, not student SES. If anything is going on within school between SES and math achievement, in a way that could be different for different sectors, we might be missing it.

36.2 Cluster Robust Standard Errors

Instead of using our aggregated data, we can merge our school-level variables into the student data and run a student level regression:

The merge brings in level 2 variables, repeating them for each student in a school:

dat = merge( dat, sdat, by="id" )
head( dat )

    id minority female    ses mathach size sector pracad disclim himinty
1 1224        0      1 -1.528   5.876  842      0   0.35   1.597       0
2 1224        0      1 -0.588  19.708  842      0   0.35   1.597       0
3 1224        0      0 -0.528  20.349  842      0   0.35   1.597       0
4 1224        0      0 -0.668   8.781  842      0   0.35   1.597       0
5 1224        0      0 -0.158  17.898  842      0   0.35   1.597       0
6 1224        0      0  0.022   4.583  842      0   0.35   1.597       0
  meanses   per.fem    per.min  mean.ses mean.ach n.stud
1  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
2  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
3  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
4  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
5  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47
6  -0.428 0.5957447 0.08510638 -0.434383 9.715447     47

If we run our regression without handling our clustering, we get fine point estimates, but our standard errors are wrong:

Mstud = lm( mathach ~ 1 + sector + ses, data = dat )
broom::tidy( Mstud ) %>% 
  knitr::kable( digits=2 )

term	estimate	std.error	statistic
(Intercept)	11.79	0.11	111.15
sector	1.94	0.15	12.69
ses	2.95	0.10	30.14

The standard errors for the above regression, however, is wrong: we are not taking the clustering into account. We can fix this with cluster-robust standard errors. The lm_robust() method comes to the rescue:

Mstud <- lm_robust( mathach ~ 1 + sector + ses, 
                       data = dat,
                       clusters = dat$id )
broom::tidy( Mstud ) %>% 
  knitr::kable( digits=2 )

term	estimate	std.error	statistic	conf.low	conf.high	df	outcome
(Intercept)	11.79	0.20	57.85	11.39	12.20	84.12	mathach
sector	1.94	0.32	6.08	1.31	2.56	141.46	mathach
ses	2.95	0.13	22.95	2.69	3.20	132.91	mathach

We specify the clustering and lm_robust() does the rest; note that we would normally not even run the original lm() command. The lm_robust() command replaces it.

For our research question, we see that Catholic schools score about 2 points higher than Public, on average, beyond individual level SES.

We can further control for school mean SES, like with aggregation:

Mstud2 = lm_robust( mathach ~ 1 + sector + ses + meanses, 
                    data = dat,
                    cluster = dat$id )
rs <- broom::tidy( Mstud2 ) 
rs %>% 
  knitr::kable( digits=2 )

term	estimate	std.error	statistic	conf.low	conf.high	df	outcome
(Intercept)	12.10	0.17	70.65	11.76	12.44	81.55	mathach
sector	1.28	0.30	4.24	0.68	1.88	95.16	mathach
ses	2.19	0.13	16.87	1.93	2.45	140.83	mathach
meanses	2.97	0.37	8.07	2.24	3.71	75.50	mathach

The contextual value of school mean SES is explaining some of the difference between Catholic and public schools, here: note the reduction of the coefficient for sector. That being said, and still accounting for clustering, sector is still quite significant. The lm_robust() function is also giving us confidence intevals, which is nice: we see anything between 0.7 and 1.9 is possible.

Relative to the overall standard deviation of math achievement we have:

sd_math = sd( dat$mathach )
sd_math

[1] 6.878246

rs %>%
  mutate( estimate = estimate / sd_math,
          std.error = std.error / sd_math,
          conf.low = conf.low / sd_math,
          conf.high = conf.high / sd_math ) %>%
  knitr::kable( digits = 2 )

term	estimate	std.error	statistic	conf.low	conf.high	df	outcome
(Intercept)	1.76	0.02	70.65	1.71	1.81	81.55	mathach
sector	0.19	0.04	4.24	0.10	0.27	95.16	mathach
ses	0.32	0.02	16.87	0.28	0.36	140.83	mathach
meanses	0.43	0.05	8.07	0.33	0.54	75.50	mathach

(We have rescaled all our estimates by the standard deviation, which puts things into effect size units, i.e., how many standard deviations large everything is.) We now see the difference between Catholic and public schools is somewhere between 0.10 and 0.27 standard deviations, beyond what can be explained by ses. This is a fairly sizable effect, in education.

36.3 And fixed effects?

We can combine fixed effects and cluster robust standard errors quite easily, but we cannot combine fixed effects and level 2 covariates at all. We next look at this latter problem, and then see what combining these options looks like when asking questions that do not rely on level 2 variables for main effects.

36.3.1 The problem of fixed effects and level-2 variables

Fixed effects cannot be used to take into account school differences if we are interested in level 2 variables, because the fixed effects and level 2 variables are co-linear. Put another way, if we let each school have its own mean outcome (represented by the coefficient for a dummy variable for that school), then we can’t have a variable like sector to measure how Catholic schools are different from public schools, conditioned on all the school mean outcomes. There is nothing left to explain as, by construction, there are no differences in school mean outcomes once we “control for” the individual school mean outcomes via fixed effects!

What R will do when you give colinear variables is drop the extra ones. Here is a mini-example fake dataset of 4 schools with 3 students in each school:

fake = tibble( id = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
                   mathach = rnorm( 12 ),
                   ses = rnorm( 12 ),
                   sector = c( 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1 ) )
fake$id = as.factor( fake$id )
fake

# A tibble: 12 × 4
   id    mathach     ses sector
   <fct>   <dbl>   <dbl>  <dbl>
 1 1      0.238  -1.88        0
 2 1     -1.00   -1.07        0
 3 1     -0.253  -1.11        0
 4 2      1.39   -1.53        1
 5 2     -0.297  -0.423       1
 6 2      1.49   -1.34        1
 7 3      0.829  -0.970       0
 8 3      1.70    2.04        0
 9 3      0.183  -0.402       0
10 4     -0.894   0.970       1
11 4      0.0481  0.0694      1
12 4      0.211   2.23        1

And our regression model with fixed effects for school plus our school-level ses gives this:

lm( mathach ~ 0 + id + ses + sector, data = fake )


Call:
lm(formula = mathach ~ 0 + id + ses + sector, data = fake)

Coefficients:
     id1       id2       id3       id4       ses    sector  
-0.23608   0.94439   0.88686  -0.29404   0.07542        NA

Note the NA for sector! We cannot estimate it due to colinearity, so it got dropped.

36.3.2 Fixed effects can handle clustering

That being said, fixed effects are an excellent way to control for school differences when looking at within-school relationships. For example, we can ask how math relates to SES within schools, controlling for systematic differences across schools.

Here is the no fixed effect regression, and the fixed effect regression:

Mstud3_noFE = lm( mathach ~ 1 + ses, data=dat )

dat$id = as.factor(dat$id)
Mstud3 = lm( mathach ~ 0 + ses + id, data=dat )
head( coef( Mstud3 ) )

      ses    id1224    id1288    id1296    id1308    id1317 
 2.191172 10.667255 13.244353  8.568302 15.098561 12.421003

For our fixed effect model, we will have lots of coefficients because we have a fixed effect for each school; the head() command is just showing us the first few. We also had to explicitly make our id variable a factor (categorical variable), so R doesn’t think it is a continuous covariate.

For our standard errors, etc., we can further account for clustering of our residuals above and beyond what can be explained by our fixed effects (even if we subtract out the mean outcome, we might still have dependencies between students within a given school). So we use our cluster-robust standard errors as so:

Mstud3_rob <- lm_robust( mathach ~ 0 + ses + id, 
                         data=dat,
                         cluster = dat$id )
head( tidy( Mstud3_rob ) )

    term  estimate  std.error statistic       p.value  conf.low conf.high
1    ses  2.191172 0.12984948  16.87471  1.239339e-35  1.934466  2.447878
2 id1224 10.667255 0.05640441 189.12095 2.389336e-171 10.555746 10.778763
3 id1288 13.244353 0.01578970 838.79718 2.446556e-262 13.213138 13.275569
4 id1296  8.568302 0.05525096 155.07971 2.866668e-159  8.459073  8.677531
5 id1308 15.098561 0.06856053 220.22236 1.252769e-180 14.963020 15.234102
6 id1317 12.421003 0.04484136 276.99883 1.263605e-194 12.332354 12.509652
        df outcome
1 140.8263 mathach
2 140.8263 mathach
3 140.8263 mathach
4 140.8263 mathach
5 140.8263 mathach
6 140.8263 mathach

We have again used head() to just get the first lines. The whole printout would be one line per school, plus the ses coefficient!

Let’s compare our three models (note the way we omit coefficients with id to drop our fixed effects from the table):

library( texreg )
screenreg( list( `No FE`=Mstud3_noFE, `FE` = Mstud3, `FE + CRVE`=Mstud3_rob ), 
           omit.coef="id",
           include.ci = FALSE )


==================================================
             No FE        FE           FE + CRVE  
--------------------------------------------------
(Intercept)    12.75 ***                          
               (0.08)                             
ses             3.18 ***     2.19 ***     2.19 ***
               (0.10)       (0.11)       (0.13)   
--------------------------------------------------
R^2             0.13         0.83         0.83    
Adj. R^2        0.13         0.82         0.82    
Num. obs.    7185         7185         7185       
RMSE                                      6.08    
N Clusters                              160       
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

A few things to note:

Not having fixed effects means we are getting an estimate of the math-ses relationship including school level context. Note the higher point estimate. Often we want to focus on within-school relationships. Fixed effects does this.
The standard errors are larger once we include fixed effects; the fixed effects are partially accounting for clustering.
The standard errors are even larger when we include CRVE. It is more fully accounting for the clustering, and the fact that the clusters themselves could vary. In general, one should typically use CRVE in addition to fixed effects, if one wants to view the clusters as representative of a larger population (in this case a larger population of schools).

36.3.3 Bonus: Interactions with level-2 variables are OK, even with fixed effects

If we want to see if the relationship of math and SES is different between schools, we can get tricky like so:

Mstud4 = lm_robust( mathach ~ 0 + ses + ses:sector + id, 
                    data=dat,
                    cluster = id )
head( coef( Mstud4 ) )

      ses    id1224    id1288    id1296    id1308    id1317 
 2.782105 10.923946 13.172496  8.819744 15.498595 12.682641

tail( coef( Mstud4 ) )

    id9359     id9397     id9508     id9550     id9586 ses:sector 
 14.763044   9.982311  13.772485  10.941590  13.973252  -1.348572

Note interaction terms always get pushed to the end of the list of estimates by R. So we have to pull them out with tail().

In the following we compare SEs to if we hadn’t used cluster robust SEs.

a <- lm( mathach ~ 0 + ses + ses:sector + id, 
                    data=dat )
screenreg( list( wrong=a, adjusted=Mstud4 ), 
           omit.coef="id", single.row = TRUE,
           include.ci=FALSE )


==================================================
            wrong               adjusted          
--------------------------------------------------
ses            2.78 (0.14) ***     2.78 (0.16) ***
ses:sector    -1.35 (0.22) ***    -1.35 (0.23) ***
--------------------------------------------------
R^2            0.83                0.83           
Adj. R^2       0.82                0.82           
Num. obs.   7185                7185              
RMSE                               6.07           
N Clusters                       160              
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

In our second column we are accounting for our clustering with our cluster robust SEs.

36.4 Fixed effects vs. cluster robust SEs

When running a regression with fixed effects and cluster-robust SEs, we might wonder when to use one vs. another, and when to use both. Here’s a breakdown of when to use each:

36.4.1 Fixed Effects

Use fixed effects when you want to control for unobserved variables that vary across groups (e.g., states, countries) but are constant over time or within those groups. Fixed effects helps eliminate bias from omitted variables that are group-specific and time-invariant.

Example: You are analyzing the effect of tuition fees on graduation rates, but there are unobserved factors (like state policies) that may affect both tuition and graduation rates.
When to use: If you believe that there are unobserved group-level characteristics that need to be controlled for.

Importantly, fixed-effects means you are estimating within group effects: you are no longer comparing one group to another.

Fixed effects by themselves can increase the plausibility of the residual independence assumption within groups. Without FEs (and no cluster-robust SEs) your SEs could be off as they are not accounting for the correlation of units within each group. FEs makes it more easy to believe your individual units are independent. So, roughly speaking, in many cases including fixed effects not only removes bias but also fixes your independence assumption for clustered data!

36.4.2 Cluster-Robust Standard Errors

Cluster-robust standard errors are used when you believe that observations within the same group (e.g., individuals within a state or students within a school) may be correlated. This method adjusts standard errors to account for potential intra-group correlation, ensuring more reliable inference.

Example: If students within the same community college may have correlated outcomes due to shared environments.
When to use: When there may be correlation in the error terms within groups, which could lead to underestimated standard errors.

But we just said fixed effects does this! CRSEs do this in a more robust way, making virtually no assumption on how units within groups might co-vary. But then why would we use both?

36.4.3 Using Both

If fixed effects account for clustering in your SEs, why bother with cluster-robust standard errors? That is an interesting question. Use both fixed effects and cluster-robust standard errors when:

You want to control for unobserved, time-invariant group-level factors with fixed effects.
You also suspect that there’s within-group correlation in the residuals even when pulling out the common fixed effect. This could be if you had further clustering within the cluster, for example (e.g., your school data was made by sampling a few classes from within the school). If that were happening, your SEs could be biased even when you include fixed effects.

Example: In a model of student performance across community colleges in different states, you may use fixed effects to control for state-level policies and cluster-robust standard errors to account for possible correlations between students in the same college.

There is another reason you might include CRSEs in your fixed effect model: you view your clusters as a sample from some larger population, and you want to get uncertainty estimates that include the question of whether the clusters in your data are representative of this larger population.

Fixed effects only just targets your evaluation sample (the data you have) and holds the clusters as fixed: you are estimating trends for those clusters in your data, and no further. CRSEs will assess cluster variation, and then give you SEs that include how that variation might make you more uncertain as to what you would find if you collected more clusters like the clusters you have.