This brief handout walks through fixed effects and cluster-robust standard errors, with a stop at aggregation and heteroskedasticity-robust standard errors. We fit our regressions using lm_robust() from the estimatr package.
Consider the following research question as our motivating question:
RQ: Are there differences in math achievement for Catholic vs public schools, controlling for differences in SES?
37.1 Aggregation
One way forward is to aggregate our HS&B data and merge it into our school-level data, and then analyze the result.
We aggregate as so:
col.dat = dat %>%
    group_by( id ) %>%
    summarize( per.fem = mean(female),
               per.min = mean(minority),
               mean.ses = mean(ses),
               mean.ach = mean(mathach),
               n.stud = n() )

# combine our school-level variables (ours and theirs) into one data.frame
sdat = merge( sdat, col.dat, by="id", all=TRUE )
head( sdat )
The lm_robust() method gives heteroskedasticity-robust standard errors, which allow for possible heteroskedasticity due to, for example, some school outcomes being based on smaller numbers of students (and thus having more variation) than other school outcomes.
In this regression we are controlling for school mean SES, not student SES. If anything is going on within school between SES and math achievement, in a way that could be different for different sectors, we might be missing it.
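The aggregate-then-regress workflow can be sketched end to end on simulated data. This is a minimal sketch under stated assumptions, not the HS&B analysis: the data frames, sample sizes, and parameter values below are all made up, and the object names `agg` and `Magg` are hypothetical.

```r
library(dplyr)
library(estimatr)

set.seed(1)
# Hypothetical stand-ins for the school-level (sdat) and student-level (dat) data
n_sch <- 40
sdat <- data.frame(id = 1:n_sch, sector = rep(c(0, 1), each = n_sch / 2))
dat <- data.frame(id = rep(sdat$id, times = sample(15:40, n_sch, replace = TRUE)))
dat$ses <- rnorm(nrow(dat), mean = 0.3 * sdat$sector[dat$id])
dat$mathach <- 12 + 2 * sdat$sector[dat$id] + 3 * dat$ses +
  rnorm(n_sch, sd = 2)[dat$id] + rnorm(nrow(dat), sd = 6)

# Collapse to one row per school, then attach the school-level variables
col.dat <- dat %>%
  group_by(id) %>%
  summarize(mean.ses = mean(ses), mean.ach = mean(mathach), n.stud = n())
agg <- merge(sdat, col.dat, by = "id", all = TRUE)

# School-level regression with heteroskedasticity-robust (HC2) standard errors
Magg <- lm_robust(mean.ach ~ 1 + sector + mean.ses, data = agg, se_type = "HC2")
summary(Magg)
```

Each school contributes a single row here, so there is no clustering left to handle; the robust standard errors only address unequal variances across schools.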
37.2 Cluster Robust Standard Errors
Instead of using our aggregated data, we can merge our school-level variables into the student data and run a student level regression:
The merge brings in level 2 variables, repeating them for each student in a school:
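A minimal sketch of such a merge, using two tiny made-up tables (`sdat_small` and `dat_small` are hypothetical names, not the handout's data):

```r
# Hypothetical miniature school-level and student-level tables
sdat_small <- data.frame(id = c(1, 2), sector = c(0, 1))
dat_small <- data.frame(id = c(1, 1, 2, 2, 2),
                        ses = c(-0.5, 0.2, 0.1, 0.9, -0.3))

# Keep every student row; each one picks up its school's sector value
merged <- merge(dat_small, sdat_small, by = "id", all.x = TRUE)
merged
```

The school-level `sector` value is repeated once per student, which is exactly what makes the rows within a school dependent.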
If we run our regression without handling our clustering, we get fine point estimates, but our standard errors are wrong:
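Mstud = lm( mathach ~ 1 + sector + ses, data = dat )
broom::tidy( Mstud ) %>%
    knitr::kable( digits=2 )

| term        | estimate | std.error | statistic | p.value |
|:------------|---------:|----------:|----------:|--------:|
| (Intercept) |    11.79 |      0.11 |    111.15 |       0 |
| sector      |     1.94 |      0.15 |     12.69 |       0 |
| ses         |     2.95 |      0.10 |     30.14 |       0 |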
The standard errors for the above regression, however, are wrong: we are not taking the clustering into account. We can fix this with cluster-robust standard errors. The lm_robust() method comes to the rescue:
We specify the clustering and lm_robust() does the rest; note that we would normally not even run the original lm() command. The lm_robust() command replaces it.
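The gap between naive and cluster-robust standard errors can be illustrated on simulated data. The following is a minimal sketch, not the HS&B analysis: the data frame `sim`, its sizes, and all parameter values are made up, with a school-level shock added so that students in the same school are correlated.

```r
library(estimatr)

set.seed(2)
n_sch <- 60; n_per <- 30
id <- rep(1:n_sch, each = n_per)
sector <- rep(rep(c(0, 1), each = n_sch / 2), each = n_per)
u <- rnorm(n_sch, sd = 3)[id]   # school-level shock shared by all students in a school
ses <- rnorm(n_sch * n_per)
mathach <- 12 + 2 * sector + 3 * ses + u + rnorm(n_sch * n_per, sd = 6)
sim <- data.frame(id, sector, ses, mathach)

naive <- lm(mathach ~ 1 + sector + ses, data = sim)
crve <- lm_robust(mathach ~ 1 + sector + ses, data = sim, clusters = id)

# Same point estimates; only the standard errors change
cbind(estimate = coef(crve),
      se_naive = summary(naive)$coefficients[, "Std. Error"],
      se_cluster = crve$std.error)
```

In this simulation the standard error for the school-level variable `sector` grows the most, since its information comes from the number of schools rather than the number of students.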
For our research question, we see that Catholic schools score about 2 points higher than public schools, on average, after adjusting for individual-level SES.
We can further control for school mean SES, like with aggregation:
The contextual value of school mean SES explains some of the difference between Catholic and public schools: note the reduction in the coefficient for sector. That said, and still accounting for clustering, sector remains highly statistically significant. The lm_robust() function also gives us confidence intervals, which is nice: we see that anything between 0.7 and 1.9 is plausible.
Relative to the overall standard deviation of math achievement we have:
(We have rescaled all our estimates by the standard deviation, which puts things into effect size units, i.e., how many standard deviations large everything is.) We now see the difference between Catholic and public schools is somewhere between 0.10 and 0.27 standard deviations, beyond what can be explained by ses. This is a fairly sizable effect, in education.
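The rescaling amounts to dividing each estimate and confidence limit by the outcome's standard deviation. A minimal sketch on simulated data follows; the data frame `sim` and all parameter values are made up, so the numbers it prints are not the HS&B results.

```r
library(estimatr)

set.seed(3)
n_sch <- 60; n_per <- 30
id <- rep(1:n_sch, each = n_per)
sector <- rep(rep(c(0, 1), each = n_sch / 2), each = n_per)
ses <- rnorm(n_sch * n_per)
mathach <- 12 + 1.3 * sector + 3 * ses +
  rnorm(n_sch, sd = 2)[id] + rnorm(n_sch * n_per, sd = 6)
sim <- data.frame(id, sector, ses, mathach)

fit <- lm_robust(mathach ~ 1 + sector + ses, data = sim, clusters = id)

# Divide estimates and confidence limits by the overall SD of the outcome
sd_y <- sd(sim$mathach)
es <- cbind(estimate = coef(fit),
            conf.low = fit$conf.low,
            conf.high = fit$conf.high) / sd_y
round(es, 2)
```

The intercept row is not meaningful in effect size units; the rows of interest are the slopes.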
37.3 And fixed effects?
We can combine fixed effects and cluster-robust standard errors quite easily, but we cannot estimate main effects of level 2 covariates once fixed effects are in the model. We next look at this latter problem, and then see what combining these options looks like when asking questions that do not rely on level 2 variables for main effects.
37.3.1 The problem of fixed effects and level-2 variables
Fixed effects cannot be used to take into account school differences if we are interested in level 2 variables, because the fixed effects and level 2 variables are collinear. Put another way, if we let each school have its own mean outcome (represented by the coefficient for a dummy variable for that school), then we can’t have a variable like sector to measure how Catholic schools are different from public schools, conditioned on all the school mean outcomes. There is nothing left to explain as, by construction, there are no differences in school mean outcomes once we “control for” the individual school mean outcomes via fixed effects!
When given collinear variables, R drops the extra ones. Here is a mini-example fake dataset of 4 schools with 3 students in each school:
And our regression model with fixed effects for school, plus ses and our school-level sector, gives this:
lm( mathach ~ 0 + id + ses + sector, data = fake )

Call:
lm(formula = mathach ~ 0 + id + ses + sector, data = fake)

Coefficients:
     id1       id2       id3       id4       ses    sector
 0.64933  -0.33663  -0.04371  -0.57742   0.27813        NA
Note the NA for sector! We cannot estimate it due to collinearity, so it got dropped.
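The dropped coefficient is easy to reproduce. The following minimal sketch builds its own made-up dataset of 4 schools with 3 students each (random values, so the estimates will not match the output above) and checks that the sector coefficient comes back as NA.

```r
set.seed(4)
# Hypothetical fake data: 4 schools, 3 students each; sector is constant within school
fake <- data.frame(id = factor(rep(1:4, each = 3)),
                   sector = rep(c(0, 1, 0, 1), each = 3),
                   ses = rnorm(12),
                   mathach = rnorm(12))

fit <- lm(mathach ~ 0 + id + ses + sector, data = fake)
coef(fit)

# sector is an exact linear combination of the school dummies (id2 + id4)
is.na(coef(fit)["sector"])
```

Because sector equals the sum of the dummies for the Catholic schools, the design matrix is rank deficient and lm() reports NA for the last redundant column.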
37.3.2 Fixed effects can handle clustering
That being said, fixed effects are an excellent way to control for school differences when looking at within-school relationships. For example, we can ask how math relates to SES within schools, controlling for systematic differences across schools.
Here is the no fixed effect regression, and the fixed effect regression:
For our fixed effect model, we will have lots of coefficients because we have a fixed effect for each school; the head() command is just showing us the first few. We also had to explicitly make our id variable a factor (categorical variable), so R doesn’t think it is a continuous covariate.
For our standard errors, etc., we can further account for clustering of our residuals above and beyond what can be explained by our fixed effects (even if we subtract out the mean outcome, we might still have dependencies between students within a given school). So we use our cluster-robust standard errors as so:
==================================================
             No FE        FE           FE + CRVE
--------------------------------------------------
(Intercept)    12.75 ***
               (0.08)
ses             3.18 ***     2.19 ***     2.19 ***
               (0.10)       (0.11)       (0.13)
--------------------------------------------------
R^2             0.13         0.83         0.83
Adj. R^2        0.13         0.82         0.82
Num. obs.    7185         7185         7185
RMSE                                      6.08
N Clusters                              160
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
A few things to note:
Not having fixed effects means we are getting an estimate of the math-ses relationship that includes school-level context. Note the higher point estimate. Often we want to focus on within-school relationships. Fixed effects do this.
The standard errors are larger once we include fixed effects; the fixed effects are partially accounting for clustering.
The standard errors are even larger when we include CRVE. It more fully accounts for the clustering, and for the fact that the clusters themselves could vary. One should typically use CRVE in addition to fixed effects if one wants to view the clusters as representative of a larger population (in this case a larger population of schools).
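The three-way comparison can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the data frame `sim`, the sample sizes, and the parameter values are made up, with school mean SES built to raise achievement so that the pooled slope exceeds the within-school slope.

```r
library(estimatr)

set.seed(5)
n_sch <- 50; n_per <- 30
id <- rep(1:n_sch, each = n_per)
m <- rnorm(n_sch, sd = 0.7)            # school mean SES (hypothetical)
ses <- m[id] + rnorm(n_sch * n_per)
# Within-school slope is 2; school mean SES adds a contextual effect of 3
mathach <- 12 + 2 * ses + 3 * m[id] + rnorm(n_sch * n_per, sd = 6)
sim <- data.frame(id = factor(id), ses, mathach)

no_fe   <- lm(mathach ~ 1 + ses, data = sim)
fe      <- lm(mathach ~ 0 + ses + id, data = sim)
fe_crve <- lm_robust(mathach ~ 0 + ses + id, data = sim, clusters = id)

# Slope on ses and its standard error under each specification
rbind(no_fe   = summary(no_fe)$coefficients["ses", 1:2],
      fe      = summary(fe)$coefficients["ses", 1:2],
      fe_crve = c(coef(fe_crve)["ses"], fe_crve$std.error["ses"]))
```

The two fixed effect rows share the same point estimate by construction; only the standard error differs. How much the cluster-robust standard error exceeds the classical one depends on the data, so it may move little in a simulation like this one with homogeneous slopes.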
37.3.3 Bonus: Interactions with level-2 variables are OK, even with fixed effects
If we want to see if the relationship of math and SES differs between Catholic and public schools, we can get tricky like so:
Mstud4 = lm_robust( mathach ~ 0 + ses + ses:sector + id, data=dat, clusters = id )
head( coef( Mstud4 ) )
Note interaction terms always get pushed to the end of the list of estimates by R. So we have to pull them out with tail().
Next we compare these SEs to those we would get without cluster-robust SEs.
a <- lm( mathach ~ 0 + ses + ses:sector + id, data=dat )
screenreg( list( wrong=a, adjusted=Mstud4 ),
           omit.coef="id", single.row=TRUE, include.ci=FALSE )
==================================================
             wrong              adjusted
--------------------------------------------------
ses             2.78 (0.14) ***    2.78 (0.16) ***
ses:sector     -1.35 (0.22) ***   -1.35 (0.23) ***
--------------------------------------------------
R^2             0.83               0.83
Adj. R^2        0.82               0.82
Num. obs.    7185               7185
RMSE                               6.07
N Clusters                       160
==================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
In our second column we are accounting for our clustering with our cluster robust SEs.
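The whole interaction workflow, including pulling the interaction term off the end of the coefficient vector with tail(), can be sketched on simulated data. The data frame `sim` and all parameter values are made up; the true slopes (3 in public schools, 1.5 in Catholic schools) are assumptions chosen for illustration.

```r
library(estimatr)

set.seed(6)
n_sch <- 60; n_per <- 30
sim <- data.frame(id = factor(rep(1:n_sch, each = n_per)))
sim$sector <- rep(rep(c(0, 1), each = n_sch / 2), each = n_per)
sim$ses <- rnorm(nrow(sim))
school_mean <- rnorm(n_sch, mean = 12, sd = 2)[as.integer(sim$id)]
# The ses slope differs by sector: 3 when sector = 0, 1.5 when sector = 1
sim$mathach <- school_mean + (3 - 1.5 * sim$sector) * sim$ses +
  rnorm(nrow(sim), sd = 6)

fit <- lm_robust(mathach ~ 0 + ses + ses:sector + id, data = sim, clusters = id)

head(coef(fit), 3)   # ses, followed by the first school fixed effects
tail(coef(fit), 1)   # the ses:sector interaction sits at the very end
```

The interaction is estimable even though sector itself is not, because ses varies within schools and so ses:sector is not a linear combination of the school dummies.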
37.4 Fixed effects vs. cluster robust SEs
When running a regression with fixed effects and cluster-robust SEs, we might wonder when to use one vs. the other, and when to use both. Here’s a breakdown of when to use each:
37.4.1 Fixed Effects
Use fixed effects when you want to control for unobserved variables that vary across groups (e.g., states, countries) but are constant over time or within those groups. Fixed effects help eliminate bias from omitted variables that are group-specific and time-invariant.
Example: You are analyzing the effect of tuition fees on graduation rates, but there are unobserved factors (like state policies) that may affect both tuition and graduation rates.
When to use: If you believe that there are unobserved group-level characteristics that need to be controlled for.
Importantly, using fixed effects means you are estimating within-group effects: you are no longer comparing one group to another.
Fixed effects by themselves can increase the plausibility of the residual independence assumption within groups. Without FEs (and without cluster-robust SEs) your SEs could be off, as they do not account for the correlation of units within each group. FEs make it easier to believe your individual units are independent. So, roughly speaking, in many cases including fixed effects not only removes bias but also shores up your independence assumption for clustered data!
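The "within group" interpretation can be checked directly: the slope from a regression with group dummies equals the slope from regressing group-demeaned outcomes on group-demeaned predictors. A minimal sketch on made-up data (the variable names `g`, `x`, and `y` are hypothetical):

```r
set.seed(7)
g <- rep(1:20, each = 10)                       # group (e.g., school) identifier
x <- rnorm(200) + rnorm(20)[g]                  # predictor with group-level variation
y <- 1 + 2 * x + rnorm(20, sd = 3)[g] + rnorm(200)

# Slope from the dummy-variable (fixed effects) regression
fe_slope <- coef(lm(y ~ 0 + x + factor(g)))["x"]

# Slope after subtracting group means from both variables
x_w <- x - ave(x, g)
y_w <- y - ave(y, g)
within_slope <- coef(lm(y_w ~ 0 + x_w))["x_w"]

all.equal(unname(fe_slope), unname(within_slope))
```

Only the point estimates agree here; the standard errors from the demeaned regression would need a degrees-of-freedom correction for the estimated group means.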
37.4.2 Cluster-Robust Standard Errors
Cluster-robust standard errors are used when you believe that observations within the same group (e.g., individuals within a state or students within a school) may be correlated. This method adjusts standard errors to account for potential intra-group correlation, ensuring more reliable inference.
Example: Students within the same community college may have correlated outcomes due to shared environments.
When to use: When there may be correlation in the error terms within groups, which could lead to underestimated standard errors.
But we just said fixed effects does this! CRSEs do this in a more robust way, making virtually no assumption on how units within groups might co-vary. But then why would we use both?
37.4.3 Using Both
If fixed effects account for clustering in your SEs, why bother with cluster-robust standard errors? That is an interesting question. Use both fixed effects and cluster-robust standard errors when:
You want to control for unobserved, time-invariant group-level factors with fixed effects.
You also suspect that there’s within-group correlation in the residuals even after pulling out the common fixed effect. This could happen if you had further clustering within the cluster, for example (e.g., your school data was made by sampling a few classes from within the school). If that were happening, your SEs could be biased even when you include fixed effects.
Example: In a model of student performance across community colleges in different states, you may use fixed effects to control for state-level policies and cluster-robust standard errors to account for possible correlations between students in the same college.
There is another reason you might include CRSEs in your fixed effect model: you view your clusters as a sample from some larger population, and you want to get uncertainty estimates that include the question of whether the clusters in your data are representative of this larger population.
Fixed effects alone target only your evaluation sample (the data you have) and hold the clusters as fixed: you are estimating trends for those clusters in your data, and no further. CRSEs assess cluster variation, and then give you SEs that reflect how that variation makes you more uncertain about what you would find if you collected more clusters like the ones you have.