# 29 Likelihood Ratio Tests

In this chapter we give an overview of using likelihood ratio tests to assess whether a more complex model is justified by the data, as compared to a simpler model. We will use our old friend HS&B to illustrate. We first load the data:

```r
# load libraries
library(tidyverse)
library(lme4)
library(haven)
library(sjPlot)

# load HSB data
hsb <- read_dta("data/hsb.dta") |>
  select(mathach, ses, schoolid)
```
## 29.1 Why LR Tests?
Our fixed effects coefficients have SEs, z-statistics, and p-values, which allow us to easily test the null hypothesis that the slopes are 0 in the population. No such quantities, however, are provided for the random effects of our model. We can use LR tests to address this issue and test the statistical significance of the various random portions of our model.
We can also use LR tests on fixed effects or sets of fixed effects (like a nested F-test in OLS), but 99.9% of the time, the conclusion will be the same as using the z-statistics.
LR tests require that the models are nested, meaning that they use the same data, and one model can be expressed as a constrained version of the other.
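For example, in the sketch below, the intercept-only model is nested in the model with `ses`: fixing the `ses` coefficient to 0 recovers it. The model names `fit0` and `fit1` are ours, for illustration only:

```r
# Sketch: LR test on a fixed effect (assumes the hsb data loaded above).
# fit0 is nested in fit1: setting the ses coefficient to 0 recovers it.
fit0 <- lmer(mathach ~ 1 + (1 | schoolid), hsb)
fit1 <- lmer(mathach ~ ses + (1 | schoolid), hsb)

# anova() refits the REML fits with ML before comparing, which is what
# we want when the models differ in their fixed effects.
anova(fit0, fit1)
```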
## 29.2 HSB Example
We fit three models:

- Random intercept model (`m1`)
- Random slope model with no correlation between intercepts and slopes (`m2`; you probably have not seen this form before, but we will use it to test whether the slopes are correlated with the intercepts)
- Random slope model (`m3`)
We can see from the model output below that the point estimates for the random slope variance \(\tau_{11}\) and the correlation \(\rho_{01}\) are non-zero, but how can we get p-values for these quantities?
```r
m1 <- lmer(mathach ~ ses + (1 | schoolid), hsb)
m2 <- lmer(mathach ~ ses + (1 | schoolid) + (0 + ses | schoolid), hsb)
m3 <- lmer(mathach ~ ses + (1 + ses | schoolid), hsb)

tab_model(m1, m2, m3,
          p.style = "stars",
          show.se = TRUE,
          show.ci = FALSE,
          dv.labels = c("RI", "No Rho", "RS"))
```
|                              | RI                     | No Rho                 | RS                     |
|------------------------------|------------------------|------------------------|------------------------|
| **Predictors**               | Estimates (std. Error) | Estimates (std. Error) | Estimates (std. Error) |
| (Intercept)                  | 12.66 *** (0.19)       | 12.65 *** (0.19)       | 12.67 *** (0.19)       |
| ses                          | 2.39 *** (0.11)        | 2.40 *** (0.12)        | 2.39 *** (0.12)        |
| **Random Effects**           |                        |                        |                        |
| \(\sigma^2\)                 | 37.03                  | 36.82                  | 36.83                  |
| \(\tau_{00}\) schoolid       | 4.77                   | 4.85                   | 4.83                   |
| \(\tau_{11}\) schoolid.ses   |                        | 0.42                   | 0.41                   |
| \(\rho_{01}\) schoolid       |                        |                        | -0.11                  |
| ICC                          | 0.11                   | 0.12                   | 0.12                   |
| N schoolid                   | 160                    | 160                    | 160                    |
| Observations                 | 7185                   | 7185                   | 7185                   |
| Marginal R² / Conditional R² | 0.077 / 0.182          | 0.077 / 0.185          | 0.077 / 0.189          |

\* p<0.05, \*\* p<0.01, \*\*\* p<0.001
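If you want to pull these point estimates straight from the fitted model rather than the table, a quick way (just a sketch using lme4's `VarCorr()`) is:

```r
# Variance components and the intercept-slope correlation from m3
VarCorr(m3)

# Or as a data frame, with variances and correlations in one place
as.data.frame(VarCorr(m3))
```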
### 29.2.1 Are random `ses` slopes necessary?

We use `anova()` to perform the LR test comparing `m1` and `m3` to see if we need random slopes. We see that the random slopes are not statistically significant (p = 0.10):
```r
anova(m1, m3)
```

```
Data: hsb
Models:
m1: mathach ~ ses + (1 | schoolid)
m3: mathach ~ ses + (1 + ses | schoolid)
   npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
m1    4 46649 46677 -23320    46641
m3    6 46648 46690 -23318    46636 4.5354  2     0.1035
```
### 29.2.2 Is there a correlation between the random intercept and slope for `ses`?

Next, we compare model 2 and model 3 to see whether the correlation is needed, given the random slopes. We see that it is not:
```r
anova(m2, m3)
```

```
Data: hsb
Models:
m2: mathach ~ ses + (1 | schoolid) + (0 + ses | schoolid)
m3: mathach ~ ses + (1 + ses | schoolid)
   npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
m2    5 46647 46681 -23318    46637
m3    6 46648 46690 -23318    46636 0.2762  1     0.5992
```
## 29.3 Technical Notes
TL;DR: The traditional LR test provided by `anova()` is likely to be conservative for testing the significance of variance components. For the purposes of this course, it is fine.
There is a lot of multilevel literature arguing that testing a null hypothesis on variance components with LR tests is not the best approach. The reason is that variances cannot be negative, so the null hypothesis lies on the “boundary of the parameter space,” and the test is therefore “likely to be conservative” (to use the warning that Stata gives you), i.e., the p-values are too high. Under the null, the LR test statistic does not follow the usual \(\chi^2\) distribution; instead it follows a mixture distribution, with half of the probability mass at 0 and the other half \(\chi^2\). When you are testing a single variance component, such as the random intercept, you can divide the p-value by 2 to get the right answer (the Stata default), though for more complex comparisons it is not so simple. There are some R packages that use simulation-based approaches to provide more robust results, such as `pbkrtest::PBmodcomp`, but we won't go into them here. See RH&S pp. 88-89 for a more thorough discussion of this issue. Despite all this, the standard LR test remains common in practice.
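To make this concrete, here is a minimal sketch. The halved p-value applies to the `m1` vs. `m2` comparison, which adds exactly one variance component; the `PBmodcomp()` call shows one way to invoke the simulation-based approach just mentioned (the number of simulations and the seed are arbitrary choices, and the run can be slow):

```r
# m2 adds a single variance component (tau_11) to m1, so the boundary
# result says to halve the naive chi^2(1) p-value.
lrt <- anova(m1, m2)
0.5 * pchisq(lrt$Chisq[2], df = 1, lower.tail = FALSE)

# Parametric bootstrap LR test via pbkrtest
library(pbkrtest)
PBmodcomp(m3, m1, nsim = 500, seed = 1234)
```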