This document demonstrates different ways of making regression tables in your reports, and discusses some wrinkles that arise when using these tools with multilevel models.
7.1 The basics of regression tables
For the basics we quickly illustrate regression tables using a subset of the Making Caring Common dataset, which we will eventually discuss in class. This dataset has a measure of emotional safety (our outcome) and we want to see, in a specific school, if this is predicted by gender and/or grade.
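As a sketch of what this single-level analysis might look like (gender, grade, and dat.school are illustrative, assumed names; the actual columns in the dataset may differ):

# Hypothetical single-school regression of emotional safety on gender and grade.
# NOTE: 'gender', 'grade', and 'dat.school' are illustrative names, not
# necessarily the actual variables in the Making Caring Common data.
M.basic <- lm( esafe ~ gender + grade, data = dat.school )
summary( M.basic )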
For our multilevel examples, we use the Making Caring Common data from Project A, fitting models to the 8th grade students only, but across all schools. We have made a High School dummy variable, HS.
The two models we use for demo purposes differ in whether they include the HS term, as sketched below:
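A sketch of the two fits (Model B's formula matches the summary output shown in the next section; Model A simply drops the HS term):

library( lme4 )

# Model A: random-intercept model with no fixed-effect predictors
modA <- lmer( esafe ~ 1 + (1|ID), data = dat.g8 )

# Model B: adds the High School dummy variable
modB <- lmer( esafe ~ 1 + HS + (1|ID), data = dat.g8 )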
In the next sections we first show how to get better summary output (according to some folks) and then we walk through making regression tables in a bit more detail than above.
7.3 Getting p-values for lmer output
The lmerTest package is a way of making R give you more complete output. We are going to load it, and then put the new lmer models into new variables so we can see how the different model fitting packages work with the regression table packages below.
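For example (a sketch re-fitting the models from above, with a ".T" suffix to mark the lmerTest versions; these are the objects used with tab_model() later):

library( lmerTest )

# Same models as before, but now fit with lmerTest's lmer()
modA.T <- lmer( esafe ~ 1 + (1|ID), data = dat.g8 )
modB.T <- lmer( esafe ~ 1 + HS + (1|ID), data = dat.g8 )

summary( modB.T )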
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: esafe ~ 1 + HS + (1 | ID)
   Data: dat.g8

REML criterion at convergence: 2746.8

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.3883 -0.6156  0.2021  0.7628  1.7331

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.04809  0.2193
 Residual             0.46459  0.6816
Number of obs: 1305, groups:  ID, 26

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)
(Intercept)  3.52798    0.08637 29.91033  40.846   <2e-16 ***
HSTRUE      -0.29480    0.10787 25.77814  -2.733   0.0112 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
       (Intr)
HSTRUE -0.801
7.4 The texreg package
The texreg package has two primary functions for table making: screenreg() and texreg().
7.4.1 Using screenreg()
Screenreg is fine for MLMs. It looks a bit like raw output, but it is clear and clean. It will take models fit using lmer or lmerTest, no problem.
screenreg(list(modA,modB))
===============================================
                     Model 1      Model 2
-----------------------------------------------
(Intercept)             3.35 ***     3.53 ***
                       (0.06)       (0.09)
HSTRUE                              -0.29 **
                                    (0.11)
-----------------------------------------------
AIC                  2756.78      2754.79
BIC                  2772.30      2775.49
Log Likelihood      -1375.39     -1373.40
Num. obs.            1305         1305
Num. groups: ID        26           26
Var: ID (Intercept)     0.07         0.05
Var: Residual           0.46         0.46
===============================================
*** p < 0.001; ** p < 0.01; * p < 0.05
Comment: Note that the number of stars is different in this display vs. the summary output! (Look at the HS coefficient, for example.) Not good, it would seem.
This is because the \(p\)-values are calculated using the normal approximation by the screenreg command, but using the \(t\)-test with approximate degrees of freedom by lmerTest. This makes a difference. Consider the following, using the \(t\) statistic for the HS variable:
2*pt( -2.733, df=25.77814 )
[1] 0.0111831
2*pnorm( -2.733 )
[1] 0.006276033
One is below 0.01, and one is not. An extra star!
7.4.2 Using texreg() and TeX
The texreg command is part of the texreg package and can be integrated with LaTeX (which you would need to install). Once you do this, the tables render properly when you compile to a PDF. In the R code chunk you need to include results="asis" to get the LaTeX to compile right, i.e., declare the chunk with {r, results="asis"}.
texreg(list(modA,modB), table=FALSE)
Note that the table=FALSE puts the table right where you want it, not at some random spot LaTeX thinks is nice. LaTeX likes to have "floating tables," where it places the table wherever there is space; this makes it easier to make the entire formatted page look nice.
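Putting it together, the chunk in your R Markdown file might look like this (a sketch; the results="asis" option is what lets the generated LaTeX pass through untouched):

```{r, results="asis"}
texreg( list( modA, modB ), table = FALSE )
```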
7.5 The stargazer package
One issue is that stargazer does not include the random effect variances, so the output is quite limited for multilevel modeling. It also has less stringent conditions for when to put down stars: one star is below 0.10, two are below 0.05, and three are below 0.01. This is quite generous. It also uses the normal approximation.
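For reference, a minimal stargazer call on the original lme4 fits might look like this (a sketch; type = "text" prints a plain-text table to the console rather than emitting LaTeX):

library( stargazer )

# Plain lme4 fits (class lmerMod) work directly
stargazer( modA, modB, type = "text" )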
7.5.1 Stargazer with lmerTest
Stargazer with lmerTest is a bit fussy. This shows how to make it work if you have loaded the lmerTest package. Recall the lmerTest package makes your lmer fits report p-values and whatnot. But this means your new lmer() output is not quite the same class as the old, and stargazer is expecting the old. You fix this by lying to R, telling it the new thing is the old thing. This basically works.
Now for stargazer, we need to tell it that our models are the right type. First note the class of an lmerTest fit, and then overwrite it:
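A sketch of the trick (the copies modA.s and modB.s are illustrative names, used here so the original lmerTest fits remain intact for tab_model() below):

class( modA.T )   # "lmerModLmerTest" -- not the class stargazer expects

# Make copies and relabel them as plain lmer fits
modA.s <- modA.T
modB.s <- modB.T
class( modA.s ) <- "lmerMod"
class( modB.s ) <- "lmerMod"

stargazer( modA.s, modB.s, type = "text" )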
Finally, the tab_model function from the sjPlot package also makes nice regression tables:
# tabulate the results of our two models
library( sjPlot )
tab_model( modA.T, modB.T )
                               esafe                           esafe
Predictors         Estimates   CI              p      Estimates   CI               p
(Intercept)        3.35        3.23 – 3.46     <0.001  3.53       3.36 – 3.70      <0.001
HSTRUE                                                -0.29       -0.51 – -0.08    0.006
Random Effects
σ²                 0.46                                0.46
τ00 (ID)           0.07                                0.05
ICC                0.12                                0.09
N (ID)             26                                  26
Observations       1305                                1305
Marginal R² / Conditional R²   0.000 / 0.125           0.028 / 0.119
7.6 Pretty ANOVA (likelihood ratio test) Tables
We now turn to how to give a nice display of likelihood ratio test results using the kable function. To show how to make such tables, we first create a fake illustrative dataset called a that has 100 observations and specifies our outcome Y as a function of two uncorrelated variables A and B:
a <- tibble( A = rnorm( 100 ),
             B = rnorm( 100 ),
             Y = A * 0.2 + B * 0.5 + rnorm( 100, 0, 1 ) )
7.6.1 Run the Models
We fit two models, one with A and B, the other with just A.
M1 <- lm( Y ~ A + B, data = a )
M2 <- lm( Y ~ A, data = a )
7.6.2 Comparing the Models
We use the anova function to compare the two models (see also the chapter on Likelihood Ratio tests). We see that B improves the model fit significantly.
aa = anova( M2, M1 )
aa
Analysis of Variance Table

Model 1: Y ~ A
Model 2: Y ~ A + B
  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
1     98 130.54
2     97 109.03  1    21.503 19.13 3.074e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aa |> tidy() |> kable()
term        df.residual   rss        df   sumsq      statistic   p.value
Y ~ A       98            130.5386   NA   NA         NA          NA
Y ~ A + B   97            109.0352   1    21.50338   19.12986    3.07e-05
7.6.3 Compare to the Significance test on B
Note that the p-value for B is identical to the ANOVA result above. Why bother with ANOVA, then? Because it can test more complex hypotheses as well (multiple coefficients, random effects, etc.); see the sketch after the table below.
M1 |> tidy() |> kable()
term          estimate    std.error   statistic   p.value
(Intercept)   0.1331894   0.1063276   1.252633    0.2133507
A             0.2320947   0.1037421   2.237229    0.0275605
B             0.4375799   0.1000464   4.373769    0.0000307
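As an illustration of the "more complex hypotheses" point, the same anova() machinery applies to the multilevel fits from earlier in the chapter (a sketch; for lmer models, anova() refits REML fits with maximum likelihood before running the likelihood ratio test):

# Likelihood ratio test for the HS term in the multilevel model
anova( modA, modB )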
7.6.4 Acknowledgements
The ANOVA portion of this chapter was initially drafted by Josh Gilbert.