6  Making tables in Markdown

You might want to make tables. Usually you should probably make charts instead, but every so often a table is a nice thing to have. This chapter is about making generic tables. For regression tables, see Chapter 7.

To illustrate, I make some fake data:

library( tidyverse )
dat = tibble( G = sample( LETTERS[1:5], 100, replace=TRUE ),
              X = rnorm( 100 ),
              rp = sample( letters[1:3], 100, replace=TRUE ),
              Z = sample( c("tx","co"), 100, replace=TRUE ),
              Y = rnorm( 100 ) )

We can make a summary of it by our grouping variable:

sdat <- dat %>% group_by( G) %>%
    summarise( EY = mean( Y ),
               pT = mean( Z == "tx" ),
               sdY = sd( Y ) )

Our intermediate results:

sdat
# A tibble: 5 × 4
  G         EY    pT   sdY
  <chr>  <dbl> <dbl> <dbl>
1 A     -0.365 0.688 0.785
2 B      0.530 0.318 0.970
3 C     -0.161 0.364 1.07 
4 D     -0.110 0.7   1.13 
5 E     -0.323 0.5   0.945

Say our grouping variable is a set of codes for something more meaningful. We can merge in better names by first making a small “cross-walk” from the ID codes to the full names, and then merging it into our results:

names = tribble( ~ G, ~ name,
                 "A", "fred",
                 "B", "doug",
                 "C", "xiao",
                 "D", "lily",
                 "E", "unknown" )
names
# A tibble: 5 × 2
  G     name   
  <chr> <chr>  
1 A     fred   
2 B     doug   
3 C     xiao   
4 D     lily   
5 E     unknown
sdat = left_join( sdat, names ) %>%
    relocate( name)
Joining with `by = join_by(G)`
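When merging in a cross-walk, it can be worth checking that every code actually has a name. The sketch below is one way to do that, using made-up data (the `results` and `crosswalk` objects and the code "F" are hypothetical, not from the example above). It uses `anti_join()` to list codes with no match, and an explicit `by` argument, which also avoids the "Joining with" message.

```r
library(dplyr)
library(tibble)

# Hypothetical summary table; code "F" is deliberately missing from the cross-walk
results <- tibble(G = c("A", "B", "F"), EY = c(0.1, -0.2, 0.3))

crosswalk <- tribble(~G,  ~name,
                     "A", "fred",
                     "B", "doug")

# Codes in the results that have no match in the cross-walk
unmatched <- anti_join(results, crosswalk, by = "G")
unmatched$G

# Naming the key explicitly; unmatched rows get NA for name
labeled <- left_join(results, crosswalk, by = "G") %>%
    relocate(name)
labeled
```

Rows that show up in `unmatched` would end up with a missing name in the final table, so this is a cheap check to run before calling kable.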

Finally, the easiest way to make a table is with the kable command.

knitr::kable( sdat, digits=2 )
|name    |G  |    EY|   pT|  sdY|
|:-------|:--|-----:|----:|----:|
|fred    |A  | -0.37| 0.69| 0.78|
|doug    |B  |  0.53| 0.32| 0.97|
|xiao    |C  | -0.16| 0.36| 1.07|
|lily    |D  | -0.11| 0.70| 1.13|
|unknown |E  | -0.32| 0.50| 0.95|

This is a great workhorse table-making tool! There are also R packages that extend it, e.g. kableExtra, which can do lots of fancy customizable stuff.
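As a rough illustration of the kind of customization available, here is a sketch that sets nicer column headers, alignment, and a caption with kable itself, and then optionally hands the table to kableExtra. The data frame `tab`, the header labels, and the caption are made up for the example; the kableExtra step only runs if that package is installed.

```r
library(knitr)

# Small stand-in for the summary table (values are made up)
tab <- data.frame(name = c("fred", "doug"),
                  G    = c("A", "B"),
                  EY   = c(-0.37, 0.53),
                  pT   = c(0.69, 0.32),
                  sdY  = c(0.78, 0.97))

# col.names, align, and caption are standard kable() arguments
out <- kable(tab, digits = 2,
             col.names = c("Name", "Group", "Mean Y", "Prop. treated", "SD Y"),
             align = c("l", "l", "r", "r", "r"),
             caption = "Summary of Y by group")
out

# Optional styling with kableExtra, if it is installed (HTML output here)
if (requireNamespace("kableExtra", quietly = TRUE)) {
    styled <- kableExtra::kable_styling(kable(tab, digits = 2, format = "html"),
                                        full_width = FALSE)
    styled
}
```

Plain kable is usually enough for a quick table; kableExtra is worth reaching for when you need things like grouped headers or striped rows.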

6.1 Making a “table one”

The “table one” is the first table in a lot of papers; it shows general means of different variables for different groups. The tableone package is useful for making one:

library(tableone)

# sample mean  
CreateTableOne(data = dat,
               vars = c("G", "Z", "X"))
               
                Overall     
  n              100        
  G (%)                     
     A            16 (16.0) 
     B            22 (22.0) 
     C            22 (22.0) 
     D            20 (20.0) 
     E            20 (20.0) 
  Z = tx (%)      50 (50.0) 
  X (mean (SD)) 0.11 (0.88) 
# you can also stratify by a variable of interest
tb <- CreateTableOne(data = dat,
                     vars = c("X", "G", "Y"), 
                     strata = c("Z"))
tb
               Stratified by Z
                co           tx            p      test
  n               50            50                    
  X (mean (SD)) 0.13 (0.97)   0.09 (0.79)   0.821     
  G (%)                                     0.041     
     A             5 (10.0)     11 (22.0)             
     B            15 (30.0)      7 (14.0)             
     C            14 (28.0)      8 (16.0)             
     D             6 (12.0)     14 (28.0)             
     E            10 (20.0)     10 (20.0)             
  Y (mean (SD)) 0.10 (1.03)  -0.23 (1.02)   0.103     

You can then use kable like so (here on just the continuous variables, which are stored in tb$ContTable):

print(tb$ContTable, printToggle = FALSE) %>%
    knitr::kable()
|              |co          |tx           |p     |test |
|:-------------|:-----------|:------------|:-----|:----|
|n             |50          |50           |      |     |
|X (mean (SD)) |0.13 (0.97) |0.09 (0.79)  |0.821 |     |
|Y (mean (SD)) |0.10 (1.03) |-0.23 (1.02) |0.103 |     |
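The call above only passes along the continuous part of the table. If you want the categorical rows too, one option is to print the whole TableOne object, which returns a character matrix that kable can format. This is a sketch on made-up data (`toy` and `tb_toy` are hypothetical stand-ins); `printToggle` and `noSpaces` are arguments of the tableone print method.

```r
library(tableone)
library(knitr)

# Made-up data standing in for `dat`
set.seed(1)
toy <- data.frame(Z = rep(c("co", "tx"), each = 20),
                  G = sample(LETTERS[1:3], 40, replace = TRUE),
                  X = rnorm(40))

tb_toy <- CreateTableOne(data = toy, vars = c("X", "G"), strata = "Z")

# print() on the whole object returns a character matrix covering both
# variable types; noSpaces = TRUE drops the padding used for console alignment
mat <- print(tb_toy, printToggle = FALSE, noSpaces = TRUE)
kable(mat)
```

Because `mat` is an ordinary character matrix, you can also drop columns (such as the empty test column) with normal indexing before passing it to kable.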

6.2 Table of summary stats

You can also easily make pretty tables of summary statistics using the stargazer package. You need to ensure the data is a data.frame, not a tibble, because stargazer is old school. It appears to only summarize continuous (numeric) variables.

Finally, you need to modify the header of the R code chunk, typically by adding the chunk option results="asis", so that the output of stargazer gets formatted properly in your R Markdown.

library(stargazer)

stargazer(as.data.frame(dat))
% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
% Date and time: Tue, Sep 24, 2024 - 16:12:27

You can include only some of the variables and omit stats that are not of interest:

# header = FALSE drops the "Table created by stargazer" comment lines
stargazer(as.data.frame(dat), header=FALSE, 
          omit.summary.stat = c("p25", "p75", "min", "max"), # omit percentiles, min, and max
          title = "Table 1: Descriptive statistics")
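To restrict the table to particular variables, one simple approach is to subset the columns of the data frame before handing it to stargazer. The sketch below does that on made-up data (`toy` and its columns are hypothetical) and uses type = "text", which prints a plain-text table that is handy for checking results in the console before knitting.

```r
library(stargazer)

# Made-up data standing in for `dat`
set.seed(1)
toy <- data.frame(X = rnorm(50), Y = rnorm(50), W = runif(50))

# Subset the columns first to keep only the variables of interest;
# type = "text" prints an ASCII table instead of LaTeX
txt <- stargazer(toy[, c("X", "Y")], type = "text", header = FALSE,
                 omit.summary.stat = c("p25", "p75"),
                 title = "Descriptive statistics")
```

stargazer prints the table and also returns its lines invisibly as a character vector, which is what `txt` holds here. For a knitted document you would switch type back to "latex" or "html" to match your output format.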

See the stargazer help file for how to set/change more of the options: https://cran.r-project.org/web/packages/stargazer/stargazer.pdf