6  Making tables in Markdown

When writing reports you may, from time to time, need to include a table. You should probably make a chart instead, but every so often a table actually is a nice thing to have. This chapter focuses on two key aspects: creating the table itself, and formatting it for a report or presentation. We cover only generic tables here; for guidance on creating regression tables (where you show several regression models side by side), see Chapter 7.

Many table-making packages and functions in R produce basic tables that display nicely in a monospace font on the screen. This is a good starting point, but you’ll often need additional formatting to make the table publication-ready. R offers several excellent packages to help with this, catering to different needs. Some are particularly suited for HTML documents (such as websites), while others are better for PDF documents (such as reports and papers). Finding the right package for your specific use case can take some trial and error.

To illustrate these concepts, let’s start with some fake data.

library( tidyverse )
dat = tibble( G = sample( LETTERS[1:5], 100, replace=TRUE ),
              X = rnorm( 100 ),
              rp = sample( letters[1:3], 100, replace=TRUE ),
              Z = sample( c("tx","co"), 100, replace=TRUE ),
              Y = rnorm( 100 ) )
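
Because sample() and rnorm() draw random numbers, data built this way changes on every run, and so do any tables made from it. The following is a minimal sketch of one way to make the fake data reproducible. The helper name make_dat and the seed value 4040 are arbitrary choices for illustration, not part of the original example, and the tables shown in this chapter were not generated with this seed.

```r
library(tibble)

# Hypothetical helper: wraps the data-generating code in a function
# that fixes the random number stream before drawing.
make_dat <- function(seed) {
  set.seed(seed)
  tibble(G  = sample(LETTERS[1:5], 100, replace = TRUE),
         X  = rnorm(100),
         rp = sample(letters[1:3], 100, replace = TRUE),
         Z  = sample(c("tx", "co"), 100, replace = TRUE),
         Y  = rnorm(100))
}

dat_a <- make_dat(4040)
dat_b <- make_dat(4040)

# Check that the same seed gives the same data
identical(dat_a, dat_b)
```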

We can make a summary of it by our grouping variable:

sdat <- dat %>% group_by( G) %>%
    summarise( EY = mean( Y ),
               pT = mean( Z == "tx" ),
               sdY = sd( Y ) )

Our intermediate results:

sdat
# A tibble: 5 × 4
  G          EY    pT   sdY
  <chr>   <dbl> <dbl> <dbl>
1 A      0.464  0.542 0.953
2 B     -0.522  0.562 0.800
3 C      0.185  0.5   0.850
4 D      0.184  0.565 1.07 
5 E      0.0220 0.565 1.16 

We can print this out in a much cleaner form using the kable() function from the knitr package:

knitr::kable( sdat, digits = 2 )
|G  |    EY|   pT|  sdY|
|:--|-----:|----:|----:|
|A  |  0.46| 0.54| 0.95|
|B  | -0.52| 0.56| 0.80|
|C  |  0.18| 0.50| 0.85|
|D  |  0.18| 0.57| 1.07|
|E  |  0.02| 0.57| 1.16|
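
Beyond digits, kable() takes a few arguments that cover most everyday formatting. The sketch below shows col.names, align, and caption on a small stand-in data frame. The name sdat_demo and the label text are made up for illustration; the numbers are copied from the first three rows of the summary above.

```r
library(knitr)

# Stand-in for the summary table (first three rows of the output above)
sdat_demo <- data.frame(G   = c("A", "B", "C"),
                        EY  = c(0.464, -0.522, 0.185),
                        pT  = c(0.542, 0.562, 0.5),
                        sdY = c(0.953, 0.800, 0.850))

tab <- kable(sdat_demo,
             digits    = 2,                                   # round numeric columns
             col.names = c("Group", "Mean Y", "Prop. treated", "SD Y"),
             align     = c("l", "r", "r", "r"),               # one entry per column
             caption   = "Outcome summaries by group")
tab
```

Renaming columns in the kable() call, rather than in the data frame, keeps the short variable names available for later code.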

Say our grouping variable is a set of codes that stand for something more meaningful. We can merge in better names by first making a small “cross-walk” from the ID codes to the full names, and then merging it with our results:

names = tribble( ~ G, ~ name,
                 "A", "fred",
                 "B", "doug",
                 "C", "xiao",
                 "D", "lily",
                 "E", "unknown" )
names
# A tibble: 5 × 2
  G     name   
  <chr> <chr>  
1 A     fred   
2 B     doug   
3 C     xiao   
4 D     lily   
5 E     unknown
sdat = left_join( sdat, names ) %>%
    relocate( name)
Joining with `by = join_by(G)`
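
The "Joining with" message appears because left_join() had to infer the key from the shared column names. A small sketch of naming the key explicitly, and of checking for codes that the cross-walk does not cover, follows. The objects res, codes, joined, and unmatched are illustrative stand-ins, not objects from the example above.

```r
library(dplyr)
library(tibble)

# Illustrative cross-walk that is deliberately missing code "C"
codes <- tribble(~G,  ~name,
                 "A", "fred",
                 "B", "doug")

res <- tibble(G  = c("A", "B", "C"),
              EY = c(0.46, -0.52, 0.18))

# Naming the key with `by` silences the message and guards against
# accidentally joining on some other shared column.
joined <- left_join(res, codes, by = "G") %>%
  relocate(name)
joined

# Rows of `res` whose code has no match in the cross-walk
unmatched <- anti_join(res, codes, by = "G")
unmatched
```

Unmatched codes show up as NA in the name column after a left join, so it is worth checking for them before printing the table.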

Again, the easiest way to make a nice clean table is with the kable command.

knitr::kable( sdat, digits=2 )
|name    |G  |    EY|   pT|  sdY|
|:-------|:--|-----:|----:|----:|
|fred    |A  |  0.46| 0.54| 0.95|
|doug    |B  | -0.52| 0.56| 0.80|
|xiao    |C  |  0.18| 0.50| 0.85|
|lily    |D  |  0.18| 0.57| 1.07|
|unknown |E  |  0.02| 0.57| 1.16|

This is a great workhorse table-making tool! There are also extension packages, such as kableExtra, that add further customization options on top of kable().
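
As a rough illustration of what kableExtra adds, the sketch below styles an HTML table and places a grouped header over the numeric columns. It assumes HTML output; the data frame sdat_demo and the header text are invented for the example, and other styling choices would work equally well.

```r
library(knitr)
library(kableExtra)

# Stand-in for the merged summary table (values from the output above)
sdat_demo <- data.frame(name = c("fred", "doug", "xiao"),
                        G    = c("A", "B", "C"),
                        EY   = c(0.464, -0.522, 0.185),
                        pT   = c(0.542, 0.562, 0.5),
                        sdY  = c(0.953, 0.800, 0.850))

styled <- kable(sdat_demo, format = "html", digits = 2)

# Striped rows, and a table only as wide as its content
styled <- kable_styling(styled,
                        bootstrap_options = c("striped", "hover"),
                        full_width = FALSE)

# Grouped header: 2 unlabeled columns, then 3 columns under one label
styled <- add_header_above(styled, c(" " = 2, "Summaries by group" = 3))
styled
```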

6.1 Making a “table one”

A “table one” is the first table in many papers; it shows means (or counts and percentages) of different variables, often broken out by group. Perhaps not surprisingly, the tableone package is useful for making such tables:

library(tableone)

# overall sample summary (counts, percentages, means)
CreateTableOne(data = dat,
               vars = c("G", "Z", "X"))
               
                Overall     
  n              100        
  G (%)                     
     A            24 (24.0) 
     B            16 (16.0) 
     C            14 (14.0) 
     D            23 (23.0) 
     E            23 (23.0) 
  Z = tx (%)      55 (55.0) 
  X (mean (SD)) 0.02 (0.95) 
# you can also stratify by a variable of interest
tb <- CreateTableOne(data = dat,
                     vars = c("X", "G", "Y"), 
                     strata = c("Z"))
tb
               Stratified by Z
                co           tx            p      test
  n               45            55                    
  X (mean (SD)) 0.21 (0.94)  -0.14 (0.93)   0.065     
  G (%)                                     0.995     
     A            11 (24.4)     13 (23.6)             
     B             7 (15.6)      9 (16.4)             
     C             7 (15.6)      7 (12.7)             
     D            10 (22.2)     13 (23.6)             
     E            10 (22.2)     13 (23.6)             
  Y (mean (SD)) 0.00 (1.02)   0.19 (1.04)   0.365     

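You can then pass your table to kable like so. Here we use only tb$ContTable, the continuous-variable part of the table, which is why G does not appear; printToggle = FALSE makes print() return the formatted table without also printing it to the console: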

print(tb$ContTable, printToggle = FALSE) %>%
    knitr::kable()
|              |co          |tx           |p     |test |
|:-------------|:-----------|:------------|:-----|:----|
|n             |45          |55           |      |     |
|X (mean (SD)) |0.21 (0.94) |-0.14 (0.93) |0.065 |     |
|Y (mean (SD)) |0.00 (1.02) |0.19 (1.04)  |0.365 |     |
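
To keep the categorical rows as well, one option is to print the whole table object rather than only its continuous part. The sketch below does this on a small made-up dataset; demo, tb_demo, and mat are illustrative names, the seed is arbitrary, and dropping the test columns with test = FALSE is just one possible choice.

```r
library(tableone)
library(knitr)

set.seed(1)  # arbitrary seed so the example is repeatable
demo <- data.frame(G = sample(LETTERS[1:3], 60, replace = TRUE),
                   Z = sample(c("tx", "co"), 60, replace = TRUE),
                   X = rnorm(60))

tb_demo <- CreateTableOne(data = demo, vars = c("X", "G"), strata = "Z")

# printToggle = FALSE: return the formatted matrix without printing it
# noSpaces = TRUE:     drop the padding used for console alignment
# test = FALSE:        leave out the p-value and test columns
mat <- print(tb_demo, printToggle = FALSE, noSpaces = TRUE, test = FALSE)

kable(mat)
```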

6.2 The stargazer package

You can easily make pretty tables using the stargazer package. Stargazer is probably best known for making regression tables (see next chapter), but it can make other kinds of tables as well, such as data summaries. Two caveats: the data must be a data.frame, not a tibble, because stargazer is old school; and its data summaries appear to cover only continuous (numeric) variables.

When using stargazer to summarize a dataset, you can specify that it should include only some of the variables and you can omit stats that are not of interest:

library(stargazer)

# summarize the numeric variables, omitting the percentiles, min, and max
stargazer(as.data.frame(dat), header=FALSE, 
          omit.summary.stat = c("p25", "p75", "min", "max"), 
          title = "Table 1: Descriptive statistics",
          type = "text")

Table 1: Descriptive statistics
============================
Statistic  N  Mean  St. Dev.
----------------------------
X         100 0.018  0.948  
Y         100 0.101  1.027  
----------------------------
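
To restrict the summary to particular variables, one simple approach is to subset the columns of the data frame before passing it to stargazer. The sketch below does that and, instead of omitting statistics, lists the ones to keep with summary.stat. The data frame demo and its column W are invented for illustration, and the seed is arbitrary.

```r
library(stargazer)

set.seed(2)  # arbitrary seed so the example is repeatable
demo <- data.frame(X = rnorm(50),
                   Y = rnorm(50),
                   W = rnorm(50))

# Keep only X and Y, and report only N, mean, and standard deviation
out <- stargazer(demo[, c("X", "Y")],
                 header = FALSE,
                 summary.stat = c("n", "mean", "sd"),
                 digits = 2,
                 type = "text")
```

Listing the statistics to keep can be easier to read than listing those to omit when only a few are wanted.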

See the stargazer help file for how to set/change more of the options: https://cran.r-project.org/web/packages/stargazer/stargazer.pdf

Warning: stargazer does not work well with tibbles (the data frames you get from tidyverse commands), so you need to convert (“cast”) your data to a data.frame with as.data.frame() before using it:

  library(stargazer)
  
  # to include all variables
  stargazer( as.data.frame(dat), header = FALSE, type="text")

To use stargazer in a PDF or HTML report, you will want the report to format the table so it doesn’t look like raw output. To do so, set type="latex" or type="html" instead of type="text", and add results='asis' to the chunk header (the line that opens the chunk enclosing your R code), for example {r, results='asis'}.
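
As a sketch, the body of such a chunk might look like the following. The chunk header is given in a comment because it belongs to the R Markdown file rather than to the R code; the data frame demo is a made-up stand-in, and type = "html" assumes an HTML report.

```r
# Assumed chunk header in the .Rmd file: {r, results='asis'}
library(stargazer)

demo <- data.frame(X = c(0.2, -1.1, 0.7, 1.5, -0.3),
                   Y = c(1.0, 0.4, -0.6, 0.9, 0.1))

# With results='asis', the HTML emitted here is passed straight into
# the rendered document instead of being shown as raw output.
tab <- stargazer(demo, header = FALSE, type = "html",
                 title = "Descriptive statistics")
```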

This will ensure the output of stargazer gets formatted properly in your R Markdown.

Unfortunately, it is hard to write a single report that renders dynamically to either HTML or PDF, so you will have to choose one or the other. If you are making a PDF, use type="latex"; if you are making an HTML report, use type="html".
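
One partial workaround, offered here only as a sketch and not as something this chapter relies on, is to ask knitr which format is being produced and pick the type accordingly. The helper knitr::is_latex_output() returns TRUE when the document is being rendered to LaTeX/PDF; tab_type and demo are illustrative names, and other parts of a report may still need format-specific handling.

```r
library(knitr)
library(stargazer)

# Use LaTeX when knitting to PDF, and fall back to HTML otherwise
tab_type <- if (is_latex_output()) "latex" else "html"

demo <- data.frame(X = c(0.2, -1.1, 0.7, 1.5, -0.3),
                   Y = c(1.0, 0.4, -0.6, 0.9, 0.1))

# The chunk holding this call would still need results='asis'
tab <- stargazer(demo, header = FALSE, type = tab_type)
```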

6.3 The xtable package

The xtable package is another great package for making tables. It is particularly good for LaTeX documents. It is a bit more complicated to use than stargazer, but it is very powerful. Here is an example of how to use it:

library(xtable)
xtable(sdat, caption = "A table of fake data" )

Here you would again set results='asis' in the chunk header to get the table to render properly in your R Markdown document.
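
By default, printing an xtable object produces LaTeX. A short sketch of a few commonly used print options follows: type = "html" for HTML reports, include.rownames = FALSE to drop the row numbers, and caption.placement = "top". The data frame sdat_demo is a stand-in with values copied from the summary earlier in the chapter.

```r
library(xtable)

# Stand-in for the summary table (first three rows of the earlier output)
sdat_demo <- data.frame(G   = c("A", "B", "C"),
                        EY  = c(0.464, -0.522, 0.185),
                        pT  = c(0.542, 0.562, 0.5),
                        sdY = c(0.953, 0.800, 0.850))

xt <- xtable(sdat_demo, caption = "A table of fake data", digits = 2)

# print.results = FALSE returns the markup as a string instead of printing it
html <- print(xt,
              type = "html",
              include.rownames = FALSE,
              caption.placement = "top",
              print.results = FALSE)

cat(html)
```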