6 Making tables in Markdown
You might want to make tables. Usually you should probably make charts instead, but every so often a table is a nice thing to have. This chapter is about making generic tables. For regression tables, see Chapter 7.
To illustrate, I make some fake data:
library( tidyverse )
dat = tibble( G = sample( LETTERS[1:5], 100, replace=TRUE ),
              X = rnorm( 100 ),
              rp = sample( letters[1:3], 100, replace=TRUE ),
              Z = sample( c("tx","co"), 100, replace=TRUE ),
              Y = rnorm( 100 ) )
We can make a summary of it by our grouping variable:
sdat <- dat %>% group_by( G ) %>%
  summarise( EY = mean( Y ),
             pT = mean( Z == "tx" ),
             sdY = sd( Y ) )
Our intermediate results:
sdat
# A tibble: 5 × 4
G EY pT sdY
<chr> <dbl> <dbl> <dbl>
1 A -0.365 0.688 0.785
2 B 0.530 0.318 0.970
3 C -0.161 0.364 1.07
4 D -0.110 0.7 1.13
5 E -0.323 0.5 0.945
Say our grouping variable is a set of codes for something more meaningful. We can merge in better names by first making a small “cross-walk” from the ID codes to the full names, and then merging it into our results:
names = tribble( ~ G, ~ name,
                 "A", "fred",
                 "B", "doug",
                 "C", "xiao",
                 "D", "lily",
                 "E", "unknown" )
names
# A tibble: 5 × 2
G name
<chr> <chr>
1 A fred
2 B doug
3 C xiao
4 D lily
5 E unknown
sdat = left_join( sdat, names ) %>%
  relocate( name )
Joining with `by = join_by(G)`
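The message above appears because left_join had to guess the key. As a minimal sketch (the toy tables `sdat_demo` and `names_demo` and their values are made up for illustration, not the chapter's data), you can name the key yourself and check whether the cross-walk covers every code:

```r
library( tidyverse )

# Toy stand-ins for the summary table and cross-walk (values are made up)
sdat_demo = tibble( G = c("A", "B", "C"),
                    EY = c(0.1, -0.2, 0.3) )
names_demo = tribble( ~ G, ~ name,
                      "A", "fred",
                      "B", "doug" )

# Naming the key with `by` avoids the "Joining with" message
joined = left_join( sdat_demo, names_demo, by = "G" ) %>%
  relocate( name )

# Codes missing from the cross-walk get NA for name; anti_join lists them
unmatched = anti_join( sdat_demo, names_demo, by = "G" )
unmatched$G
```

Here group C has no entry in the toy cross-walk, so it keeps its row in `joined` with a missing name and shows up in `unmatched`.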
Finally, the easiest way to make a table is with the kable command.
knitr::kable( sdat, digits=2 )
| name | G | EY | pT | sdY |
|---|---|---|---|---|
| fred | A | -0.37 | 0.69 | 0.78 |
| doug | B | 0.53 | 0.32 | 0.97 |
| xiao | C | -0.16 | 0.36 | 1.07 |
| lily | D | -0.11 | 0.70 | 1.13 |
| unknown | E | -0.32 | 0.50 | 0.95 |
This is a great workhorse table-making tool! There are extension R packages as well, e.g., kableExtra, which can do lots of fancy customizable stuff.
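As a rough sketch of the kind of customization available, the following sets nicer column names, alignment, and a caption through kable itself, and then adds kableExtra styling if that package happens to be installed. The toy table `tab`, its values, and the labels are made up for illustration; I ask for HTML output explicitly because that is a format kableExtra styling is designed for.

```r
library( knitr )

# Toy summary table (values are made up for illustration)
tab = data.frame( name = c("fred", "doug"),
                  EY = c(-0.365, 0.530),
                  pT = c(0.688, 0.318) )

# col.names, align, and caption are standard kable() arguments
k = kable( tab, format = "html", digits = 2,
           col.names = c("Name", "Mean Y", "Prop. treated"),
           align = "lrr",
           caption = "Outcome summaries by group" )

# kableExtra (if installed) layers styling on top of a kable object
if ( requireNamespace( "kableExtra", quietly = TRUE ) ) {
  k = kableExtra::kable_styling( k, bootstrap_options = "striped",
                                 full_width = FALSE )
}
k
```

For a PDF document you would presumably swap in format = "latex"; the rest of the call stays the same.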
6.1 Making a “table one”
The “table one” is the first table in a lot of papers; it shows general means of different variables for different groups. The tableone package is useful for making one:
library(tableone)
# sample mean
CreateTableOne(data = dat,
vars = c("G", "Z", "X"))
Overall
n 100
G (%)
A 16 (16.0)
B 22 (22.0)
C 22 (22.0)
D 20 (20.0)
E 20 (20.0)
Z = tx (%) 50 (50.0)
X (mean (SD)) 0.11 (0.88)
# you can also stratify by a variable of interest
tb <- CreateTableOne(data = dat,
                     vars = c("X", "G", "Y"),
                     strata = c("Z"))
tb
Stratified by Z
co tx p test
n 50 50
X (mean (SD)) 0.13 (0.97) 0.09 (0.79) 0.821
G (%) 0.041
A 5 (10.0) 11 (22.0)
B 15 (30.0) 7 (14.0)
C 14 (28.0) 8 (16.0)
D 6 (12.0) 14 (28.0)
E 10 (20.0) 10 (20.0)
Y (mean (SD)) 0.10 (1.03) -0.23 (1.02) 0.103
You can then use kable like so (here on just the continuous variables, stored in tb$ContTable):
print(tb$ContTable, printToggle = FALSE) %>%
  knitr::kable()
|  | co | tx | p | test |
|---|---|---|---|---|
| n | 50 | 50 |  |  |
| X (mean (SD)) | 0.13 (0.97) | 0.09 (0.79) | 0.821 |  |
| Y (mean (SD)) | 0.10 (1.03) | -0.23 (1.02) | 0.103 |  |
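The same trick should work for the whole table, categorical rows included, since printing a tableone object with printToggle = FALSE hands back the formatted table as a character matrix. A minimal sketch, using a made-up toy data frame `toy` rather than the chapter's `dat`:

```r
library( tableone )

# Toy data standing in for dat (values are made up for illustration)
set.seed( 101 )
toy = data.frame( Z = rep( c("tx", "co"), each = 20 ),
                  G = sample( c("A", "B"), 40, replace = TRUE ),
                  X = rnorm( 40 ) )

tb_toy = CreateTableOne( data = toy, vars = c("X", "G"), strata = "Z" )

# print() with printToggle = FALSE returns the formatted table as a
# character matrix without echoing it; noSpaces = TRUE trims padding
tb_mat = print( tb_toy, printToggle = FALSE, noSpaces = TRUE )

# kable() accepts the matrix directly; row names become the first column
knitr::kable( tb_mat )
```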
6.2 Table of summary stats
You can also easily make pretty tables using the stargazer package. You need to ensure the data is a data.frame, not a tibble, because stargazer is old school. It appears to only summarize continuous variables.
Finally, you need to modify the header of the R code chunk so it includes the option results="asis", so the output of stargazer gets formatted properly in your R Markdown.
library(stargazer)
stargazer(as.data.frame(dat))
You can omit stats that are not of interest; to include only some of the variables, subset the data frame before passing it to stargazer:
# omit summary statistics that are not of interest
stargazer(as.data.frame(dat), header=FALSE,
          omit.summary.stat = c("p25", "p75", "min", "max"), # drop quartiles, min, and max
          title = "Table 1: Descriptive statistics")
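By default stargazer writes LaTeX, which suits PDF output. For a quick look in the console you can ask for plain text instead, and for HTML documents type = "html" is the usual choice. The sketch below uses a made-up toy data frame `toy` and picks the statistics to keep with summary.stat rather than listing the ones to drop:

```r
library( stargazer )

# Toy data (values are made up for illustration)
set.seed( 202 )
toy = data.frame( X = rnorm( 50 ), Y = rnorm( 50 ),
                  G = sample( c("A", "B"), 50, replace = TRUE ) )

# Keep only the variables of interest before summarizing
toy_sub = toy[ , c("X", "Y") ]

# type = "text" prints a plain-text table for checking in the console;
# type = "html" or the default type = "latex" pair with results="asis"
out = stargazer( toy_sub, type = "text",
                 summary.stat = c("n", "mean", "sd"),
                 title = "Descriptive statistics" )
```

The function prints the table and also returns its lines invisibly as a character vector, which is what `out` holds here.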
See the stargazer help file for how to set/change more of the options: https://cran.r-project.org/web/packages/stargazer/stargazer.pdf