1  A R Code Style (Miratrix version)

Author

Luke Miratrix, adapted from Peter Ganong via Avi Feller

Published

September 24, 2024

1.1 Why have coding style?

  • Many style decisions are arbitrary.
  • Why bother?
    1. it makes your code readable
    2. it means you can focus on writing good code
    3. you will be looked down on if you use bad style

Much of this comes from the Tidyverse style guide at http://style.tidyverse.org; we focus primarily on Chapters 2 and 4. The cartoon is from xkcd; read xkcd if you want to be an awesome nerd.

1.2 names (style guide rule 2.1)

1.2.1 Rule 2.1: Naming

“There are only two hard things in Computer Science: cache invalidation and naming things.” —Phil Karlton

  • Variable and function names should be lowercase.
  • Use an underscore to separate words within a name.
  • Generally, variable names should be nouns and function names should be verbs.
# Good
day_one
first_day

# Bad
first_day_of_the_month
DayOne
dayone
djm1
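
As a small illustration of the noun/verb guideline, here is a sketch in which every name (`daily_sales`, `compute_growth()`, `sales_growth`) is made up for this example and does not come from the style guide.

```r
# Hypothetical data: a noun, lowercase, words separated by underscores
daily_sales <- c(120, 135, 150)

# Hypothetical helper: a verb phrase describing what the function does
compute_growth <- function(values) {
  diff(values) / head(values, -1)
}

sales_growth <- compute_growth(daily_sales)
```

Reading `compute_growth(daily_sales)` aloud gives a rough idea of what the line does, which is the point of the rule.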

1.2.2 Rule 2.1: Don’t reuse common names

# Bad
T <- FALSE
pi <- 10
mean <- function(x) sum(x)
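
To see why reusing a common name is risky, here is a minimal sketch; the vector `x` and the result names are made up for illustration. Once `mean` is redefined, every later call silently uses the new definition until it is removed.

```r
x <- c(1, 2, 3)

mean <- function(x) sum(x)   # masks base::mean in the current environment
masked_result <- mean(x)     # silently returns the sum, not the average

rm(mean)                     # remove the masking definition
restored_result <- mean(x)   # base::mean is found again
```

Nothing warns you when the masking happens, so the bug only shows up as wrong numbers later on.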

1.2.3 Example: winsorization

# Good
winsor_upper <- 0.99
winsor_lower <- 0.01
diamonds <-
  diamonds %>%
  mutate(y_winsor = winsorize(y, probs = c(winsor_lower, winsor_upper)))

# Mediocre
diamonds_clean <-
  diamonds %>%
  mutate(y = winsorize(y, probs = c(0.01, 0.99)))
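
The `winsorize()` function used above is not part of base R or dplyr, and the original does not say where it comes from. As an assumption, one way such a helper could be written in base R is sketched below; the vector `y` is made-up data.

```r
# Hypothetical helper: clamp values that fall outside the given quantiles
winsorize <- function(x, probs = c(0.01, 0.99)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

winsor_lower <- 0.01
winsor_upper <- 0.99

y <- c(1:99, 1000)   # made-up data with one extreme value
y_winsor <- winsorize(y, probs = c(winsor_lower, winsor_upper))
```

Naming the cutoffs once means they can be changed in a single place and reused elsewhere, for example in a plot title.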

1.2.4 Naming summary

  • Principle: Ideally, your names should be self-explanatory and your code should be “self-documenting.”
  • A few specific tips:
  • This is hard: more art than science.

1.3 Syntax

1.3.1 syntax: roadmap

  • naming
  • spaces
  • argument names
  • line length
  • assignment
  • quotes
  • comments

1.3.2 Rule 2.2: Spaces (I)

  • Put a space before and after = when naming arguments in function calls.
  • Always put a space after a comma, and never before (just like in regular English).
# Good
average <- mean(x, na.rm = TRUE)

# Also good
average <- mean( x, na.rm = TRUE )

# Bad
average<-mean(x, na.rm = TRUE)
average <- mean(x ,na.rm = TRUE)

1.3.3 Rule 2.2: Spaces (II)

  • Most infix operators (==, +, -, <-, etc.) should be surrounded by spaces.
  • The exceptions are operators with relatively high precedence: ^, :, ::, and :::. (“High precedence” means that these operators are evaluated first, just as multiplication is evaluated before addition.)
# Good
height <- (feet * 12) + inches
sqrt(x^2 + y^2)
x <- 1:10
base::get

# Bad
height<-feet*12 + inches
sqrt(x ^ 2 + y ^ 2)
x <- 1 : 10
base :: get
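
The precedence point can be checked directly. The sketch below uses made-up variable names; writing `:` and `^` without spaces keeps the visual grouping in line with how R actually parses the expression.

```r
n <- 5

# : binds tighter than -, so this is (1:n) - 1
shifted <- 1:n - 1      # 0 1 2 3 4

# Parentheses are needed to get the shorter sequence
shorter <- 1:(n - 1)    # 1 2 3 4

# ^ binds tighter than unary minus, so this is -(2^2)
negative_square <- -2^2     # -4
squared_negative <- (-2)^2  # 4
```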

1.3.4 Rule 2.2: Spaces (III)

Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-).

# Good
list(
  total = a + b + c,
  mean  = (a + b + c) / n
)

# Less good, but livable
list(
  total = a + b + c,
  mean = (a + b + c) / n
)

1.3.5 Rule 2.3: Argument names

A function’s arguments typically fall into two categories: the data to compute on and the details of the computation.

Omit the names of common data arguments (e.g. data, aes).

If you override the default value of an argument, use the full name:

# Good
mean(1:10, na.rm = TRUE)

# Bad
mean(x = 1:10, , FALSE)
mean(, TRUE, x = c(1:10, NA))
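
As a further sketch of the same rule (the variable names are made up), the two calls below compute the same trimmed mean, but only the second tells the reader what 0.1 and TRUE stand for.

```r
x <- c(1:10, NA)

# Positional: the reader has to remember the argument order of mean()
trimmed_positional <- mean(x, 0.1, TRUE)

# Named: the details of the computation are spelled out
trimmed_named <- mean(x, trim = 0.1, na.rm = TRUE)
```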

1.3.6 Rule 2.5: Line length: 80 characters

  • If a function call is too long to fit on one line, use one line each for the function name, each argument, and the closing ).
# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Very bad
do_something_very_complicated("that", requires, many, arguments, "some of which may be long")

# Still bad
do_something_very_complicated(
  "that", requires, many,
  arguments,
  "some of which may be long"
)

# Yup, still bad
do_something_very_complicated(
  "that", requires, many, arguments,
  "some of which may be long"
)

1.3.7 Rule 2.5: Line length

Exception: short unnamed arguments can also go on the same line as the function name, even if the whole function call spans multiple lines.

map(x, f,
  extra_argument_a = 10,
  extra_argument_b = c(1, 43, 390, 210209)
)
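
If you want to find the lines that break the 80-character limit, a quick base R check is sketched below. The temporary script and its contents are made up for illustration; in practice you would point `script_path` at your own file.

```r
# Write a small example script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "x <- 1",
    paste0("y <- c(", paste(rep("1000", 30), collapse = ", "), ")")
  ),
  script_path
)

# Find the line numbers that exceed the limit
max_width <- 80
script_lines <- readLines(script_path)
too_long <- which(nchar(script_lines) > max_width)
```

Here `too_long` holds the line numbers that need to be wrapped.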

1.3.8 Rule 2.6: Assignment (if you are prissy)

Use <-, not =, for assignment.

# Good
x <- 5

# Bad
x = 5
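
One practical reason to keep the two symbols apart is that they mean different things inside a function call. The sketch below uses a made-up function `square()` and assumes no variable called `side_length` exists beforehand.

```r
square <- function(side_length) side_length^2

# Inside a call, = names an argument and creates no variable
square(side_length = 3)
exists_after_equals <- exists("side_length")

# Inside a call, <- assigns in the calling environment, then passes the value
square(side_length <- 3)
exists_after_arrow <- exists("side_length")
```

Reserving = for argument names and <- for assignment keeps that difference visible on the page.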

1.3.9 Rule 2.8: Quotes

Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'
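
The choice of quote character affects only how the code reads, not the string that R stores. A minimal check, with made-up variable names:

```r
single_quoted <- 'Text with "quotes"'
escaped_double <- "Text with \"quotes\""

# Both literals produce exactly the same string
same_string <- identical(single_quoted, escaped_double)
```

The single-quoted form is allowed here simply because it avoids the backslashes.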

1.3.10 Rule 2.9: Comments

If you need comments to explain what your code is doing, rewrite your code.

Remarks

  1. This is counter-intuitive! The problem with comments is that you can change your code without changing the comments. So when you go back and make a change to the code (as is very often necessary), then your comment becomes a source of confusion rather than clarity.

  2. 30535: You can use text in the markdown document to explain what your code is doing in plain English. Use complete sentences. But it is better if you just write the code well.

  3. Life post 30535: There are times when comments are useful, but I try to use them sparingly.
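
As an illustration of rewriting code so that it no longer needs a comment, consider the sketch below. The data frame `people` and all names in it are made up for this example.

```r
# Hypothetical data for illustration
people <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = c(70, 40, 66)
)

# Needs a comment to be understood
d2 <- people[people$age >= 65, ]  # keep people at or above retirement age

# Says the same thing through its names
retirement_age <- 65
retirees <- people[people$age >= retirement_age, ]
```

If the cutoff later changes, the second version has no comment that can drift out of date.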

1.3.11 Syntax summary

  • use whitespace
  • arguments: data before details
  • line length: 80 characters
  • assignment: <-
  • use double quotes
  • avoid comments
  • I skipped 2.4 and 2.7 because they relate to material we haven’t learned yet

1.4 Pipes with magrittr

1.4.1 pipes %>%: roadmap

  1. intro
  2. whitespace
  3. long lines
  4. short pipes
  5. no arguments
  6. assignment

1.4.2 Rule 4.1: intro

Use %>% (or |>, if you are modern) to emphasise a sequence of actions, rather than the object that the actions are being performed on.

Avoid using the pipe when:

  • You need to manipulate more than one object at a time. Reserve pipes for a sequence of steps applied to one primary object.

  • There are meaningful intermediate objects that could be given informative names (cf rule 2.9).
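
To make the "sequence of actions" idea concrete, here is a small sketch comparing nested calls with a pipe. It uses the base pipe |> so that no package is needed, which assumes R 4.1 or later; the vector `x` is made up.

```r
x <- c(3.2, 1.1, 3.4, 2.7, 1.3)

# Nested calls: read from the inside out
nested <- sort(unique(round(x)))

# Piped: read top to bottom, one action per line
piped <-
  x |>
  round() |>
  unique() |>
  sort()
```

Both versions give the same result; the piped one puts the verbs in the order they happen.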

1.4.3 Rule 4.2: whitespace

%>% should always have a space before it, and should usually be followed by a new line. After the first step, each line should be indented by two spaces. This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.

# Good
iris %>%
  group_by(Species) %>%
  summarize_if(is.numeric, mean) %>%
  ungroup() %>%
  gather(measure, value, -Species) %>%
  arrange(value)

# Bad
iris %>% group_by(Species) %>% summarize_if(is.numeric, mean) %>%
ungroup() %>% gather(measure, value, -Species) %>%
arrange(value)

1.4.4 Rule 4.4: short pipes I

It is ok to keep a one-step pipe in one line:

# Good
iris %>% arrange(Species)

# Mediocre
iris %>%
  arrange(Species)

arrange(iris, Species)

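1.4.5 Rule 4.4: short pipes II

A short pipe can be passed as an argument inside a longer pipe, but an inline pipe with several steps quickly becomes hard to read.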

# Bad
x %>%
  select(a, b, w) %>%
  left_join(
    y %>% filter(!u) %>% gather(a, v, -b) %>% select(a, b, v),
    by = c("a", "b")
  )

1.4.6 Rule 4.4: short pipes III

# Good
x %>%
  select(a, b, w) %>%
  left_join(y %>% select(a, b, v), by = c("a", "b"))

x_join <-
  x %>%
  select(a, b, w)
y_join <-
  y %>%
  filter(!u) %>%
  gather(a, v, -b) %>%
  select(a, b, v)
left_join(x_join, y_join, by = c("a", "b"))

1.4.7 Rule 4.5: No arguments

magrittr allows you to omit () on functions that don’t have arguments. Avoid this. This way data objects never have parentheses and functions always do.

# Good
x %>%
  unique() %>%
  sort()

# Bad
x %>%
  unique %>%
  sort
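
If you use the base pipe |> instead of magrittr, this rule is enforced for you: omitting the parentheses is a syntax error. The sketch below checks this by parsing, without evaluating, a small expression; it assumes R 4.1 or later, and the variable names are made up.

```r
# Parse (without evaluating) a native-pipe expression that omits ()
parse_result <- tryCatch(
  parse(text = "x |> unique"),
  error = function(e) e
)

# TRUE when R refuses to parse the expression
pipe_needs_parentheses <- inherits(parse_result, "error")
```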

1.4.8 Rule 4.6: Assignment

Use a separate line for the target of the assignment followed by <-.

# Good
iris_long <-
  iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)

# Bad
iris_long <- iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)

1.4.9 Pipes %>% summary

  1. pipes are awesome
  2. use whitespace
  3. short pipes can be on one line
  4. use parentheses even if there are no arguments
  5. assignment on a separate line

I skipped rule 4.3 because it is redundant with the prior chapter.

1.5 Code style summary

  • Style is awesome. Save a future researcher from spending two months trying to disentangle your spaghetti!
  • You don’t need to memorize these rules! Just as you have a spell checker and Grammarly on your computer for prose, the styler package can help you follow the code style guide.
  • Just as you still need to learn to spell (since a spell checker doesn’t catch everything), you still need to learn these rules.
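
As a minimal sketch of how styler can be used, the snippet below restyles one line of messy code. The string `messy_code` is made up, and the example assumes styler may not be installed, so it checks first.

```r
messy_code <- "average<-mean(x ,na.rm=TRUE)"

# styler is an add-on package, so only run this if it is installed
if (requireNamespace("styler", quietly = TRUE)) {
  tidy_code <- styler::style_text(messy_code)
  print(tidy_code)
} else {
  message("styler is not installed; try install.packages(\"styler\")")
}
```

To restyle a whole script rather than a string, styler also provides `style_file()`.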

In closing:

“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” –Hadley Wickham