R has this magical thing called non-standard evaluation. In this post, I investigate whether selecting columns in a data frame via $ corresponds to non-standard evaluation. Note that I use the following personal working definition of non-standard evaluation:

An R function performs non-standard evaluation if (at least) one of its arguments does not by itself evaluate to an R object.
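As a minimal illustration of this definition, the hypothetical function `capture_arg()` below (not part of base R or any package) returns its argument as unevaluated code via base R's `substitute()`. Because the argument is never evaluated, it does not have to evaluate to an R object. It may even mention a variable that does not exist.

```r
# Hypothetical helper: returns the code passed as `x` without evaluating it
capture_arg <- function(x) substitute(x)

# `undefined_variable` does not exist, yet no error is raised,
# because capture_arg() never evaluates its argument
code <- capture_arg(undefined_variable + 1)
code
## undefined_variable + 1
class(code)
## [1] "call"
```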

The definition above is incomplete and does not capture the "technical" essence of non-standard evaluation, but I think it is useful from the viewpoint of an R user. In the following example, the function subset() performs non-standard evaluation to select rows in the built-in sleep data set, because the code ID == 1, passed as the second argument, does not evaluate to an R object on its own.

subset(sleep, ID == 1)
##    extra group ID
## 1    0.7     1  1
## 11   1.9     2  1

ID is not a variable in its own right, but only a column of the data frame sleep. Therefore, typing ID == 1 in the R console would most likely throw an error (object 'ID' not found). However, as an argument to subset(), ID == 1 is valid R code because the function handles it internally: subset() knows that ID refers to a column of the data frame sleep, which is passed as the first argument.
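How can a function work with code that does not evaluate on its own? Here is a minimal sketch, assuming a hypothetical function `my_subset()` that only mimics the row filtering of subset() and is not its actual implementation. It captures the unevaluated condition with `substitute()` and then evaluates it with `eval()`, using the data frame as the place where names such as ID are looked up first.

```r
# Hypothetical re-implementation of the row filtering done by subset();
# a sketch, not the source code of base R's subset.data.frame()
my_subset <- function(data, condition) {
  # Capture the unevaluated expression, e.g. ID == 1
  condition_call <- substitute(condition)
  # Evaluate it with the columns of `data` visible as variables;
  # names not found among the columns are looked up in the caller's environment
  rows <- eval(condition_call, data, parent.frame())
  # Treat NA results as FALSE, then keep only the matching rows
  data[rows & !is.na(rows), , drop = FALSE]
}

my_subset(sleep, ID == 1)
##    extra group ID
## 1    0.7     1  1
## 11   1.9     2  1
```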

Using the above definition, we can investigate whether the $ notation for selecting columns in a data frame corresponds to non-standard evaluation. For starters, note that in R every operation is a function call, including special syntax for data selection such as $ or [·]. Moreover, every operation can also be written in "standard" function notation with round brackets. In the case of the $ notation, the following two calls are equivalent:

sleep$ID
##  [1] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5  6  7  8  9  10
## Levels: 1 2 3 4 5 6 7 8 9 10
`$`(sleep, ID)
##  [1] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5  6  7  8  9  10
## Levels: 1 2 3 4 5 6 7 8 9 10

This code already answers the original question: selecting columns via $ is non-standard evaluation, because the second argument, ID (the name of the selected column), is passed as a name that does not by itself evaluate to an R object.
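To see that ID reaches `$` as a name rather than as a value, we can split the unevaluated call `sleep$ID` into its parts with base R's `quote()` and `as.list()`. This is a small sketch. The variable `call_parts` is introduced only for illustration.

```r
# Split the unevaluated call sleep$ID into the function and its two arguments
call_parts <- as.list(quote(sleep$ID))

# The function being called is `$`
identical(call_parts[[1]], as.name("$"))
## [1] TRUE

# The second argument is a bare symbol (a name), not an evaluated value
is.name(call_parts[[3]])
## [1] TRUE
```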

In contrast, column selection via [[·]], which returns the same result as the $ notation when given an exact column name, performs standard evaluation:

sleep[["ID"]]
##  [1] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5  6  7  8  9  10
## Levels: 1 2 3 4 5 6 7 8 9 10
`[[`(sleep, "ID")
##  [1] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5  6  7  8  9  10
## Levels: 1 2 3 4 5 6 7 8 9 10

This function call performs standard evaluation because "ID" is an R object, namely a character vector of length 1. Interestingly, column selection via $ is faster than column selection via [[·]]; in terms of speed, non-standard evaluation beats its standard counterpart in this case.
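One practical consequence of the difference is visible when the column name is stored in a variable. In the sketch below, `column_name` is a hypothetical variable introduced for illustration. `[[` evaluates it to the string "group". `$` does not evaluate it and instead looks for a column literally called `column_name`, which does not exist. The last two lines are a rough way to compare speed with base R's `system.time()`. No timings are shown because they depend on the machine and R version. Dedicated packages such as microbenchmark give more precise measurements.

```r
# Hypothetical variable holding a column name
column_name <- "group"

# `[[` evaluates its argument, so column_name is replaced by "group"
identical(sleep[[column_name]], sleep$group)
## [1] TRUE

# `$` does not evaluate column_name; it looks for a column literally
# named "column_name", finds none, and returns NULL
is.null(sleep$column_name)
## [1] TRUE

# Rough timing comparison (base R only); results vary between machines
n <- 1e5
system.time(for (i in seq_len(n)) sleep$ID)
system.time(for (i in seq_len(n)) sleep[["ID"]])
```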


Last updated: 2020-06-19
