$ non-standard evaluation?
R has this magical thing called non-standard evaluation. In this post, I investigate whether selecting columns in a data frame via $ corresponds to non-standard evaluation. Note that I use the following personal working definition of non-standard evaluation:
An R function performs non-standard evaluation if (at least) one of its arguments does not by itself evaluate to an R object.
This definition is incomplete and does not capture the “technical” essence of non-standard evaluation, but I guess it is useful from the viewpoint of an R user. In the following example, the function subset() performs non-standard evaluation to select rows in the built-in sleep data set, because the code ID == 1, passed as the second argument, does not evaluate to an R object.
subset(sleep, ID == 1)
##    extra group ID
## 1    0.7     1  1
## 11   1.9     2  1
ID is not itself a variable but only a column of the data frame sleep; typing ID == 1 at the R console would therefore most likely throw an error (unless a variable called ID happens to exist in the workspace). As an argument to subset(), however, ID == 1 is valid R code because the function handles it internally: subset() knows that ID refers to a column of the data frame sleep, which is passed as the first argument.
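The mechanism behind this can be sketched with base R's substitute() and eval(). The function my_subset() below is a hypothetical, simplified helper written for illustration, not the actual source of subset(); it only demonstrates the idea of capturing the condition as unevaluated code and then evaluating it with the columns of the data frame in scope.

```r
# Hypothetical helper for illustration only: a simplified sketch of the idea
# behind subset(), not its actual implementation.
my_subset <- function(df, condition) {
  # Capture the argument as unevaluated code, e.g. the call `ID == 1`
  cond_expr <- substitute(condition)
  # Evaluate that code with the columns of df visible as variables;
  # names not found in df are looked up in the caller's environment
  keep <- eval(cond_expr, df, parent.frame())
  # Treat NA results as FALSE before selecting rows
  df[keep & !is.na(keep), , drop = FALSE]
}

my_subset(sleep, ID == 1)

# At the top level, the same code is evaluated immediately and fails
# (assuming no variable called ID exists in the workspace)
err_msg <- tryCatch(ID == 1, error = function(e) conditionMessage(e))
err_msg
```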
Using the above definition, we can investigate whether the $ notation for selecting columns in a data frame corresponds to non-standard evaluation. For starters, note that in R every operation is a function call, including special syntax for data selection such as $ or [·]. Moreover, every operation can be written in “standard” prefix notation: the function name in backticks, followed by the arguments in round brackets. For the $ notation, the following two calls are equivalent:
sleep$ID
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
## Levels: 1 2 3 4 5 6 7 8 9 10
`$`(sleep, ID)
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
## Levels: 1 2 3 4 5 6 7 8 9 10
This code already answers the original question: selecting columns via $ is non-standard evaluation, because the second argument (ID, the name of the selected column) is passed as a name that does not by itself evaluate to an R object.
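One way to see what $ actually receives is to capture the call with quote() without evaluating it: the call object holds the function `$` followed by its two arguments, and the column argument is a name (a symbol) rather than a character string. The variable col below is a hypothetical example; because $ does not evaluate its second argument, sleep$col looks for a column literally called col instead of using the value stored in col.

```r
call_obj <- quote(sleep$ID)

# The call consists of the function `$` and its two arguments
as.list(call_obj)

# The column is passed as a symbol (class "name"), not as a string
class(call_obj[[3]])

# $ is an ordinary (primitive) function that can be called in prefix form
is.primitive(`$`)

# Because $ does not evaluate its second argument, the variable col is
# not looked up: this searches for a column named "col" and returns NULL
col <- "ID"
sleep$col
```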
In contrast, column selection via [[·]], which returns the same column as the $ notation here, performs standard evaluation:
sleep[["ID"]]
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
## Levels: 1 2 3 4 5 6 7 8 9 10
`[[`(sleep, "ID")
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
## Levels: 1 2 3 4 5 6 7 8 9 10
This function call performs standard evaluation because "ID" is an R object: a character vector of length 1. Interestingly, column selection via $ is faster than column selection via [[·]]; in terms of speed, non-standard evaluation beats its standard counterpart in this case.
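Because [[·]] evaluates its argument, the column name can come from any expression that yields a character string, such as a variable or a computed value (col is a hypothetical name here). The second part is a rough timing sketch with base R's system.time() for checking the speed comparison on your own setup; n is an arbitrary number of repetitions, and the measured times will depend on the R version and the machine.

```r
# [[ evaluates its argument, so the name may be stored in a variable
col <- "ID"
identical(sleep[[col]], sleep$ID)

# ...or computed on the fly
identical(sleep[[paste0("I", "D")]], sleep$ID)

# Rough timing comparison; n is an arbitrary number of repetitions
n <- 1e4
t_dollar  <- system.time(for (i in seq_len(n)) sleep$ID)[["elapsed"]]
t_bracket <- system.time(for (i in seq_len(n)) sleep[["ID"]])[["elapsed"]]
c(dollar = t_dollar, double_bracket = t_bracket)
```

For more reliable numbers, repeat the measurement several times or increase n.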
Last updated: 2020-06-19