In a previous post, I compared the speed of different column selection operations on data frames in R (the $ operator was fastest). In this post, I compare the speed of column selection in data frames and matrices.

Using the microbenchmark package, I compare the speed of selecting a single column. First, I generate some random data and store it both as a matrix and as a data frame.

N <- 100
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)

Now, let’s compare how fast columns are selected:

library(microbenchmark)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##      expr min  lq mean median   uq  max neval
##  mat[, 1] 868 934 1047    966 1014 4369   100
##     df$c1 508 528  668    575  608 8123   100

Interestingly, selecting a column from the data frame is somewhat faster than from the matrix (median 575 vs. 966 nanoseconds). More interestingly still, the gap grows with the number of rows:

N <- 10000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##      expr   min    lq  mean median    uq   max neval
##  mat[, 1] 35541 65578 66780  67213 69066 92971   100
##     df$c1   506   586   966    924  1052  5786   100

And with ten times as many rows again:

N <- 100000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##      expr    min     lq   mean median     uq     max neval
##  mat[, 1] 355105 399924 632890 660520 676114 2961296   100
##     df$c1    517    637   1618   1214   1673   15906   100
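
The three runs above repeat the same setup, so they can be folded into one loop. The following is a minimal sketch, not the code used for the results above: bench_select is a hypothetical helper name, and the median is read from the summary() of the microbenchmark result with the unit fixed to nanoseconds.

```r
library(microbenchmark)

# Hypothetical helper (not part of the original post): returns the median
# timing in nanoseconds of both selections for a given number of rows N.
bench_select <- function(N, times = 100) {
  mat <- matrix(rnorm(N * 2), ncol = 2)
  colnames(mat) <- c("c1", "c2")
  df <- data.frame(mat)
  res <- summary(microbenchmark(mat[, 1], df$c1, times = times), unit = "ns")
  data.frame(N = N, expr = as.character(res$expr), median_ns = res$median)
}

# One row per combination of N and expression
timings <- do.call(rbind, lapply(c(100, 10000, 100000), bench_select))
timings
```

The exact numbers will differ between machines and runs, so only the trend across N is meaningful.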

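Whereas the time to select a column from a data frame barely changes (median about 0.6 to 1.2 microseconds), selecting from a matrix slows roughly in proportion to the number of rows (median about 1, 67, and 661 microseconds for N = 100, 10000, and 100000). I was quite surprised by this finding.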
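
One likely explanation, offered here as a sketch rather than something the benchmarks themselves prove: a data frame is a list of column vectors, so df$c1 can hand back a vector that already exists, whereas a matrix is a single atomic vector with a dim attribute, so mat[, 1] has to allocate a new vector and copy N values into it. The structural difference can be checked directly (N = 1000 is an arbitrary choice for illustration):

```r
N <- 1000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)

is.list(df)    # TRUE: each column is stored as its own vector
is.list(mat)   # FALSE: all cells live in one atomic vector
length(df)     # 2, the number of columns
length(mat)    # 2000, the number of cells

# Both selections return the same values
same_values <- isTRUE(all.equal(df$c1, mat[, 1]))
same_values
```

If this reading is right, the cost of mat[, 1] should scale with the number of rows while df$c1 stays roughly constant, which matches the timings above.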


Last updated: 2020-06-19
