In a previous post, I compared the speed of different data frame column selection operations in R (the $ operator performed fastest). In this post, I compare the speed of column selection in data frames and matrices.
Using the microbenchmark package, I compare the speed of selecting single columns. First, I generate some random data and store it as a matrix and as a data frame.
N <- 100
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)
Now, let’s compare how fast columns are selected:
library(microbenchmark)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##     expr min  lq mean median   uq  max neval
## mat[, 1] 868 934 1047    966 1014 4369   100
##    df$c1 508 528  668    575  608 8123   100
Interestingly, selecting a column from the data frame is somewhat faster than from the matrix (median 575 ns versus 966 ns). More interestingly, the gap grows with the number of rows:
N <- 10000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##     expr   min    lq  mean median    uq   max neval
## mat[, 1] 35541 65578 66780  67213 69066 92971   100
##    df$c1   506   586   966    924  1052  5786   100
N <- 100000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)
microbenchmark(
  mat[, 1],
  df$c1
)
## Unit: nanoseconds
##     expr    min     lq   mean median     uq     max neval
## mat[, 1] 355105 399924 632890 660520 676114 2961296   100
##    df$c1    517    637   1618   1214   1673   15906   100
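The three runs above repeat the same setup with a different N. As a minimal sketch, the comparison could be wrapped in a small function and run over several sizes at once. The name compare_selection and its times argument are hypothetical and not part of the original post; the timings it returns will differ from machine to machine.

```r
library(microbenchmark)

# Hypothetical helper (not from the original post): builds the matrix and
# data frame for a given N and returns the median time per expression.
compare_selection <- function(N, times = 100) {
  mat <- matrix(rnorm(N * 2), ncol = 2)
  colnames(mat) <- c("c1", "c2")
  df <- data.frame(mat)
  res <- microbenchmark(mat[, 1], df$c1, times = times)
  # res holds one row per evaluation: expr (factor) and time (nanoseconds)
  tapply(res$time, res$expr, median)
}

# One column of medians per N; rows are the two expressions
medians <- sapply(c(100, 10000, 100000), compare_selection)
medians
```

Returning only the medians keeps the result compact; the full microbenchmark object could be returned instead if the spread matters.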
Whereas column selection from the data frame stays at roughly a microsecond, selection from the matrix slows from about 1 µs at N = 100 to about 660 µs at N = 100000 (medians), roughly in proportion to N. I was quite surprised by this finding.
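One plausible explanation, which the post itself does not give and which is offered here only as an assumption, lies in how the two objects are stored. A data frame is a list with one vector per column, so df$c1 can hand back an existing vector. A matrix is a single vector with a dim attribute, so mat[, 1] has to build a new vector of N values. The sketch below only inspects the two structures; it does not measure copying directly.

```r
# Sketch under the assumption above: compare how the two objects are stored.
N <- 1000
mat <- matrix(rnorm(N * 2), ncol = 2)
colnames(mat) <- c("c1", "c2")
df <- data.frame(mat)

is.list(df)    # TRUE: each column is stored as its own vector
is.list(mat)   # FALSE: all values live in one atomic vector
length(df)     # 2, the number of columns
length(mat)    # 2000, i.e. N * 2 values

# Both selections yield the same values; only the storage behind them differs
same_values <- identical(mat[, 1], df$c1)
same_values
```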
Last updated: 2020-06-19