Permuting groups in a data.frame in R

Problem description

I have a data.frame like this:

DqStr <- "Group   q        Dq       SD.Dq
1 -3.0 0.7351 0.0067
1 -2.5 0.6995 0.0078
1 -2.0 0.6538 0.0093
2 -3.0 0.7203 0.0081
2 -2.5 0.6829 0.0094
2 -2.0 0.6350 0.0112"
Dq1 <- read.table(textConnection(DqStr), header=TRUE)

I would like to randomize group membership, but only among rows that share the same value of Dq1$q:

g <-unique(Dq1$q)
Dq2<- data.frame()
for(n in g)
{
  Dqq <- Dq1[Dq1$q==n,]
  Dqq$Group <-sample(Dqq$Group)
  Dq2 <- rbind(Dq2,Dqq)
}

That could also be done with plyr:

library(plyr)
ddply(Dq1,.(q), function(x) { x$Group <- sample(x$Group)
                              data.frame(x)})

As I have to repeat this thousands of times, I wonder if there is a better (faster) way to do it.

Solution

If I'm understanding your question correctly, this data.table solution will also work:

library(data.table)
Dq1 <- as.data.table(Dq1)
Dq1[, Group := sample(Group), by = q]
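
One point worth keeping in mind when this is repeated many times: `:=` updates the data.table by reference, so each call overwrites the previous Group column in place. A minimal sketch, assuming you want to keep the original table intact, is to permute a copy instead. The data below is the example from the question rebuilt inline, and the name `permuted` is only illustrative.

```r
library(data.table)

# Example data from the question, rebuilt inline so the sketch is self-contained.
Dq1 <- data.table(
  Group = c(1L, 1L, 1L, 2L, 2L, 2L),
  q     = c(-3.0, -2.5, -2.0, -3.0, -2.5, -2.0),
  Dq    = c(0.7351, 0.6995, 0.6538, 0.7203, 0.6829, 0.6350),
  SD.Dq = c(0.0067, 0.0078, 0.0093, 0.0081, 0.0094, 0.0112)
)

set.seed(42)  # only to make the sketch reproducible

# copy() makes a full copy, so := modifies the copy and leaves Dq1 untouched.
permuted <- copy(Dq1)[, Group := sample(Group), by = q]

# Within each value of q the set of Group labels is the same as before;
# only their assignment to rows may have changed.
permuted[, .(labels = paste(sort(Group), collapse = ",")), by = q]
```

If the original table is not needed afterwards, the copy can be skipped and the in-place version above is likely the cheaper choice.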

Adding to Robert's benchmark above:

library(plyr)
library(data.table)

your_code <- function() {
  g <- unique(Dq1$q)
  Dq2 <- data.frame()
  for (n in g) {
    Dqq <- Dq1[Dq1$q == n, ]
    Dqq$Group <- sample(Dqq$Group)
    Dq2 <- rbind(Dq2, Dqq)
  }
}
plyr_code <- function() { ddply(Dq1,.(q), function(x) { x$Group <- sample(x$Group); data.frame(x)}) }
base_code <- function() { Dq1$Group <- with(Dq1, ave(Group, q, FUN = sample)) }
data.table_code <- function() { Dq1 <- as.data.table(Dq1); Dq1[, Group := sample(Group), by = q] }

library(microbenchmark)
microbenchmark(your_code(), plyr_code(), base_code(), data.table_code())

Results:

    Unit: milliseconds
              expr      min       lq   median       uq      max neval
       your_code() 6.290822 6.771324 6.848123 6.966648 9.639748   100
       plyr_code() 3.124676 3.307456 3.356095 3.455422 4.564390   100
       base_code() 1.168874 1.301224 1.326055 1.348327 2.269652   100
 data.table_code() 1.124844 1.157866 1.180649 1.209577 1.419750   100

For a data set this small, data.table is not clearly superior. But if you have many rows (and if you use fread to read in your data as a data.table to start with), you'll see significant speedups over plyr, and some speedups over base R. So don't take this benchmark too seriously.

Edit: changed to use as.data.table() instead of data.table(), per Arun's comment.
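
Since the question mentions repeating the permutation thousands of times, here is a rough sketch of how the base R `ave()` variant from the benchmark could be wrapped to produce many permuted Group vectors at once. The helper names `shuffle` and `permute_groups` and the count `n_perm` are assumptions made for illustration, not part of the original answers. The sketch shuffles by index rather than calling `sample(x)` directly, because `sample(x)` on a length-one numeric `x` samples from `1:x`, which would break for a value of q that occurs in only one row.

```r
# Example data from the question, rebuilt inline so the sketch is self-contained.
Dq1 <- data.frame(
  Group = c(1L, 1L, 1L, 2L, 2L, 2L),
  q     = c(-3.0, -2.5, -2.0, -3.0, -2.5, -2.0),
  Dq    = c(0.7351, 0.6995, 0.6538, 0.7203, 0.6829, 0.6350),
  SD.Dq = c(0.0067, 0.0078, 0.0093, 0.0081, 0.0094, 0.0112)
)

# Hypothetical helper: shuffle by index, so a single-element group
# is returned unchanged instead of being treated as 1:x by sample().
shuffle <- function(x) x[sample.int(length(x))]

# Hypothetical helper: one permuted Group vector, shuffled within each value of q.
permute_groups <- function(df) ave(df$Group, df$q, FUN = shuffle)

set.seed(1)     # only to make the sketch reproducible
n_perm <- 1000  # assumed number of repetitions

# One column per permutation, one row per row of Dq1.
perms <- replicate(n_perm, permute_groups(Dq1))
dim(perms)
```

Each column of `perms` can then be substituted for `Dq1$Group` when computing the statistic of interest, which avoids rebuilding the whole data frame on every repetition.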

                        这篇关于置换 data.frame R 中的组的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！