Question
Using data.table I can do the following:
    library(data.table)

    dt = data.table(a = 1:2, b = c(1,2,NA,NA))
    #   a  b
    #1: 1  1
    #2: 2  2
    #3: 1 NA
    #4: 2 NA

    dt[, b := b[1], by = a]
    #   a b
    #1: 1 1
    #2: 2 2
    #3: 1 1
    #4: 2 2
When I attempt the same operation in dplyr, however, the rows come back sorted by a:
    library(dplyr)

    dt = data.table(a = 1:2, b = c(1,2,NA,NA))
    dt %.% group_by(a) %.% mutate(b = b[1])
    #  a b
    #1 1 1
    #2 1 1
    #3 2 2
    #4 2 2
(As an aside, the above also sorts the original dt, which I find somewhat confusing given dplyr's philosophy of not modifying in place. I'm guessing that's a bug in how dplyr interfaces with data.table.)
What's the dplyr way of achieving the above?
Recommended answer
In the current development version of dplyr (which will eventually become dplyr 0.2), the behaviour differs between data frames and data tables:
    library(dplyr)
    library(data.table)

    df <- data.frame(a = 1:2, b = c(1,2,NA,NA))
    dt <- data.table(df)

    df %.% group_by(a) %.% mutate(b = b[1])
    ## Source: local data frame [4 x 2]
    ## Groups: a
    ##
    ##   a b
    ## 1 1 1
    ## 2 2 2
    ## 3 1 1
    ## 4 2 2

    dt %.% group_by(a) %.% mutate(b = b[1])
    ## Source: local data table [4 x 2]
    ## Groups: a
    ##
    ##   a b
    ## 1 1 1
    ## 2 1 1
    ## 3 2 2
    ## 4 2 2
This happens because group_by() applied to a data.table automatically calls setkey(), on the assumption that the index will make future operations faster.
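To see the reordering on its own, here is a minimal sketch that calls setkey() directly, with no dplyr involved. The column a is written out in full rather than recycled; that is a choice made for this illustration only. Because setkey() modifies the table by reference, this would also seem to account for the aside in the question about the original dt ending up sorted.

```r
library(data.table)

# Same data as in the question, with `a` spelled out explicitly.
dt <- data.table(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))

# setkey() sorts the table by the key column and modifies it by reference,
# so `dt` itself changes even though nothing is assigned back to it.
setkey(dt, a)

print(dt)       # rows are now grouped by a: 1, 1, 2, 2
print(key(dt))  # the key that was set
```

The original row order is not kept anywhere after this call, so it has to be recorded beforehand if it matters.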
If there's a strong feeling that this is a bad default, I'm happy to change it.
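In the meantime, one possible workaround (a sketch, assuming that copying the data is acceptable) is to convert the data.table to a plain data frame before grouping, since the data frame path shown above keeps the row order. The names df and res are made up for this example, and nested calls are used instead of a pipe so that the code does not depend on which pipe operator the installed dplyr provides.

```r
library(dplyr)
library(data.table)

dt <- data.table(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))

# Work on a plain data frame copy so that no key is ever set on `dt`.
df <- as.data.frame(dt)

# Fill b with the first value of its group; rows stay in their original order.
res <- ungroup(mutate(group_by(df, a), b = b[1]))

print(res)
print(key(dt))  # `dt` itself is left unkeyed and unsorted
```

The result is a data frame rather than a data.table, so it needs converting back if later steps rely on data.table syntax.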