How to mutate in dplyr without losing order?

Question

Using data.table I can do the following:

library(data.table)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
#   a  b
#1: 1  1
#2: 2  2
#3: 1 NA
#4: 2 NA

dt[, b := b[1], by = a]
#   a b
#1: 1 1
#2: 2 2
#3: 1 1
#4: 2 2

Attempting the same operation in dplyr, however, scrambles the data: the result comes back sorted by a:

library(dplyr)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
dt %.% group_by(a) %.% mutate(b = b[1])
#  a b
#1 1 1
#2 1 1
#3 2 2
#4 2 2

(As an aside, the above also sorts the original dt, which I find somewhat confusing given dplyr's philosophy of not modifying in place. I'm guessing that's a bug in how dplyr interfaces with data.table.)

What's the dplyr way of achieving the above?

Recommended answer

In the current development version of dplyr (which will eventually become dplyr 0.2), the behaviour differs between data frames and data tables:

library(dplyr)
library(data.table)

df <- data.frame(a = 1:2, b = c(1,2,NA,NA))
dt <- data.table(df)

df %.% group_by(a) %.% mutate(b = b[1])

## Source: local data frame [4 x 2]
## Groups: a
##
##   a b
## 1 1 1
## 2 2 2
## 3 1 1
## 4 2 2

dt %.% group_by(a) %.% mutate(b = b[1])

## Source: local data table [4 x 2]
## Groups: a
##
##   a b
## 1 1 1
## 2 1 1
## 3 2 2
## 4 2 2

This happens because group_by() applied to a data.table automatically calls setkey(), on the assumption that the index will make future operations faster. Since setkey() sorts the table by the key column by reference, both the result and the original dt end up ordered by a.
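The reordering can be reproduced with data.table alone. The following is a minimal sketch of what setkey() does to a table, not of dplyr's internals; the name dt_copy is only illustrative, and a is written out as c(1L, 2L, 1L, 2L) instead of relying on recycling.

```r
library(data.table)

dt <- data.table(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))
dt_copy <- copy(dt)   # independent copy, unaffected by by-reference changes

setkey(dt, a)         # sorts dt by a in place and marks a as the key

key(dt)               # the key column is now "a"
dt$a                  # rows of dt are now ordered by a
dt_copy$a             # the copy keeps the original row order
```

Taking a copy() before any keying operation is one way to protect the original row order of a data.table.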

If there's a strong feeling that this is a bad default, I'm happy to change it.
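If the reordering is unwanted, one option that sidesteps the keying is to run the grouped mutate on a plain data frame, where the row order is left alone. This is a minimal sketch, assuming a dplyr version that provides the %>% pipe (the %.% operator used above is the earlier spelling of the same chaining idea); the name res is only illustrative.

```r
library(dplyr)

df <- data.frame(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))

res <- df %>%
  group_by(a) %>%
  mutate(b = b[1]) %>%   # first value of b within each group of a
  ungroup()

res$a   # same row order as the input
res$b   # NA values replaced by each group's first b
```

A data.table can be converted with as.data.frame() first if it has to stay untouched.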
