Question
I have a data.table with fields {id, menuitem, amount}.
This is transaction data, so ids are unique but menuitem repeats. Now, I want to remove all entries where menuitem == 'coffee'. I also want to remove rows where amount <= 0.
What is the right way to do this in data.table?
I can compute data$menuitem != 'coffee' and then use the resulting index into data[] - but that is not necessarily efficient and does not take advantage of data.table.
Any pointers in the right direction are appreciated.
Answer
In this case it works just as it would with a data.frame:

data <- data[menuitem != 'coffee' & amount > 0]

Deleting/adding rows by reference has yet to be implemented; you can find more info in this question.
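As a minimal, self-contained sketch of the subsetting above (the table contents are made-up sample values, not from the original question):

```r
library(data.table)

# Toy transaction data (hypothetical values for illustration)
data <- data.table(
  id       = 1:6,
  menuitem = c("coffee", "tea", "coffee", "cake", "tea", "cake"),
  amount   = c(2.5, 3.0, -1.0, 0, 4.5, 1.0)
)

# Keep rows that are neither coffee nor non-positive amounts.
# Inside data.table's [], column names are used directly (no data$ prefix).
data <- data[menuitem != "coffee" & amount > 0]

# Rows with id 2, 5 and 6 remain
print(data)
```

Note that this assigns a filtered copy back to `data`; it does not delete rows by reference, which data.table does not support yet.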
Regarding speed:

1. You can benefit from keys by doing something like:
setkey(data, menuitem)
data <- data[!"coffee"]
which will be faster than data <- data[menuitem != 'coffee']. However, to apply both of the filters you asked about in the question you would need a rolling join (I've finished my lunch break; I can add something later :-)).
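One simple way to combine the keyed anti-join with the second filter is to chain a second subset after it. This is a sketch under my own assumptions (sample data invented here), not the rolling-join approach the answer alludes to:

```r
library(data.table)

# Hypothetical sample data
data <- data.table(
  id       = 1:6,
  menuitem = c("coffee", "tea", "coffee", "cake", "tea", "cake"),
  amount   = c(2.5, 3.0, -1.0, 0, 4.5, 1.0)
)

# Key on menuitem so data[!"coffee"] becomes a fast binary-search anti-join
setkey(data, menuitem)

# Chain: drop coffee rows via the key, then vector-scan on amount
result <- data[!"coffee"][amount > 0]
```

The first subset uses the key (binary search); the second is an ordinary vector scan over the already-reduced table, which is typically cheap.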
2. Even without a key, data.table is much faster for a relatively big table (and similar in speed for a handful of rows):
dt <- data.table(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
df <- data.frame(id = sample(letters, 1000000, TRUE), var = rnorm(1000000))
library(microbenchmark)
> microbenchmark(dt[ id == "a"], df[ df$id == "a",])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100