This post covers how to compare the value of a row with a certain number of previous rows in a data.table.
Problem description
This is an extension of a question asked before.
In a database containing firm and category values, I want to calculate the following: if a firm enters a category that it has not been engaged in during the three (3) previous years (not including the same year), then that entry is labeled "NEW"; otherwise it is labeled "OLD".
In the following dataset:
library(data.table)
df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))
The desired result would be:
    year category Newness
 1: 1979        A     NEW
 2: 1979        A     NEW
 3: 1980        B     NEW
 4: 1980        C     NEW
 5: 1981        A     NEW
 6: 1981        D     NEW
 7: 1982        F     NEW
 8: 1983        F     OLD
 9: 1983        C     OLD
10: 1984        A     OLD
11: 1984        B     NEW
Thanks a lot.
Recommended answer
Here are some options.
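1) Using a non-equi self join with mult="first":
# lower bound of the three-year lookback window
df[, yrsago := year - 3L]
# join each row to earlier rows of the same category inside the window;
# x.category is NA when nothing matches, i.e. the category is new
df[, Newness :=
    c("OLD", "NEW")[1L + df[df, on=.(category, year>=yrsago, year<year), mult="first", is.na(x.category)]]
]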
2) Using a non-equi self join with by=.EACHI:
df[, yrsago := year - 3L]
# count the matches for each row; zero matches means the category is new
df[, Newness2 :=
    c("OLD", "NEW")[1L + df[df, on=.(category, year>=yrsago, year<year), by=.EACHI, .N==0L]$V1]
]
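The question mentions firms, but the sample data has no firm column. As a minimal sketch of how option 2 might extend to several firms, the example below adds a hypothetical `firm` column (both the column and its values are assumptions made for illustration) and includes it in the join condition, so that each firm's history is checked separately.

```r
library(data.table)

# Hypothetical data: the `firm` column is an assumption, not in the original.
dt <- data.table(
  firm     = c("X", "X", "Y", "Y"),
  year     = c(1980, 1981, 1981, 1982),
  category = c("A", "A", "A", "A")
)

# Lower bound of the three-year lookback window.
dt[, yrsago := year - 3]

# Same non-equi self join as option 2, with `firm` added to the join keys.
# .N == 0L is TRUE when no earlier row of the same firm and category
# falls inside the window, which maps to "NEW".
dt[, Newness := c("OLD", "NEW")[
  1L + dt[dt, on = .(firm, category, year >= yrsago, year < year),
          by = .EACHI, .N == 0L]$V1
]]

print(dt)
```

Here firm Y's first entry in category A should be flagged "NEW" even though firm X was already active in A, because the join only looks at rows of the same firm.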
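3) Using a rolling join, which should be the fastest:
# shift the lookup key just below the current year so same-year rows never match
df[, q := year - 0.1]
# roll=3L carries the last earlier observation forward by at most 3 years;
# x.year is NA when no such observation exists
df[, Newness3 :=
    df[df, on=.(category, year=q), roll=3L, fifelse(is.na(x.year), "NEW", "OLD")]
]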
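To see how the roll window and the 0.1 offset interact, here is a small sketch on made-up data (the tables `x` and `i` and their values are illustrative assumptions, not part of the original answer). Each lookup key `q` sits just below a year, so a row from that same year cannot match, and an earlier observation is only carried forward if it lies within 3 of `q`.

```r
library(data.table)

# Hypothetical data: category "A" was observed in 1979 and 1981 only.
x <- data.table(category = "A", year = c(1979, 1981))

# Lookup keys: each one is (some year - 0.1), as in option 3.
i <- data.table(category = "A", q = c(1978.9, 1980.9, 1983.9, 1984.9))

# Rolling join: for each q, take the latest x row with year <= q,
# but only if that row is no more than 3 behind q.
res <- x[i, on = .(category, year = q), roll = 3,
         .(q = i.q, matched_year = x.year)]

print(res)
```

`matched_year` should come back NA for 1978.9 (nothing earlier) and for 1984.9 (the last observation, 1981, is more than 3 behind). Those are the cases option 3 turns into "NEW".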
Output:
    year category yrsago Newness Newness2      q Newness3
 1: 1979        A   1976     NEW      NEW 1978.9      NEW
 2: 1979        A   1976     NEW      NEW 1978.9      NEW
 3: 1980        B   1977     NEW      NEW 1979.9      NEW
 4: 1980        C   1977     NEW      NEW 1979.9      NEW
 5: 1981        A   1978     OLD      OLD 1980.9      OLD
 6: 1981        D   1978     NEW      NEW 1980.9      NEW
 7: 1982        F   1979     NEW      NEW 1981.9      NEW
 8: 1983        F   1980     OLD      OLD 1982.9      OLD
 9: 1983        C   1980     OLD      OLD 1982.9      OLD
10: 1984        A   1981     OLD      OLD 1983.9      OLD
11: 1984        B   1981     NEW      NEW 1983.9      NEW
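As a sanity check on the output above, the rule can also be written as a brute-force comparison in base R. This is only a sketch for verification, not one of the answer's options, and the names `seen_before` and `newness` are made up here. It compares every row against every other row, so it will be slow on large data.

```r
year <- c(1979, 1979, 1980, 1980, 1981, 1981, 1982, 1983, 1983, 1984, 1984)
category <- c("A", "A", "B", "C", "A", "D", "F", "F", "C", "A", "B")

# For each row i, check whether the same category occurs in any of the
# three preceding years (year[i] - 3 up to year[i] - 1), excluding year[i].
seen_before <- vapply(seq_along(year), function(i) {
  any(category == category[i] & year >= year[i] - 3 & year < year[i])
}, logical(1))

newness <- ifelse(seen_before, "OLD", "NEW")
print(newness)
# "NEW" "NEW" "NEW" "NEW" "OLD" "NEW" "NEW" "OLD" "OLD" "OLD" "NEW"
```

Row 5 (1981, A) comes out as "OLD" because A already appears in 1979, which is inside the three-year window.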
Data:
library(data.table)
df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))