This post shows how to compare a row's value with a fixed number of preceding rows in a data.table.

Problem description

This is an extension of this question asked before.

In a database containing firm and category values, I want to calculate the following: if a firm enters a category that it has not been engaged in during the three (3) previous years (not including the same year), the entry is labeled "NEW"; otherwise it is labeled "OLD".

In the following dataset:

library(data.table)

df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))

The desired result would be:

    year category Newness
 1: 1979        A     NEW
 2: 1979        A     NEW
 3: 1980        B     NEW
 4: 1980        C     NEW
 5: 1981        A     OLD
 6: 1981        D     NEW
 7: 1982        F     NEW
 8: 1983        F     OLD
 9: 1983        C     OLD
10: 1984        A     OLD
11: 1984        B     NEW

Thanks a lot.

Recommended answer

Here are some options.

1) Using non-equi self join with mult="first":

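# Lower bound of the lookback window: three years before the current year
df[, yrsago := year - 3L]
# Self join on category with yrsago <= x.year < i.year; rows without a match give NA
df[, Newness :=
    c("OLD", "NEW")[1L + df[df, on=.(category, year>=yrsago, year<year), mult="first", is.na(x.category)]]
]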

2) Using non-equi self join with by=.EACHI:

df[, yrsago := year - 3L]
# Same join, evaluated once per row of i; .N is 0 when no earlier row matches
df[, Newness2 :=
    c("OLD", "NEW")[1L + df[df, on=.(category, year>=yrsago, year<year), by=.EACHI, .N==0L]$V1]
]
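To see what the by=.EACHI join counts before it is collapsed into a label, the sketch below returns the number of earlier matching rows for each row. It rebuilds the question's data so it runs on its own; the column name n_prev is an illustrative choice, not part of the original answer.

```r
library(data.table)

df <- data.table(year = c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))
df[, yrsago := year - 3L]

# For each row of the inner df (i), count the rows of the outer df (x) that have
# the same category and satisfy yrsago <= x.year < i.year.
# n_prev is a hypothetical name used only for this illustration.
counts <- df[df, on = .(category, year >= yrsago, year < year),
             by = .EACHI, .(n_prev = .N)]$n_prev

counts
```

A count of zero corresponds to "NEW" and any positive count to "OLD". Row 5 (1981, A) counts both 1979 rows, which is why option 2 tests .N == 0L rather than relying on a single match.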

3) Using a rolling join, which should be the fastest:

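# Shift the lookup value just below the current year so the same year cannot match
df[, q := year - 0.1]
# Roll the most recent earlier year forward by at most 3 years; no match gives NA
df[, Newness3 :=
    df[df, on=.(category, year=q), roll=3L, fifelse(is.na(x.year), "NEW", "OLD")]
]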
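The rolling join works because of the window arithmetic: looking up q = year - 0.1 with roll = 3 accepts an earlier observation whose year lies between year - 3.1 and year - 0.1, which for whole-number years means year - 3 through year - 1. A minimal sketch on toy data, where the tables seen and queries and the column label are made up for illustration:

```r
library(data.table)

# Toy data (illustrative, not from the question): category A appears only in 1979.
seen    <- data.table(category = "A", year = 1979)
queries <- data.table(category = "A", year = c(1979, 1980, 1982, 1983))

# Shift each query just below its own year so the same year can never match.
queries[, q := year - 0.1]

# roll = 3 carries the last earlier observation forward by at most 3 units of year;
# queries with nothing inside that window get NA for x.year.
queries[, label := seen[queries, on = .(category, year = q), roll = 3,
                        fifelse(is.na(x.year), "NEW", "OLD")]]

queries
```

Under these assumptions the 1979 query has no earlier observation, 1980 and 1982 fall within three years of 1979, and 1983 falls just outside the window.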

Output, with all three options giving the same labels:

    year category yrsago Newness Newness2      q Newness3
 1: 1979        A   1976     NEW      NEW 1978.9      NEW
 2: 1979        A   1976     NEW      NEW 1978.9      NEW
 3: 1980        B   1977     NEW      NEW 1979.9      NEW
 4: 1980        C   1977     NEW      NEW 1979.9      NEW
 5: 1981        A   1978     OLD      OLD 1980.9      OLD
 6: 1981        D   1978     NEW      NEW 1980.9      NEW
 7: 1982        F   1979     NEW      NEW 1981.9      NEW
 8: 1983        F   1980     OLD      OLD 1982.9      OLD
 9: 1983        C   1980     OLD      OLD 1982.9      OLD
10: 1984        A   1981     OLD      OLD 1983.9      OLD
11: 1984        B   1981     NEW      NEW 1983.9      NEW
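As a sanity check on the labels, the rule can also be applied row by row without any join. This is only a brute-force sketch for small data (it compares every row with every other row); the helper newness_bruteforce and the column NewnessCheck are hypothetical names, not part of the answer.

```r
library(data.table)

df <- data.table(year = c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))

# Hypothetical helper: label a row "OLD" if the same category occurs in any of
# the `window` years strictly before the row's own year, otherwise "NEW".
newness_bruteforce <- function(year, category, window = 3) {
  vapply(seq_along(year), function(i) {
    seen <- any(category == category[i] &
                year >= year[i] - window &
                year < year[i])
    if (seen) "OLD" else "NEW"
  }, character(1))
}

df[, NewnessCheck := newness_bruteforce(year, category)]
df
```

On the sample data this gives the same labels as the Newness column shown above, including "OLD" for row 5 (1981, A), since A already appeared in 1979.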

Data:

df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
    category = c("A","A","B","C","A","D","F","F","C","A","B"))
