我有一个数据框“DFrame”,如下所示:
RecordNo | Cust_ID | Record_Date
1 | 023 | 2014-03-01
2 | 056 | 2014-01-18
3 | 041 | 2014-03-04
4 | 023 | 2014-03-21
5 | 056 | 2014-01-25
6 | 003 | 2014-03-01
7 | 023 | 2014-04-01
8 | 023 | 2014-04-02
我想添加一列,显示在当前 record_date 的接下来 14 天内同一客户 ID 的记录出现的次数。
RecordNo | Cust_ID | Record_Date | 14-day_Repeat_Count
1 | 023 | 2014-03-01 | 0
2 | 056 | 2014-01-18 | 1
3 | 041 | 2014-03-04 | 0
4 | 023 | 2014-03-21 | 2
5 | 056 | 2014-01-25 | 0
6 | 003 | 2014-03-01 | 0
7 | 023 | 2014-04-01 | 1
8 | 023 | 2014-04-02 | 0
我正在尝试用 R 编写快速代码来实现这一点。我找了几篇文章,让满足条件的记录计数看起来很容易,但它们通常只指向静态条件或与当前记录的值无关的条件:
http://one-line-it.blogspot.ca/2013/01/r-number-of-rows-matching-condition.html
我想逻辑可能如下所示:
# Sort DFRAME by RECORD_DATE decreasing=FALSE
......
# Count records below current record where that have matching Cust_ID
# AND the Record_Date is <= 14 days of the current Record_Date
# put result into DFrame$14-day_Repeat_Count
......
我在 DAX 中完成了这种类型的逻辑:
=calculate(counta([Cust_ID],
filter(DFrame,
[Cust_ID]=Earlier([Cust_ID]) &&
[Record_Date] > Earlier([Record_Date]) &&
[Record_Date] <= (Earlier([Record_Date]) + 14)
)
)
(非常快,但 Microsoft 专有),并且在 Excel 中使用“CountIfs”(易于实现,非常非常慢,并且再次需要与 Microsoft 联姻),任何人都可以向我指出一些关于如何根据 R 中的标准进行计数的阅读 Material ?
最佳答案
也许更快、更高效的答案可能如下所示:
##Combine into one data.table
library("data.table")
RecordNo <- 1:36
Record_Date <- c(31,33,38,41,44,59,68,69,75,78,85,88,
32,34,45,46,51,54,60,65,67,70,74,80,
33,35,42,45,50,60,65,70,75,80,82,85)
Cust_ID <- c(rep(1,12),rep(2,12),rep(3,12))
data <- data.table(Cust_ID,Record_Date)[order(Cust_ID,Record_Date)]
##Assign each customer an number that ranks them
data[,Cust_No:=.GRP,by=c("Cust_ID")]
##Create "list" of comparison dates for each customer
Ref <- data[,list(Compare_Date=list(I(Record_Date))), by=c("Cust_ID")]
##Compare two lists and see of the compare date is within N days
system.time(
data$Roll.Cnt <- mapply(FUN = function(RD, NUM) {
d <- as.numeric(Ref$Compare_Date[[NUM]] - RD)
sum((d > 0 & d <= 14))
}, RD = data$Record_Date,NUM=data$Cust_No)
)
结果数据如下所示:
data <- data[,list(Cust_ID,Record_Date,Roll.Cnt)][order(Cust_ID,Record_Date)]
data
Cust_ID Record_Date Roll.Cnt
1: 1 31 4
2: 1 33 3
3: 1 38 2
4: 1 41 1
5: 1 44 0
6: 1 59 2
7: 1 68 3
8: 1 69 2
9: 1 75 3
10: 1 78 2
11: 1 85 1
12: 1 88 0
13: 2 32 3
14: 2 34 2
15: 2 45 3
关于R - 计算数据框中有多少行具有相同的值并且日期在 x 天内,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/23093532/