我有一个数据框“DFrame”,如下所示:

RecordNo |  Cust_ID  |  Record_Date
1        |  023      |  2014-03-01
2        |  056      |  2014-01-18
3        |  041      |  2014-03-04
4        |  023      |  2014-03-21
5        |  056      |  2014-01-25
6        |  003      |  2014-03-01
7        |  023      |  2014-04-01
8        |  023      |  2014-04-02

我想添加一列,显示在当前 record_date 的接下来 14 天内同一客户 ID 的记录出现的次数。
RecordNo |  Cust_ID  |  Record_Date  | 14-day_Repeat_Count
1        |  023      |  2014-03-01   | 0
2        |  056      |  2014-01-18   | 1
3        |  041      |  2014-03-04   | 0
4        |  023      |  2014-03-21   | 2
5        |  056      |  2014-01-25   | 0
6        |  003      |  2014-03-01   | 0
7        |  023      |  2014-04-01   | 1
8        |  023      |  2014-04-02   | 0

我正在尝试用 R 编写快速代码来实现这一点。我找了几篇文章,让满足条件的记录计数看起来很容易,但它们通常只指向静态条件或与当前记录的值无关的条件:
http://one-line-it.blogspot.ca/2013/01/r-number-of-rows-matching-condition.html

我想逻辑可能如下所示:
# Sort DFRAME by RECORD_DATE decreasing=FALSE
......
# Count records below current record where that have matching Cust_ID
# AND the Record_Date is <= 14 days of the current Record_Date

# put result into DFrame$14-day_Repeat_Count
......

我在 DAX 中完成了这种类型的逻辑:
=calculate(counta([Cust_ID],
      filter(DFrame,
             [Cust_ID]=Earlier([Cust_ID]) &&
             [Record_Date] > Earlier([Record_Date]) &&
             [Record_Date] <= (Earlier([Record_Date]) + 14)
      )
)

(非常快,但 Microsoft 专有),并且在 Excel 中使用“CountIfs”(易于实现,非常非常慢,并且再次需要与 Microsoft 联姻),任何人都可以向我指出一些关于如何根据 R 中的标准进行计数的阅读 Material ?

最佳答案

也许更快、更高效的答案可能如下所示:

##Combine into one data.table
library("data.table")
RecordNo <- 1:36
Record_Date <- c(31,33,38,41,44,59,68,69,75,78,85,88,
                 32,34,45,46,51,54,60,65,67,70,74,80,
                 33,35,42,45,50,60,65,70,75,80,82,85)
Cust_ID <- c(rep(1,12),rep(2,12),rep(3,12))
data <- data.table(Cust_ID,Record_Date)[order(Cust_ID,Record_Date)]

##Assign each customer an number that ranks them
data[,Cust_No:=.GRP,by=c("Cust_ID")]

##Create "list" of comparison dates for each customer
Ref <- data[,list(Compare_Date=list(I(Record_Date))), by=c("Cust_ID")]

##Compare two lists and see of the compare date is within N days
system.time(
  data$Roll.Cnt <- mapply(FUN = function(RD, NUM) {
    d <- as.numeric(Ref$Compare_Date[[NUM]] - RD)
    sum((d > 0 & d <= 14))
  }, RD = data$Record_Date,NUM=data$Cust_No)
)

结果数据如下所示:
data <- data[,list(Cust_ID,Record_Date,Roll.Cnt)][order(Cust_ID,Record_Date)]
data

     Cust_ID   Record_Date  Roll.Cnt
 1:       1          31        4
 2:       1          33        3
 3:       1          38        2
 4:       1          41        1
 5:       1          44        0
 6:       1          59        2
 7:       1          68        3
 8:       1          69        2
 9:       1          75        3
10:       1          78        2
11:       1          85        1
12:       1          88        0
13:       2          32        3
14:       2          34        2
15:       2          45        3

关于R - 计算数据框中有多少行具有相同的值并且日期在 x 天内,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/23093532/

10-11 23:45