r - Algorithm efficiency - time difference loop

I have a dataset named visitsPerDay that looks like this, but with 405,890 rows and 10,406 unique customer IDs:

> CUST_ID   Date
> 1         2013-09-19
> 1         2013-10-03
> 1         2013-10-08
> 1         2013-10-12
> 1         2013-10-20
> 1         2013-10-25
> 1         2013-11-01
> 1         2013-11-02
> 1         2013-11-08
> 1         2013-11-15
> 1         2013-11-23
> 1         2013-12-02
> 1         2013-12-04
> 1         2013-12-09
> 2         2013-09-16
> 2         2013-09-17
> 2         2013-09-18

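I want to create a new variable holding the lagged difference between each customer's consecutive visit dates. Here is the code I am currently using: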

visitsPerDay <- visitsPerDay[order(visitsPerDay$CUST_ID), ]
visitsPerDay$MBTV <- NA_real_  # days since the customer's previous visit
cust_id <- 0
for (i in seq_len(nrow(visitsPerDay))) {
  if (visitsPerDay$CUST_ID[i] != cust_id) {
    # first row for a new customer: there is no previous visit
    cust_id <- visitsPerDay$CUST_ID[i]
    visitsPerDay$MBTV[i] <- NA
  } else {
    visitsPerDay$MBTV[i] <- as.numeric(visitsPerDay$Date[i] - visitsPerDay$Date[i-1])
  }
}

I suspect this is not the most efficient way to do it. Is there a better approach? Thanks!

Best answer

Here is an approach using tapply:

# transform 'Date' to values of class 'Date' (maybe already done)
visitsPerDay$Date <- as.Date(visitsPerDay$Date)

visitsPerDay <- transform(visitsPerDay,
                          MBTV = unlist(tapply(Date,
                                               CUST_ID,
                                               FUN = function(x) c(NA,diff(x)))))

The result:

    CUST_ID       Date MBTV
11        1 2013-09-19   NA
12        1 2013-10-03   14
13        1 2013-10-08    5
14        1 2013-10-12    4
15        1 2013-10-20    8
16        1 2013-10-25    5
17        1 2013-11-01    7
18        1 2013-11-02    1
19        1 2013-11-08    6
110       1 2013-11-15    7
111       1 2013-11-23    8
112       1 2013-12-02    9
113       1 2013-12-04    2
114       1 2013-12-09    5
21        2 2013-09-16   NA
22        2 2013-09-17    1
23        2 2013-09-18    1
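
Note that tapply returns its groups in CUST_ID level order, so unlist only lines up with the rows when the data is already sorted by CUST_ID. If that cannot be guaranteed, base R's ave may be a safer variant, since it writes each group's result back into the original row positions. A minimal sketch, assuming the same column names; the data frame `visits` and its values are made up for illustration, and the dates are still expected to be chronological within each customer:

```r
# Small illustrative sample, deliberately NOT sorted by CUST_ID
visits <- data.frame(
  CUST_ID = c(2, 1, 1, 2, 1),
  Date    = as.Date(c("2013-09-16", "2013-09-19", "2013-10-03",
                      "2013-09-17", "2013-10-08"))
)

# ave() applies FUN within each CUST_ID group and returns the results
# in the original row order. The dates are converted to numeric day
# counts so that diff() yields plain numbers of days.
visits$MBTV <- ave(as.numeric(visits$Date), visits$CUST_ID,
                   FUN = function(x) c(NA, diff(x)))

print(visits)
```

Here the first row of each customer gets NA and every other row gets the number of days since that customer's previous row.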

Edit: a faster approach:

# transform 'Date' to values of class 'Date' (maybe already done)
visitsPerDay$Date <- as.Date(visitsPerDay$Date)

visitsPerDay$MBTV <- c(NA_integer_,
                       "is.na<-"(diff(visitsPerDay$Date),
                                 !duplicated(visitsPerDay$CUST_ID)[-1]))
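
The one-liner is dense, so here is the same idea unrolled into separate steps. This is only a sketch on made-up sample data (the names `visits`, `gaps` and `first_visit` are illustrative), and like the one-liner it assumes the rows are sorted by CUST_ID and then by Date:

```r
# Illustrative sample, already sorted by CUST_ID and then by Date
visits <- data.frame(
  CUST_ID = c(1, 1, 1, 2, 2),
  Date    = as.Date(c("2013-09-19", "2013-10-03", "2013-10-08",
                      "2013-09-16", "2013-09-17"))
)

# Step 1: day differences between consecutive rows, ignoring customer
# boundaries for now (result has nrow - 1 elements)
gaps <- as.numeric(diff(visits$Date))

# Step 2: TRUE where a row is the first one seen for its customer
first_visit <- !duplicated(visits$CUST_ID)

# Step 3: a gap that ends on a customer's first row spans two different
# customers, so it is set to NA ([-1] aligns the flags with the gaps)
gaps[first_visit[-1]] <- NA

# Step 4: the very first row has no previous visit, so prepend NA
visits$MBTV <- c(NA, gaps)

print(visits)
```

It tends to be faster than the tapply version because diff and duplicated each make a single vectorized pass over the whole column instead of calling an R function once per customer.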

Regarding "r - Algorithm efficiency - time difference loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21189073/