I have the following data frame in R:

ClientID <- c("c100","c100","c100","c100","c100","c100","c101","c101","c101",
          "c101","c102","c102","c102","c102","c102","c102","c103","c103",
          "c103","c103")

Month <- c("01","02","03","04","05","06","01","02","03","04",
      "01","02","03","04","05","06","01","02","03","04")

Brokerage <- c(23,0,0,12,0,11,0,0,345,234,123,0,0,23,0,22,34,0,44,21)

Final_data <- data.frame(ClientID, Month, Brokerage)

      ClientID Month Brokerage
 1      c100    01        23
 2      c100    02         0
 3      c100    03         0
 4      c100    04        12
 5      c100    05         0
 6      c100    06        11
 7      c101    01         0
 8      c101    02         0
 9      c101    03       345
 10     c101    04       234
 11     c102    01       123
 12     c102    02         0
 13     c102    03         0
 14     c102    04        23
 15     c102    05         0
 16     c102    06        22
 17     c103    01        34
 18     c103    02         0
 19     c103    03        44
 20     c103    04        21

 Final_data$Flag <- ifelse(Final_data$Brokerage > 0 ,0,1)

After adding the flag, the data frame looks like this:
     ClientID Month Brokerage Flag
1:     c100    01        23    0
2:     c100    02         0    1
3:     c100    03         0    1
4:     c100    04        12    0
5:     c100    05         0    1
6:     c100    06        11    0
7:     c101    01         0    1
8:     c101    02         0    1
9:     c101    03       345    0
10:     c101    04       234    0
11:     c102    01       123    0
12:     c102    02         0    1
13:     c102    03         0    1
14:     c102    04        23    0
15:     c102    05         0    1
16:     c102    06        22    0
17:     c103    01        34    0
18:     c103    02         0    1
19:     c103    03        44    0
20:     c103    04        21    0

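I have flagged each month in which a client's brokerage is greater than 0 as 0, and each month in which the client generated no brokerage as 1. My goal is to find, at the client level, the sum of the 1s that lie between zeros. The purpose is to check whether a client has gone dormant.

The expected output is: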
c100 2,1
c101 Null
c102 2,1
c103 1

The logic is to add up the 1s between zeros.
I can get the sum of the 1s between two zeros for the whole column with the following code:
sum.between.zeroes <- function(x) {
 library(stringr)
 x.str <- paste(x, collapse = "")
 nchar(str_extract_all(x.str, "01+0")[[1]]) - 2L
}

sum.between.zeroes(Final_data$Flag)

2 2 2 1

The output above is correct, but I want it summarised at the client level.
I tried dplyr, but it does not seem to work:
test <- Final_data %>%
 group_by(ClientID) %>%
 summarise(Flags = sum.between.zeroes(Flag))

Please help.

Best answer

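Here is a base R version. We split 'Brokerage' by 'ClientID', take the range of the positions in each piece where 'Brokerage' is not 0, subset the elements to that range, use rle to get the length of each run of 0s, and stack the resulting list to create a data.frame.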
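To make those steps concrete, here is a small trace for a single client, using the c100 values from the question. The variable names `trimmed` and `gaps` are only illustrative and are not part of the answer's code.

```r
# Brokerage values for client c100, copied from the question's data
x <- c(23, 0, 0, 12, 0, 11)

# Positions of the first and last month with non-zero brokerage
i1 <- range(which(x != 0))

# Drop leading/trailing zero months, then mark the remaining zero months
trimmed <- x[i1[1]:i1[2]] == 0

# Run-length encode and keep only the lengths of the TRUE (zero-month) runs
r <- rle(trimmed)
gaps <- r$lengths[r$values]

print(i1)              # 1 6
print(trimmed)         # FALSE TRUE TRUE FALSE TRUE FALSE
print(toString(gaps))  # "2, 1"
```

Note that this sketch assumes the client has at least one non-zero month; for a client whose brokerage is zero in every month, `which(x != 0)` is empty and the indexing step would need a guard.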

with(Final_data, stack(lapply(split(Brokerage, ClientID), function(x) {
    i1 <- range(which(x != 0))
    toString(with(rle(x[i1[1]:i1[2]] == 0), lengths[values]))
})))[2:1]
#   ind values
#1 c100   2, 1
#2 c101
#3 c102   2, 1
#4 c103      1
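If you would rather stay close to the regex idea from the question, the sketch below may help. One caveat with the pattern "01+0" is that each match consumes its closing zero, so in a string such as "011010" only the first gap is found. Using lookarounds avoids that. The helper name `ones_between_zeros` is hypothetical, and the example rebuilds the question's data so it can run on its own with base R only.

```r
# Rebuild the question's data (Flag is 1 when there was no brokerage)
ClientID <- rep(c("c100", "c101", "c102", "c103"), times = c(6, 4, 6, 4))
Brokerage <- c(23,0,0,12,0,11,0,0,345,234,123,0,0,23,0,22,34,0,44,21)
Flag <- ifelse(Brokerage > 0, 0, 1)

# Hypothetical helper: lengths of the runs of 1s that have a 0 on both sides.
# The lookbehind/lookahead check the neighbouring zeros without consuming
# them, so a zero can close one gap and open the next.
ones_between_zeros <- function(flag) {
  s <- paste(flag, collapse = "")
  m <- regmatches(s, gregexpr("(?<=0)1+(?=0)", s, perl = TRUE))[[1]]
  nchar(m)
}

# Collapse each client's gap lengths into one comma-separated string
res <- sapply(split(Flag, ClientID),
              function(f) toString(ones_between_zeros(f)))
print(res)
```

For this data the result agrees with the rle-based answer above: "2, 1" for c100 and c102, an empty string for c101, and "1" for c103.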

Regarding "r - How to find the sum of numbers between two zeros, grouped by a specific column in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41149190/
