I have the following data frame in R:
ClientID <- c("c100","c100","c100","c100","c100","c100","c101","c101","c101",
"c101","c102","c102","c102","c102","c102","c102","c103","c103",
"c103","c103")
Month <- c("01","02","03","04","05","06","01","02","03","04",
"01","02","03","04","05","06","01","02","03","04")
Brokerage <- c(23,0,0,12,0,11,0,0,345,234,123,0,0,23,0,22,34,0,44,21)
Final_data <- data.frame(ClientID, Month, Brokerage)
ClientID Month Brokerage
1 c100 01 23
2 c100 02 0
3 c100 03 0
4 c100 04 12
5 c100 05 0
6 c100 06 11
7 c101 01 0
8 c101 02 0
9 c101 03 345
10 c101 04 234
11 c102 01 123
12 c102 02 0
13 c102 03 0
14 c102 04 23
15 c102 05 0
16 c102 06 22
17 c103 01 34
18 c103 02 0
19 c103 03 44
20 c103 04 21
Final_data$Flag <- ifelse(Final_data$Brokerage > 0 ,0,1)
After adding the flag, the data frame looks like this:
ClientID Month Brokerage Flag
1: c100 01 23 0
2: c100 02 0 1
3: c100 03 0 1
4: c100 04 12 0
5: c100 05 0 1
6: c100 06 11 0
7: c101 01 0 1
8: c101 02 0 1
9: c101 03 345 0
10: c101 04 234 0
11: c102 01 123 0
12: c102 02 0 1
13: c102 03 0 1
14: c102 04 23 0
15: c102 05 0 1
16: c102 06 22 0
17: c103 01 34 0
18: c103 02 0 1
19: c103 03 44 0
20: c103 04 21 0
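I have flagged each month in which a client's brokerage is greater than 0 as 0, and each month in which the client generated no brokerage as 1. My goal is to find, at client level, the sum of the 1s that lie between zeros. The purpose is to check whether a client has been dormant.
The expected output is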
c100 2,1
c101 Null
c102 2,1
c103 1
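The logic is to add up the 1s that sit between zeros.
I can get the sum of the 1s between two zeros for the whole column with the following code.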
sum.between.zeroes <- function(x) {
library(stringr)
x.str <- paste(x, collapse = "")
nchar(str_extract_all(x.str, "01+0")[[1]]) - 2L
}
sum.between.zeroes(Final_data$Flag)
2 2 2 1
The output above is correct, but I want it aggregated at client level.
I tried dplyr, but it does not seem to work.
test <- Final_data %>%
group_by(ClientID) %>%
summarise(Flags = sum.between.zeroes(Flag))
Please help.
Best answer
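Here is a base R version. We split 'Brokerage' by 'ClientID', get the range of positions where 'Brokerage' is not 0, subset each element of the list to that range, use rle to get the length of each run of 0s, and stack the list to create a data.frame.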
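The steps described above can be traced on a single client's values. This is only an illustrative sketch: the vectors x_c100 and x_c101 are copied from the question's data, and the helper name trace_client is hypothetical, not part of the original answer.

```r
# Hypothetical helper for illustration: applies the described steps to the
# 'Brokerage' values of one client and returns the intermediate results.
trace_client <- function(x) {
  i1 <- range(which(x != 0))          # first and last non-zero position
  trimmed <- x[i1[1]:i1[2]]           # drop leading/trailing zero months
  runs <- rle(trimmed == 0)           # runs of zero / non-zero months
  zero_runs <- runs$lengths[runs$values]  # keep only the runs of zeros
  list(i1 = i1, zero_runs = zero_runs, label = toString(zero_runs))
}

x_c100 <- c(23, 0, 0, 12, 0, 11)      # client c100 from the question
x_c101 <- c(0, 0, 345, 234)           # client c101 from the question

res_c100 <- trace_client(x_c100)
res_c101 <- trace_client(x_c101)

print(res_c100$zero_runs)             # lengths of the enclosed zero runs
print(res_c101$label)                 # empty string: no enclosed zero run
```

For c101 the leading zeros fall outside the trimmed range, so no run of zeros remains and toString returns an empty string. The full expression applies the same steps to every client at once.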
with(Final_data, stack(lapply(split(Brokerage, ClientID), function(x) {
i1 <- range(which(x!=0))
toString(with(rle(x[i1[1]:i1[2]]==0), lengths[values])) })))[2:1]
# ind values
#1 c100 2, 1
#2 c101
#3 c102 2, 1
#4 c103 1
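For readers who prefer to work from the 'Flag' column as in the question, the following is a minimal sketch of the same idea, not the answer author's code. The helper name ones_between_zeros is an assumption introduced here. It uses rle rather than the regex from the question, since a pattern such as "01+0" consumes the closing zero and so can miss a second run that shares that zero (for c100, "011010" would yield only 2).

```r
# Hypothetical helper (not from the original answer): lengths of the runs
# of 1s that are enclosed by a 0 on both sides in a 0/1 flag vector.
ones_between_zeros <- function(flag) {
  zero_pos <- which(flag == 0)
  if (length(zero_pos) < 2) return(integer(0))  # nothing can be enclosed
  # Drop leading/trailing 1s, which are not bounded by a 0 on both sides
  inner <- flag[min(zero_pos):max(zero_pos)]
  runs <- rle(inner == 1)
  runs$lengths[runs$values]
}

# Rebuild the example data from the question
Final_data <- data.frame(
  ClientID = rep(c("c100", "c101", "c102", "c103"), times = c(6, 4, 6, 4)),
  Brokerage = c(23, 0, 0, 12, 0, 11, 0, 0, 345, 234,
                123, 0, 0, 23, 0, 22, 34, 0, 44, 21)
)
Final_data$Flag <- ifelse(Final_data$Brokerage > 0, 0, 1)

# One comma-separated string per client; "" means no enclosed run of 1s
per_client <- sapply(split(Final_data$Flag, Final_data$ClientID),
                     function(f) toString(ones_between_zeros(f)))
print(per_client)
```

Because toString collapses the result to a single string per client, the same helper should also be usable in a grouped dplyr pipeline, for example summarise(Flags = toString(ones_between_zeros(Flag))), which avoids returning several values for one group.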
Regarding r - how to find the sum of numbers between two zeros, grouped by a specific column in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41149190/