Calculating active days/months from overlapping dates

Problem description

I have data listing start and end dates for different products for a large number of customers.
The intervals for different products can overlap or have time gaps between purchases:

    library(lubridate)
    library(Hmisc)
    library(dplyr)

    user_id <- c(rep(12, 8), rep(33, 5))

    # Cs() from Hmisc turns the unquoted dates into character strings for dmy()
    start_date <- dmy(Cs(31/10/2010, 18/12/2010, 31/10/2011, 18/12/2011,
                         27/03/2014, 18/12/2014, 27/03/2015, 18/12/2016,
                         01/07/1992, 20/08/1993, 28/10/1999, 31/01/2006,
                         26/08/2016))
    end_date <- dmy(Cs(31/10/2011, 18/12/2011, 28/04/2014, 18/12/2014,
                       27/03/2015, 18/12/2016, 27/03/2016, 18/12/2017,
                       01/07/2016, 16/08/2016, 15/11/2012, 28/02/2006,
                       26/01/2017))

    data <- data.frame(user_id, start_date, end_date)
    data

       user_id start_date   end_date
    1       12 2010-10-31 2011-10-31
    2       12 2010-12-18 2011-12-18
    3       12 2011-10-31 2014-04-28
    4       12 2011-12-18 2014-12-18
    5       12 2014-03-27 2015-03-27
    6       12 2014-12-18 2016-12-18
    7       12 2015-03-27 2016-03-27
    8       12 2016-12-18 2017-12-18
    9       33 1992-07-01 2016-07-01
    10      33 1993-08-20 2016-08-16
    11      33 1999-10-28 2012-11-15
    12      33 2006-01-31 2006-02-28
    13      33 2016-08-26 2017-01-26

I'd like to calculate the total number of active days or months during which each customer held any of the products.

It wouldn't be a problem if the products ALWAYS overlapped, as then I could simply take

    data %>%
      group_by(user_id) %>%
      dplyr::summarize(time_diff = max(end_date) - min(start_date))

However, as you can see for user 33, products don't always overlap, so each separate interval has to be added to the total of the overlapping ones.

Is there a quick and elegant way to code it, hopefully in dplyr?

Solution

We can use functions from dplyr to count the total number of days. The following example expands each time period into one row per day and then removes duplicated dates. Finally, it counts the rows for each user_id.

    data2 <- data %>%
      rowwise() %>%
      # Expand each row into one row per calendar day in its interval
      # (tibble() replaces the deprecated data_frame())
      do(tibble(user_id = .$user_id,
                Date = seq(.$start_date, .$end_date, by = 1))) %>%
      # Drop days that are covered by more than one product
      distinct() %>%
      ungroup() %>%
      # The remaining rows per user are that user's active days
      count(user_id)
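Expanding every interval into daily rows can get large when there are many customers with long intervals. As an alternative, here is a minimal sketch in base R that merges overlapping intervals instead of expanding them. It is not part of the original answer: the helper name active_days_one is hypothetical, and the sketch assumes that both the start and the end date count as active days, which matches what the day-by-day expansion counts.

```r
# Sample data from the question, rebuilt with base R only
data <- data.frame(
  user_id = c(rep(12, 8), rep(33, 5)),
  start_date = as.Date(c(
    "2010-10-31", "2010-12-18", "2011-10-31", "2011-12-18",
    "2014-03-27", "2014-12-18", "2015-03-27", "2016-12-18",
    "1992-07-01", "1993-08-20", "1999-10-28", "2006-01-31",
    "2016-08-26")),
  end_date = as.Date(c(
    "2011-10-31", "2011-12-18", "2014-04-28", "2014-12-18",
    "2015-03-27", "2016-12-18", "2016-03-27", "2017-12-18",
    "2016-07-01", "2016-08-16", "2012-11-15", "2006-02-28",
    "2017-01-26"))
)

# Hypothetical helper: number of days covered by at least one interval,
# for the intervals of a single user (endpoints assumed inclusive)
active_days_one <- function(start, end) {
  ord <- order(start)
  s <- as.numeric(start[ord])
  e <- as.numeric(end[ord])
  # Latest end date among all earlier intervals (-Inf for the first one)
  prev_end <- c(-Inf, head(cummax(e), -1))
  # A new block begins when an interval starts after all earlier ones ended
  block <- cumsum(s > prev_end)
  block_start <- tapply(s, block, min)
  block_end <- tapply(e, block, max)
  # Sum the inclusive length of every merged block
  sum(block_end - block_start + 1)
}

# Apply the helper to each user's rows
pieces <- split(data, data$user_id)
active_days <- sapply(pieces, function(d) {
  active_days_one(d$start_date, d$end_date)
})
active_days
```

For the sample data this gives 2606 days for user 12 and 8967 days for user 33. User 33's total is made up of two merged blocks (1992-07-01 to 2016-08-16 and 2016-08-26 to 2017-01-26), so the nine-day gap between them is not counted. The same cummax and cumsum steps could be placed inside a grouped dplyr pipeline if a dplyr solution is preferred. The question also mentions months; how to convert days to months is a matter of convention (dividing by 365.25 / 12 is one rough option) and is not specified in the answer.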