


I found a solution to the problem below. However, while it works on a small dataset, it still produces false output on large datasets. Does anyone know why? I can't find the mistake. Here's the code:

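# For each ID group x and each row y: TRUE if start[y] is within 1 day of
# (or before) any end date in rows 1..(y - 1) of that group
df$continuous <-
  unlist(lapply(split(df, df$ID),
                function(x) {
                  sapply(seq_len(nrow(x)),
                         function(y) {
                           any(x$start[y] - x$end[-(y:NROW(x$end))] <= 1)
                         })
                }))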
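One likely cause (an assumption, since the large dataset isn't shown): split() returns the groups ordered by the levels of df$ID, not in the order the rows appear, so unlist(lapply(split(...), ...)) yields results in group order. With a single ID, as in the small example, the order happens to match; in a large dataset whose rows aren't sorted by ID (or whose character IDs sort differently, e.g. "10" before "2"), the values land on the wrong rows. The sketch below keeps the question's per-row rule but swaps unlist() for base R's unsplit(), which writes each group's results back to that group's original row positions. The data frame df2 and the helper flag_group are hypothetical names for illustration.

```r
# Hypothetical data: two IDs whose rows are interleaved, as they might be
# in a large, unsorted dataset.
df2 <- data.frame(
  ID    = c("2", "10", "2", "10"),
  start = as.Date(c("2010-01-01", "2010-03-01", "2010-01-02", "2010-05-01")),
  end   = as.Date(c("2010-01-05", "2010-03-10", "2010-01-03", "2010-05-02"))
)

# Hypothetical helper wrapping the question's per-group rule: TRUE if a row
# starts within 1 day of (or before) any earlier row's end in the same group.
flag_group <- function(x) {
  sapply(seq_len(nrow(x)), function(y) {
    any(x$start[y] - x$end[-(y:NROW(x$end))] <= 1)
  })
}

# unlist() returns results in split() order (groups "10" then "2"),
# so they no longer line up with the rows of df2.
wrong <- unlist(lapply(split(df2, df2$ID), flag_group), use.names = FALSE)

# unsplit() puts each group's results back at that group's original rows.
df2$continuous <- unsplit(lapply(split(df2, df2$ID), flag_group), df2$ID)
df2
```

Here wrong is FALSE FALSE FALSE TRUE, while the row-aligned result is FALSE FALSE TRUE FALSE. The rule itself also assumes that rows within each ID are in chronological order, so sorting by ID and start first is worth checking on the large dataset.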


ORIGINAL PROBLEM: I'm working on a function to identify gaps in a series of start/end dates. The output should be FALSE if a start date begins more than 1 day after every previous end date.
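Written as a formula (a restatement, assuming the rows within an ID are indexed i = 1, ..., n in chronological order), row i is continuous when its start lies at most one day after some earlier end. A start is close enough to some earlier end exactly when it is close enough to the latest earlier end, so the check against all previous rows reduces to a running maximum:

```latex
\text{continuous}_i
  = \exists\, j < i :\ \text{start}_i - \text{end}_j \le 1
  \iff \text{start}_i - \max_{j < i} \text{end}_j \le 1,
  \qquad \max_{j < 1} \text{end}_j := -\infty \ (\text{so row } 1 \text{ is FALSE}).
```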


df <- data.frame('ID' = c('1','1','1','1','1','1'), 'start' = as.Date(c('2010-01-01', '2010-01-03', '2010-01-05', '2010-01-09','2010-02-01', '2010-02-10')),
                 'end' = as.Date(c('2010-01-03', '2010-01-22', '2010-01-07', '2010-01-12', '2010-02-10', '2010-02-12')))


This is my attempt to solve this with x = start and y = end:

my_fun <- function(x, y, i) {
  # TRUE if x[i] (a start) is within 1 day of any of y[1..(i - 1)] (earlier ends)
  any(x[i] - y[seq_len(i - 1)] <= 1)
}


It works if I specify i, but I can't manage to wrap it in a loop. Ultimately, this function should be applied to groups in a large dataset, dplyr-style.
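A minimal sketch of the loop, assuming the rows of each group are already in chronological order: the hypothetical helper flag_rows() applies the same per-row test as my_fun to every index i with vapply(), returning one logical per row.

```r
# Hypothetical helper (not from the question): for each row i, TRUE if start[i]
# is within 1 day of (or before) any end date in rows 1..(i - 1).
# Row 1 has no earlier rows, and any() of an empty vector is FALSE.
flag_rows <- function(start, end) {
  s <- as.numeric(start)  # Date -> days since 1970-01-01
  e <- as.numeric(end)
  vapply(seq_along(s), function(i) any(s[i] - e[seq_len(i - 1)] <= 1), logical(1))
}

# Example data from the question (a single ID).
df <- data.frame(
  ID    = c("1", "1", "1", "1", "1", "1"),
  start = as.Date(c("2010-01-01", "2010-01-03", "2010-01-05",
                    "2010-01-09", "2010-02-01", "2010-02-10")),
  end   = as.Date(c("2010-01-03", "2010-01-22", "2010-01-07",
                    "2010-01-12", "2010-02-10", "2010-02-12"))
)

df$continuous <- flag_rows(df$start, df$end)
df
```

With several IDs, the helper can be applied per group in the dplyr style the question asks for, e.g. df %>% group_by(ID) %>% mutate(continuous = flag_rows(start, end)), since mutate() on grouped data evaluates the expression separately for each group and keeps the original row order. It still compares each row with every earlier row, so its cost grows quadratically with group size.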


  ID      start        end continuous
1  1 2010-01-01 2010-01-03      FALSE  # or TRUE
2  1 2010-01-03 2010-01-22       TRUE
3  1 2010-01-05 2010-01-07       TRUE
4  1 2010-01-09 2010-01-12       TRUE
5  1 2010-02-01 2010-02-10      FALSE
6  1 2010-02-10 2010-02-12       TRUE  # according to my function, or FALSE compared to start[1] (would be even better)




You can do this using dplyr and lubridate. dplyr has really useful window functions like lag() that are handy for this type of analysis.


library(dplyr)
# Compare each start date with the previous row's end date
df %>%
  mutate(start - lag(end, 1) == 0)

# ID      start        end start - lag(end, 1) == 0
# 1  1 2010-01-01 2010-01-03                       NA
# 2  1 2010-01-03 2010-01-22                     TRUE
# 3  1 2010-01-05 2010-01-07                    FALSE
# 4  1 2010-01-09 2010-01-12                    FALSE
# 5  1 2010-02-01 2010-02-10                    FALSE
# 6  1 2010-02-10 2010-02-12                     TRUE


How do you want to handle the first row of your data? Since there is no previous value, lag() returns NA there. This is generally how you should handle situations like this, but I can edit my answer if you'd like it to have a different value.
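The lag() idea can be extended to the question's full rule (within 1 day of any earlier end, not just the previous row's end) by lagging the running maximum of end instead of end itself, because a start is close enough to some earlier end exactly when it is close enough to the latest one. Below is a minimal base-R sketch, assuming rows are sorted by start within each ID. The helper name continuous_by_running_max is hypothetical, the dates are converted to numeric day counts so that cummax() works on plain numbers, and the first row is set to FALSE here (one possible choice for the NA question above).

```r
# Hypothetical helper (not from the answer): lag the running maximum of the
# end dates, so each start is compared with the latest end among all earlier
# rows rather than only the previous row's end.
continuous_by_running_max <- function(start, end) {
  s <- as.numeric(start)  # Date -> days since 1970-01-01
  e <- as.numeric(end)
  # Shift cummax(e) down one row; -Inf for the first row means "no earlier
  # end date", which makes the comparison FALSE instead of NA.
  prev_max_end <- c(-Inf, cummax(e))[seq_along(e)]
  s - prev_max_end <= 1
}

starts <- as.Date(c("2010-01-01", "2010-01-03", "2010-01-05",
                    "2010-01-09", "2010-02-01", "2010-02-10"))
ends   <- as.Date(c("2010-01-03", "2010-01-22", "2010-01-07",
                    "2010-01-12", "2010-02-10", "2010-02-12"))

running <- continuous_by_running_max(starts, ends)

# For contrast: the same "<= 1" test against the previous row's end only.
prev_only <- c(FALSE, as.numeric(starts[-1]) - as.numeric(ends[-length(ends)]) <= 1)

running    # FALSE TRUE TRUE TRUE FALSE TRUE
prev_only  # FALSE TRUE TRUE FALSE FALSE TRUE
```

Row 4 is where the two differ: 2010-01-09 is two days after the previous row's end (2010-01-07) but still inside the 2010-01-03 to 2010-01-22 range of row 2. In dplyr terms the same idea is as.numeric(start) - lag(cummax(as.numeric(end))) <= 1 inside group_by(ID) %>% mutate(...), which leaves the first row of each group as NA.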


07-22 23:53