Problem description

I have a data set that contains multiple rows of date information (intervals) for the same names. These rows should be compared and eventually merged into one row. I'd like to achieve the following:

- If the intervals overlap, keep one row with the earliest and the latest date of the four values.
- If the intervals do not overlap, but the time between them is less than or equal to 60 days, do the same as above: keep one row with the earliest and the latest date of the four values.
- If the intervals do not overlap and the time between them is more than 60 days, do nothing (keep both rows).

Data:

```r
names <- c("John", "John", "Rick", "Rick", "Katie", "Katie", "Harry", "Harry")
date1 <- c("1-3-2016", "18-5-2016", "13-1-2018", "4-2-2020", "5-1-2019", "29-1-2020", "27-8-2018", "4-2-2020")
date2 <- c("16-4-2020", "13-2-2020", "2-3-2020", "16-2-2020", "25-2-2020", "10-4-2020", "27-6-2019", "8-4-2020")
df1 <- data.frame(names, date1, date2)
```

Desired result:

```r
names <- c("John", "Rick", "Katie", "Harry", "Harry")
date1 <- c("1-3-2016", "13-1-2018", "5-1-2019", "27-8-2018", "4-2-2020")
date2 <- c("16-4-2020", "2-3-2020", "10-4-2020", "27-6-2019", "8-4-2020")
df2 <- data.frame(names, date1, date2)
```

Converting the strings to dates:

```r
df1$date1 <- as.Date(df1$date1, "%d-%m-%Y")
df1$date2 <- as.Date(df1$date2, "%d-%m-%Y")
```

Recommended answer

Here's one way (probably not the most concise) using dplyr. First we convert the dates to Date format, then for each name:

1. Figure out whether the second interval starts more than 60 days after the first one ends. If so, we tag both rows as keep_both. We sorted the dates, so we know the second row comes later.
2. For rows that aren't marked keep_both, take the earliest start date and the latest end date. Note that I'm assuming the intervals are in the right order, i.e. date2 is later than date1 in each row.
3. Filter the data to keep just the first row for each name, unless we are keeping both.
```r
names <- c("John", "John", "Rick", "Rick", "Katie", "Katie", "Harry", "Harry")
date1 <- c("1-3-2016", "18-5-2016", "13-1-2018", "4-2-2020", "5-1-2019", "29-1-2020", "27-8-2018", "4-2-2020")
date2 <- c("16-4-2020", "13-2-2020", "2-3-2020", "16-2-2020", "25-2-2020", "10-4-2020", "27-6-2019", "8-4-2020")
df1 <- data.frame(names, date1, date2)

library(tidyverse)

df1 %>%
  # Parse the "d-m-Y" strings into Date objects
  mutate(across(c(date1, date2), lubridate::dmy)) %>%
  # Within each name, put the earlier interval first
  arrange(names, date1, date2) %>%
  group_by(names) %>%
  mutate(
    # TRUE for every row of a name if some interval starts more than 60 days
    # after the previous row ends (lag() is NA on the first row, hence na.rm)
    keep_both = any((date1 - lag(date2)) > 60, na.rm = TRUE),
    # Otherwise collapse to the earliest start and the latest end
    new_date1 = if_else(keep_both, date1, min(date1)),
    new_date2 = if_else(keep_both, date2, max(date2))
  ) %>%
  # Keep one row per name unless both intervals are kept
  filter(keep_both | row_number() == 1) %>%
  select(names, date1 = new_date1, date2 = new_date2)
#> # A tibble: 5 x 3
#> # Groups:   names [4]
#>   names date1      date2
#>   <chr> <date>     <date>
#> 1 Harry 2018-08-27 2019-06-27
#> 2 Harry 2020-02-04 2020-04-08
#> 3 John  2016-03-01 2020-04-16
#> 4 Katie 2019-01-05 2020-04-10
#> 5 Rick  2018-01-13 2020-03-02
```
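The keep_both flag above is computed per name with any(), so it is written for the two-rows-per-name case in the question; with three or more rows per name, one large gap would keep every row of that name. A common alternative is a sort-and-sweep interval merge: sort each name's intervals by start date and extend the current interval whenever the next one starts no more than 60 days after the current end. The sketch below does this in base R. The function `merge_intervals` and its `max_gap` argument are hypothetical names, not part of the original answer, and it assumes `date1` and `date2` are already Date columns with `date2` on or after `date1` in every row.

```r
# Hypothetical helper (not from the original answer): merge each name's
# intervals when they overlap or the gap between them is <= max_gap days.
# Assumes date1/date2 are Date columns and date2 >= date1 in every row.
merge_intervals <- function(df, max_gap = 60) {
  # Sort by name, then by start date, so intervals are swept in order
  df <- df[order(df$names, df$date1), ]
  out <- list()
  for (nm in unique(df$names)) {
    g <- df[df$names == nm, ]
    start <- g$date1[1]
    end <- g$date2[1]
    if (nrow(g) > 1) {
      for (i in 2:nrow(g)) {
        # Days between the end of the current merged interval and the next
        # start; negative when the intervals overlap
        gap <- as.numeric(g$date1[i] - end, units = "days")
        if (gap <= max_gap) {
          # Overlapping or close enough: extend the current interval
          end <- max(end, g$date2[i])
        } else {
          # Gap too large: emit the current interval and start a new one
          out[[length(out) + 1]] <- data.frame(names = nm, date1 = start,
                                                date2 = end, stringsAsFactors = FALSE)
          start <- g$date1[i]
          end <- g$date2[i]
        }
      }
    }
    # Emit the last open interval for this name
    out[[length(out) + 1]] <- data.frame(names = nm, date1 = start,
                                          date2 = end, stringsAsFactors = FALSE)
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Data from the question
names <- c("John", "John", "Rick", "Rick", "Katie", "Katie", "Harry", "Harry")
date1 <- c("1-3-2016", "18-5-2016", "13-1-2018", "4-2-2020", "5-1-2019", "29-1-2020", "27-8-2018", "4-2-2020")
date2 <- c("16-4-2020", "13-2-2020", "2-3-2020", "16-2-2020", "25-2-2020", "10-4-2020", "27-6-2019", "8-4-2020")
df1 <- data.frame(names, date1, date2, stringsAsFactors = FALSE)
df1$date1 <- as.Date(df1$date1, "%d-%m-%Y")
df1$date2 <- as.Date(df1$date2, "%d-%m-%Y")

result <- merge_intervals(df1)
print(result)
```

On df1 this gives the same five rows as the dplyr output above. Because each row is compared with a running end date rather than flagging the whole name at once, a chain of three or more close intervals collapses into one row while a gap of more than 60 days still splits it.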
08-18 19:04