Problem description
I have a data.frame with start and end times:
ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))
> ranges
     start      stop
1 65.72000  79.72187
2 65.72187  79.72375
3 65.94312  79.94312
4 73.75625  87.75625
5 89.61625 104.94062
In this example, the ranges in rows 2 and 3 lie entirely within the range from 'start' on row 1 to 'stop' on row 4. The overlapping ranges in rows 1-4 should therefore be collapsed into a single range:
> ranges
     start      stop
1 65.72000  87.75625
5 89.61625 104.94062
I tried this:
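# mdat[i, j] is TRUE when range j stops after range i starts
mdat <- outer(ranges$start, ranges$stop, function(x, y) y > x)
# blank out the diagonal and upper triangle so each range is only compared with earlier rows
mdat[upper.tri(mdat) | col(mdat) == row(mdat)] <- NA
mdat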
Now I just need to figure out how to combine all the TRUE entries, but I'm not sure this is the best approach.
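One way to finish the matrix idea from the question is to treat each range as a node, connect two nodes when their ranges overlap, and collapse each connected component into one range. This is a sketch, not the method from the answer below: `collapse_by_components` is a hypothetical helper name, it assumes every row has start <= stop, it counts touching ranges (one stop equal to another start) as overlapping, and it builds a full n × n matrix, so it only suits small inputs.

```r
# Hypothetical helper (not from the answer): collapse overlapping ranges by
# grouping them into connected components of the pairwise overlap graph.
# Assumes start <= stop on every row.
collapse_by_components <- function(start, stop) {
  n <- length(start)
  # overlap[i, j] is TRUE when range i starts no later than range j stops
  # AND range j starts no later than range i stops (touching counts as overlap)
  starts_before_stop <- outer(start, stop, "<=")
  overlap <- starts_before_stop & t(starts_before_stop)

  # Label propagation: every range repeatedly takes the smallest label among the
  # ranges it overlaps (itself included) until nothing changes. Each connected
  # component ends up labelled with its smallest row index.
  labels <- seq_len(n)
  repeat {
    new_labels <- apply(overlap, 1, function(row) min(labels[row]))
    if (identical(new_labels, labels)) break
    labels <- new_labels
  }

  # One output row per component: earliest start, latest stop
  data.frame(
    start = as.vector(tapply(start, labels, min)),
    stop  = as.vector(tapply(stop, labels, max))
  )
}

# Data from the question
ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))
collapsed <- collapse_by_components(ranges$start, ranges$stop)
print(collapsed)
```

On the question's data, rows 1-4 end up in one component and row 5 stands alone, which gives the two ranges of the desired output. Because it compares every pair of rows, this grows quadratically; a sort-and-scan approach avoids the n × n matrix.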
Recommended answer
You can try this:
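library(dplyr)

ranges %>%
  arrange(start) %>%  # sort the ranges by start time
  # a new group begins when a range starts after the largest stop of all earlier rows
  group_by(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>%
  summarise(start = first(start), stop = max(stop))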
# A tibble: 2 × 3
#       g    start      stop
#   <int>    <dbl>     <dbl>
# 1     0 65.72000  87.75625
# 2     1 89.61625 104.94062
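For readers without dplyr, the same grouping rule can be written in base R. The sketch below mirrors each step of the pipeline (`arrange`, `lag`, `cummax`, `cumsum`, `summarise`) with base functions and keeps the intermediate vectors so the grouping can be inspected; the variable names are illustrative and not part of the answer.

```r
# Data from the question
ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))

r <- ranges[order(ranges$start), ]  # same as arrange(start)

# lag(stop, default = first(stop)): previous row's stop; row 1 uses its own stop
prev_stop <- c(r$stop[1], head(r$stop, -1))
# cummax(): largest stop among all earlier rows (row 1 compares with itself)
max_prev_stop <- cummax(prev_stop)
# TRUE marks a row that starts after every earlier range has ended, i.e. a gap
new_group <- max_prev_stop < r$start
# cumsum() turns the gap flags into group ids
g <- cumsum(new_group)
print(g)  # 0 0 0 0 1

# summarise(): rows are sorted, so the minimum start is also the first start
collapsed <- data.frame(
  g     = sort(unique(g)),
  start = as.vector(tapply(r$start, g, min)),
  stop  = as.vector(tapply(r$stop, g, max))
)
print(collapsed)
```

The key step is `cummax`: comparing each start only with the immediately preceding stop would miss cases where an earlier, longer range still covers the current one, whereas the running maximum keeps track of how far the current group extends.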