Problem description
Problem: Given an atomic vector, find the start and end indices of runs in the vector.
Example vector with runs:
x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10 9 9 9 9 8 8 8 7 7 6
Output from rle():
rle(x)
# Run Length Encoding
# lengths: int [1:5] 5 4 3 2 1
# values : int [1:5] 10 9 8 7 6
Desired output:
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
The base rle class doesn't appear to provide this functionality, but the class Rle and function rle2 do. However, given how minor the functionality is, sticking to base R seems more sensible than installing and loading additional packages.
There are examples of code snippets (here, here and on SO) which solve the slightly different problem of finding start and end indices for runs which satisfy some condition. I wanted something that would be more general, could be performed in one line, and didn't involve the assignment of temporary variables or values.
Answering my own question because I was frustrated by the lack of search results. I hope this helps somebody!
Recommended answer
Core logic:
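# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
# Compute the end index of each run
end = cumsum(rle_x$lengths)
# Each run starts one past the previous run's end; head(end, -1) drops the last end.
# Base R's stats::lag() does not shift a plain vector, so lag(end) only works here
# when dplyr is attached and dplyr::lag() masks it.
start = c(1, head(end, -1) + 1)
# Display results
data.frame(start, end)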
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
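For repeated use, the same idea can be wrapped in a small function. This is only a sketch: the name run_bounds is hypothetical (not part of base R or any package), and it computes start as end - lengths + 1, which gives the same indices as shifting the end vector and also returns a zero-row data frame for an empty input.

```r
# Hypothetical helper (not from base R): start/end indices of every run in x
run_bounds <- function(x) {
  r <- rle(x)
  end <- cumsum(r$lengths)
  # A run of length n ending at index e starts at e - n + 1
  start <- end - r$lengths + 1L
  data.frame(start = start, end = end, value = r$values)
}

x <- rev(rep(6:10, 1:5))
run_bounds(x)
```

Keeping the run values as a third column makes it easy to filter the result later without calling rle() again.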
Tidyverse/dplyr way (data frame-centric):
library(dplyr)
rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display
Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
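As an illustration of that subsetting step, here is a minimal base R sketch. The condition (run value at least 8) and the names keep and res are arbitrary choices made for this example.

```r
x <- rev(rep(6:10, 1:5))
r <- rle(x)
end <- cumsum(r$lengths)
start <- end - r$lengths + 1L

# Logical vector with one entry per run: TRUE where the run's value is >= 8
keep <- r$values >= 8

# Subset start, end and values with the same per-run condition
res <- data.frame(start = start[keep], end = end[keep], value = r$values[keep])
res
#   start end value
# 1     1   5    10
# 2     6   9     9
# 3    10  12     8
```

Any condition that yields one logical value per run works the same way, for example r$lengths >= 3 to keep only runs of at least three elements.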