Find start and end positions/indices of runs/consecutive values


Problem description

Problem: Given an atomic vector, find the start and end indices of runs in the vector.

Example vector with runs:

x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6

Output from rle():

rle(x)
# Run Length Encoding
#  lengths: int [1:5] 5 4 3 2 1
#  values : int [1:5] 10 9 8 7 6

Desired output:

#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

The base rle class doesn't appear to provide this functionality, but the class Rle and function rle2 do. However, given how minor the functionality is, sticking to base R seems more sensible than installing and loading additional packages.

There are code snippets elsewhere (including on SO) which solve the slightly different problem of finding start and end indices for runs which satisfy some condition. I wanted something that would be more general, could be performed in one line, and didn't involve the assignment of temporary variables or values.

Answering my own question because I was frustrated by the lack of search results. I hope this helps somebody!

Recommended answer

Core logic:

# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)

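# Compute endpoints of each run
end = cumsum(rle_x$lengths)
# Each run starts one past the previous run's end. head(end, -1) drops the
# last end; base R's stats::lag() only shifts the time base, not the values.
start = c(1, head(end, -1) + 1)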

# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
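
If the same computation is needed in several places, it can be wrapped in a small helper. The following is only a sketch in base R: the name run_bounds is hypothetical (it does not come from any package), and it derives start as end - lengths + 1, which should give the same result as shifting the end vector.

```r
# Hypothetical helper (not part of base R): start/end indices of each run in x
run_bounds <- function(x) {
  r <- rle(x)
  end <- cumsum(r$lengths)
  # A run of length n that ends at index e starts at e - n + 1
  data.frame(start = end - r$lengths + 1L, end = end, value = r$values)
}

x <- rev(rep(6:10, 1:5))
run_bounds(x)
#   start end value
# 1     1   5    10
# 2     6   9     9
# 3    10  12     8
# 4    13  14     7
# 5    15  15     6
```

Keeping the run value next to its endpoints is a design choice made here for convenience; drop that column if only the indices are needed.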

Tidyverse/dplyr way (data frame-centric):

library(dplyr)

rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display

Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
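
A minimal base-R sketch of that subsetting step, assuming an arbitrary example condition (run value of at least 8) chosen purely for illustration:

```r
x <- rev(rep(6:10, 1:5))
rle_x <- rle(x)

# Endpoints of every run
end <- cumsum(rle_x$lengths)
start <- end - rle_x$lengths + 1L

# One logical per run; the threshold 8 is an arbitrary example condition
keep <- rle_x$values >= 8

# Keep only the endpoints of runs that satisfy the condition
data.frame(start, end)[keep, ]
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
```

Any condition that yields one logical value per run can be used in place of keep, for example a test on rle_x$lengths to select runs of a minimum length.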
