This article covers how to find which interval row of a data frame each element of a vector belongs to.

Problem description

I have a vector of numeric elements, and a dataframe with two columns that define the start and end points of intervals. Each row in the dataframe is one interval. I want to find out which interval each element in the vector belongs to.

Here's some example data:

    # Find which interval each element of the vector belongs in
    library(tidyverse)
    elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
    # tribble() is the current name; the original post used its older alias frame_data()
    intervals <- tribble(~phase, ~start, ~end,
                         "a",    0,      0.5,
                         "b",    1,      1.9,
                         "c",    2,      2.5)

The same example data for those who object to the tidyverse:

    elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
    intervals <- structure(list(phase = c("a", "b", "c"),
                                start = c(0, 1, 2),
                                end = c(0.5, 1.9, 2.5)),
                           .Names = c("phase", "start", "end"),
                           row.names = c(NA, -3L),
                           class = "data.frame")

Here's one way to do it:

    library(intrval)
    # For each element, test membership in every closed interval [start, end],
    # then look up the phase of the matching rows
    phases_for_elements <-
      map(elements, ~.x %[]% data.frame(intervals[, c('start', 'end')])) %>%
      map(., ~unlist(intervals[.x, 'phase']))

Here's the output:

    [[1]]
    phase
      "a"

    [[2]]
    phase
      "a"

    [[3]]
    phase
      "a"

    [[4]]
    character(0)

    [[5]]
    phase
      "b"

    [[6]]
    phase
      "b"

    [[7]]
    phase
      "c"

But I'm looking for a simpler method with less typing. I've seen findInterval in related questions, but I'm not sure how I can use it in this situation.

Solution

Here's a possible solution using the new "non-equi" joins in data.table (v>=1.9.8). While I doubt you'll like the syntax, it should be a very efficient solution.

Also, regarding findInterval: that function assumes your intervals are contiguous, which isn't the case here (there are gaps between 0.5 and 1, and between 1.9 and 2), so I doubt there is a straightforward solution using it.

    library(data.table) # v1.10.0
    setDT(intervals)[data.table(elements), on = .(start <= elements, end >= elements)]
    #    phase start end
    # 1:     a   0.1 0.1
    # 2:     a   0.2 0.2
    # 3:     a   0.5 0.5
    # 4:    NA   0.9 0.9
    # 5:     b   1.1 1.1
    # 6:     b   1.9 1.9
    # 7:     c   2.1 2.1

Regarding the above code, I find it pretty self-explanatory: it joins intervals and elements by the condition specified in the on argument. That's pretty much it. Note that the start and end columns in the result hold the value of each element rather than the original interval bounds, and that an element falling in no interval (0.9 here) gets NA as its phase.

There is one caveat, though: start, end and elements should all be of the same type, so if one of them is integer, it should be converted to numeric first.
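On the findInterval point, here is a minimal base R sketch of one possible workaround, not part of the original answer. It assumes the intervals are sorted by start and do not overlap (both true for the example data, but an assumption in general): findInterval picks the candidate row by start, and a second check against end rejects elements that fall into a gap. The names idx and phase_for_elements are made up for illustration.

```r
# Example data from the question, built with base R only
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(
  phase = c("a", "b", "c"),
  start = c(0, 1, 2),
  end   = c(0.5, 1.9, 2.5),
  stringsAsFactors = FALSE
)

# Assumption: intervals are sorted by start and do not overlap.
# findInterval() gives, for each element, the index of the last start
# that is <= the element (0 if the element is below every start).
idx <- findInterval(elements, intervals$start)

# Index 0 means there is no candidate interval at all
idx[idx == 0] <- NA

# Reject candidates whose end lies below the element (the element sits in a gap)
idx[!is.na(idx) & elements > intervals$end[idx]] <- NA

# Look up the phase; elements outside every interval get NA
phase_for_elements <- intervals$phase[idx]
print(phase_for_elements)
# "a" "a" "a" NA  "b" "b" "c"
```

This treats the intervals as closed on both sides, matching the start <= elements, end >= elements condition of the join above. With overlapping intervals it would report at most one match per element, so the non-equi join remains the more general option.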
08-20 10:52