Problem description
I have a set of animal locations with different sampling intervals. What I want to do is group sequences where the sampling interval matches a certain criterion (e.g. is below a certain value). Let me illustrate with some dummy data:
start <- Sys.time()
timediff <- c(rep(5,3),20,rep(5,2))
timediff <- cumsum(timediff)
# Set up a dataframe with a couple of time values
df <- data.frame(TimeDate = start + timediff)
# Calculate the time differences between the rows
df$TimeDiff <- c(as.integer(tail(df$TimeDate,-1) - head(df$TimeDate,-1)),NA)
# Define a criteria in order to form groups
df$TimeDiffSmall <- df$TimeDiff <= 5
             TimeDate TimeDiff TimeDiffSmall
1 2016-03-15 23:11:49        5          TRUE
2 2016-03-15 23:11:54        5          TRUE
3 2016-03-15 23:11:59       20         FALSE
4 2016-03-15 23:12:19        5          TRUE
5 2016-03-15 23:12:24        5          TRUE
6 2016-03-15 23:12:29       NA            NA
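The `TimeDiff` column above relies on R choosing the unit of the `difftime` result automatically (seconds here, because the gaps are short). A minimal sketch that pins the unit to seconds, assuming the same gaps as the dummy data and a fixed `start` (UTC) so the output is reproducible:

```r
# Fixed start time (assumption: any POSIXct value works; a fixed one keeps output reproducible)
start <- as.POSIXct("2016-03-15 23:11:44", tz = "UTC")
timediff <- cumsum(c(rep(5, 3), 20, rep(5, 2)))
df <- data.frame(TimeDate = start + timediff)

# diff() on POSIXct returns a difftime; as.numeric(..., units = "secs")
# converts it to seconds regardless of which unit R picked automatically
gaps <- as.numeric(diff(df$TimeDate), units = "secs")

# Gap to the *next* row, NA for the last row (same layout as TimeDiff above)
df$TimeDiff <- c(gaps, NA)
df$TimeDiffSmall <- df$TimeDiff <= 5
df
```

Without the explicit `units`, a series whose gaps are all a minute or longer would be measured in minutes, and `<= 5` would silently mean five minutes.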
In this dummy data, rows 1:3 belong to one group, since the time difference between them is <= 5 seconds. Rows 4:6 belong to a second group, but hypothetically there could be a number of rows between the two groups that don't belong to any group (TimeDiffSmall equal to FALSE).
Combining information from multiple SO answers (e.g. part 1), I've created a function that solves this problem:
number.groups <- function(input){
  # part 1: numbering successive TRUE values (NA is treated as FALSE)
  input[is.na(input)] <- FALSE
  x.gr <- ifelse(x <- input == TRUE, cumsum(c(head(x, 1), tail(x, -1) - head(x, -1) == 1)), NA)
  # part 2: including last value into group
  # (the row right after each run of TRUE also belongs to that run's group)
  items <- which(!is.na(x.gr))
  items.plus <- c(1, items + 1)
  sel <- !(items.plus %in% items)
  sel.idx <- items.plus[sel]
  x.gr[sel.idx] <- x.gr[sel.idx - 1]
  return(x.gr)
}
# Apply the function to create groups
df$Group <- number.groups(df$TimeDiffSmall)
             TimeDate TimeDiff TimeDiffSmall Group
1 2016-03-15 23:11:49        5          TRUE     1
2 2016-03-15 23:11:54        5          TRUE     1
3 2016-03-15 23:11:59       20         FALSE     1
4 2016-03-15 23:12:19        5          TRUE     2
5 2016-03-15 23:12:24        5          TRUE     2
6 2016-03-15 23:12:29       NA            NA     2
This function actually solves my problem. That said, it seems like a convoluted, beginner's way to go about it. Is there a function that could solve this more idiomatically?
Recommended answer
Like @thelatemail, I'd use the following to get the group IDs. It works because cumsum() increments the group count each time it reaches an element preceded by a time interval greater than 5 seconds.
df$Group <- cumsum(c(TRUE, diff(df$TimeDate) > 5))
df$Group
# [1] 1 1 1 2 2 2
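Two caveats about this one-liner, sketched in base R. First, `diff()` on `POSIXct` values returns a `difftime` whose unit R chooses automatically, so if every gap were a minute or longer, `> 5` would compare minutes rather than seconds; converting with `as.numeric(..., units = "secs")` avoids that. Second, every row gets a group ID, so a row with no close neighbour (the in-between rows mentioned in the question) ends up in a one-row group of its own. The helper name `group_by_gap`, its `max_gap` parameter, and the choice to mark one-row groups as `NA` are hypothetical, for illustration only:

```r
# Hypothetical helper: number runs of rows whose gap to the previous row
# is at most `max_gap` seconds; rows that form a one-row run get NA.
group_by_gap <- function(times, max_gap = 5) {
  gaps <- as.numeric(diff(times), units = "secs")
  # A new group starts at row 1 and at every row preceded by a larger gap
  grp <- cumsum(c(TRUE, gaps > max_gap))
  # Rows whose group has only one member do not belong to any group
  sizes <- tabulate(grp)
  grp[sizes[grp] == 1] <- NA
  # Renumber the remaining groups consecutively (1, 2, ...)
  as.integer(factor(grp))
}

start <- as.POSIXct("2016-03-15 23:11:44", tz = "UTC")
# Same gaps as the dummy data, plus one isolated row in the middle
times <- start + cumsum(c(5, 5, 5, 20, 30, 5, 5))
group_by_gap(times)
# [1]  1  1  1 NA  2  2  2
```

On the original dummy data (no isolated row) it returns the same IDs as the one-liner, `1 1 1 2 2 2`.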