This article covers a conditional seq_along on grouped data: numbering episodes of observations within each group.

Problem description

I'm trying to generate 'episodes' of observations, grouping together observations that occur no more than 14 days apart. With dplyr I've managed to calculate the number of days since the last observation. However, I cannot figure out how to assign a new episode id based on the condition <= 14 without a for loop.

Sample data:

# obsvn is the number of days since the first observation in the group

dat <- data.frame(id = c(rep("A",5), rep("B", 2)), 
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))
  id obsvn
1  A     1
2  A     2
3  A    29
4  A    30
5  A    45
6  B     1
7  B    15

Expected output:

  id obsvn ith
1  A     1    1
2  A     2    1
3  A    29    2
4  A    30    2
5  A    45    3
6  B     1    1
7  B    15    2

I've tried using lag():

library(dplyr)
dat <- dat %>% 
  group_by(id) %>% 
  mutate(ith = 1,
         ith = ifelse(obsvn - lag(obsvn) <= 14, lag(ith), lag(ith)+1))
dat
Source: local data frame [7 x 3]
Groups: id

  id obsvn ith
1  A     1  NA
2  A     2   1
3  A    29   2
4  A    30   1
5  A    45   2
6  B     1  NA
7  B    15   1

Which isn't what I want. I don't understand why ith in row 4 is 1 rather than 2.

Solution

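Because lag(ith) looks at the ith column you just set to 1, not at the values being computed in the same ifelse() call, so it is always 1 (or NA in the first row of each group). Every other row therefore gets 1 when the gap is <= 14 days and 1 + 1 = 2 otherwise; row 4 is only 1 day after row 3, so it gets 1.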
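To see this concretely, the sketch below replays the failing mutate() for group A in base R, so no dplyr is needed. lag_vec() is a hypothetical helper that only approximates dplyr::lag() with its default settings (shift right by one, pad with NA); it is an assumption made for illustration, not dplyr's implementation.

```r
# Base-R replay of the failing mutate() for group A (no dplyr needed).
# lag_vec() is a hypothetical helper mimicking dplyr::lag() with its defaults:
# shift the vector right by one position and pad the front with NA.
lag_vec <- function(x) c(NA, head(x, -1))

obsvn <- c(1, 2, 29, 30, 45)
ith <- rep(1, length(obsvn))  # mutate(ith = 1, ...) sets every row to 1 first

# ifelse() evaluates its arguments once, over whole vectors, so lag_vec(ith)
# is NA 1 1 1 1: it never sees the values this same call is producing.
ith <- ifelse(obsvn - lag_vec(obsvn) <= 14, lag_vec(ith), lag_vec(ith) + 1)

print(ith)  # NA 1 2 1 2, the same as rows 1-5 of the question's output
```

Because ith is never updated row by row, there is no way for an episode number to carry forward; that is why a cumulative function is needed instead.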

I would do it using diff and cumsum:

library(dplyr)
dat %>% group_by(id) %>% mutate(ith = cumsum(c(1, diff(obsvn) >= 14)))  # new episode on gaps >= 14 days
Source: local data frame [7 x 3]
Groups: id

  id obsvn ith
1  A     1   1
2  A     2   1
3  A    29   2
4  A    30   2
5  A    45   3
6  B     1   1
7  B    15   2
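To make the one-liner easier to follow, the sketch below unpacks it step by step for group A and then applies the same rule per id with base R's ave() instead of dplyr. The ave() version is an alternative, not part of the original answer, and number_episodes() is a hypothetical helper name.

```r
# Base-R sketch of the diff()/cumsum() episode numbering (no dplyr needed).
dat <- data.frame(id = c(rep("A", 5), rep("B", 2)),
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))

# Step by step for group A
x <- dat$obsvn[dat$id == "A"]  # 1 2 29 30 45
gaps <- diff(x)                 # 1 27 1 15: days since the previous observation
starts <- gaps >= 14            # FALSE TRUE FALSE TRUE: these rows open a new episode
ith_A <- cumsum(c(1, starts))   # 1 1 2 2 3: the leading 1 is the first episode

# Same rule within every id, playing the role of group_by(id) %>% mutate(...)
# number_episodes() is a hypothetical helper; gap is the threshold in days.
number_episodes <- function(x, gap = 14) cumsum(c(1, diff(x) >= gap))
dat$ith <- ave(dat$obsvn, dat$id, FUN = number_episodes)
print(dat)

# Variant: with > 14, a gap of exactly 14 days stays in the same episode,
# so B's observations on days 1 and 15 would both get episode 1.
dat$ith_gt <- ave(dat$obsvn, dat$id, FUN = function(x) cumsum(c(1, diff(x) > 14)))
```

With >= 14 the result matches the expected output, including episode 2 for B's day-15 observation, which implies that a gap of exactly 14 days starts a new episode. If the "no more than 14 days apart" wording is the intended rule, > 14 is the comparison to use.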


10-25 03:13