Problem description
I have data like this.
library(dplyr)
library(tidyverse)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
"2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
"2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
"2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
"2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
myval = c(0, NA, 1500, 1500,
1500, 1500, NA, 0,
0, 0, 1100, 1100,
1100, 0, 200, 200,
1100, 1100, 1100, 0
))
I want to divide each value by the number of times it appears. But if a value (for example 1100) is interrupted by another number (or NA) and then reappears, I want each consecutive run to be counted separately.
# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA
df <- df %>%
group_by(myval) %>%
mutate(counts = sum(myval == myval)) %>%
mutate(result = (myval / counts))
The current result is:
mydate myval counts result
<date> <dbl> <int> <dbl>
1 2019-05-11 NA NA NA
2 2019-05-11 NA NA NA
3 2019-05-11 1500 4 375
4 2019-05-11 1500 4 375
5 2019-05-12 1500 4 375
6 2019-05-12 1500 4 375
7 2019-05-12 NA NA NA
8 2019-05-12 NA NA NA
9 2019-05-13 NA NA NA
10 2019-05-13 NA NA NA
11 2019-05-13 1100 6 183.
12 2019-05-13 1100 6 183.
13 2019-05-14 1100 6 183.
14 2019-05-14 NA NA NA
15 2019-05-14 200 2 100
16 2019-05-14 200 2 100
17 2019-05-15 1100 6 183.
18 2019-05-15 1100 6 183.
19 2019-05-15 1100 6 183.
20 2019-05-15 NA NA NA
But as you can see, the value 1100, which appears in two separate runs, is counted 6 times. I want it counted 3 times for the first run and then 3 times again for the second run.
So, for example, the value 1500 appears 4 times, so I divide 1500 / 4. The value 1100 should be divided by 3, and then again by 3.
Recommended answer
You can do that using run-length encoding (basically a running count that restarts whenever a different value appears, so each consecutive run of equal values gets its own length).
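To illustrate what run-length encoding returns, here is a minimal sketch with base R's `rle()` on a toy vector. The vector `x` is made up for illustration and is not the question's data.

```r
# Toy vector (hypothetical): 1100 occurs in two separate runs
x <- c(1500, 1500, 1100, 1100, 1100, 200, 1100, 1100)

r <- rle(x)

r$lengths  # length of each consecutive run: 2 3 1 2
r$values   # value of each run: 1500 1100 200 1100
```

The two runs of 1100 are reported separately, with lengths 3 and 2, which is the behaviour the question asks for.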
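library(magrittr)  # provides the %$% exposition pipe, which library(tidyverse) does not attach

# Applied to the original myval column (the output below still contains the 0 values)
rle(df$myval) %$%
tibble(rle = lengths,       # length of each consecutive run
myval = values,             # value of that run
avg = values / rle)         # value divided by its run length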
# A tibble: 10 x 3
# rle myval avg
# <int> <dbl> <dbl>
# 1 1 0 0
# 2 1 NA NA
# 3 4 1500 375
# 4 1 NA NA
# 5 3 0 0
# 6 3 1100 367.
# 7 1 0 0
# 8 2 200 100
# 9 3 1100 367.
# 10 1 0 0
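The output above has one row per run. If one result per original row is wanted, as in the question's table, the run lengths can be expanded back with `rep()`. This is a minimal sketch under assumptions: it uses a shortened, hypothetical copy of the `myval` column with the values in [0, 1] already set to NA, and the names `counts` and `result` are taken from the question.

```r
# Shortened, hypothetical version of the question's myval column
myval <- c(NA, NA, 1500, 1500, 1500, 1500, NA,
           1100, 1100, 1100, NA, 200, 200, 1100, 1100, 1100)

r <- rle(myval)

# Repeat each run's length once per element of that run,
# which gives one count per original row
counts <- rep(r$lengths, times = r$lengths)

# rle() treats every NA as its own run of length 1;
# the division still yields NA for those rows
result <- myval / counts

data.frame(myval, counts, result)
```

With the question's data frame, the same two lines could go inside a `mutate()` call without `group_by(myval)`, since the grouping is what merges the two runs of 1100.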