Problem description
Here is an example of my dataset. I want to calculate bin averages based on time (i.e., ts), one per 10-second interval. Could you please provide some hints so that I can carry on?
In my case, I want to average both time (ts) and Var over every 10 seconds. For example, I would get one averaged value of Var and ts for 0 to 10 seconds, another for 11 to 20 seconds, and so on.
df <- data.frame(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))
Are there any functions or libraries in R that I can use for this task?
Recommended answer
There are many ways to calculate a binned average: with base aggregate or by, with the packages dplyr or data.table, probably with zoo, and surely with other time-series packages.
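As a minimal sketch of the base R route, assuming the bins should be the right-closed intervals (0,10], (10,20], ... that the question describes, cut() can label each row and aggregate() can then average both ts and Var per bin. The column name bin, the result name binned, and the seed are illustrative choices, not part of the original question.

```r
# Reproducible version of the example data from the question
set.seed(42)  # assumption: any seed works, it only fixes the random Var values
df <- data.frame(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))

# Label each row with its 10-second bin: (0,10], (10,20], ..., (90,100]
df$bin <- cut(df$ts, breaks = seq(0, 100, by = 10))

# Average ts and Var within each bin
binned <- aggregate(cbind(ts, Var) ~ bin, data = df, FUN = mean)
print(binned)
```

This yields one row per bin with the mean of ts and the mean of Var. With dplyr, the same idea looks like this: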
library(dplyr)

# round() assigns each ts to the nearest multiple of 10, so bins are centered on 0, 10, 20, ...
df %>%
  group_by(interval = round(ts / 10) * 10) %>%
  summarize(Var_mean = mean(Var))
# A tibble: 11 x 2
   interval Var_mean
      <dbl>    <dbl>
 1        0 4.561653
 2       10 6.544980
 3       20 6.110336
 4       30 4.288523
 5       40 5.339249
 6       50 6.811147
 7       60 6.180795
 8       70 4.920476
 9       80 5.486937
10       90 5.284871
11      100 5.917074
That's the dplyr approach. Note how dplyr, like data.table, lets you name the intermediate variables, which keeps the code clean and legible.
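For comparison, here is a rough data.table sketch of the same idea. It assumes bins that start at multiples of 10 (via floor() rather than round()) and also averages ts, since the question asks for both. The names dt, res, interval, ts_mean and Var_mean are illustrative.

```r
library(data.table)

set.seed(1)  # assumption: seed chosen only to make the example reproducible
dt <- data.table(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))

# Name the grouping variable inline in `by`, then average ts and Var per bin.
# floor(ts / 10) * 10 maps ts in [10, 20) to 10, [20, 30) to 20, and so on.
res <- dt[, .(ts_mean = mean(ts), Var_mean = mean(Var)),
          by = .(interval = floor(ts / 10) * 10)]
print(res)
```

As in the dplyr version, the grouping expression gets a name (interval) where it is defined, so no temporary column has to be added to the data first.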