问题描述
看下面的例子
library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))
Source: local data frame [25 x 3]
time values day_of_week
<date> <dbl> <fctr>
1 2014-02-24 30 Mon
2 2014-02-25 45 Tues
3 2014-02-26 30 Wed
4 2014-02-27 50 Thurs
5 2014-02-28 50 Fri
6 2014-03-01 20 Sat
7 2014-03-02 35 Sun
8 2014-03-03 50 Mon
9 2014-03-04 35 Tues
10 2014-03-05 35 Wed
我想按周汇总此数据框.
I would like to aggregate this dataframe by week.
也就是说,假设我将一周定义为从周一早上开始到周日晚上结束,我们将其称为 Monday to Monday
循环.(重要的是,我希望能够选择其他约定,例如周五到周五).
That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a Monday to Monday
cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).
然后,我只想计算每周 values
的平均值.
Then, I would simply like to count the mean of values
for each week.
例如,在上面的示例中,可以计算 2 月 24 日星期一到 3 月 2 日星期日之间 values
的平均值,依此类推.
For instance, in the example above, one would compute the average of values
between Monday February 24th to Sunday March 2nd, and so on.
我该怎么做?
推荐答案
就这一次,经过一番研究,我实际上想出了一个更好的解决方案
Just this once, after some research, I actually think I came up with a better solution that
- 给出正确的聚合
- 给出正确的标签
以下示例从星期四开始.周将按给定周期的第一天标记.
Example below for weeks starting on a thursday. The weeks will be labeled by their first day a given cycle.
library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
day_of_week = wday(time, label = FALSE))
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
tmp_1 = (as.integer(day_of_week) - 5),
tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
给出
> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
这篇关于如何按周聚合数据框?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!