Problem description
So I have a data frame that has a date column, an hour column and a series of other numerical columns. Each row in the data frame is one hour of one day, and the data cover an entire year.
The data frame looks like this:
Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610
The hours are out of order because this is subsetted from another data frame.
I would like to sum the values in the numerical columns by month and possibly by day. Does anyone know how I can do this?
Recommended answer
The data set I created:
data <- read.table( text=" Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)
You can sum the values with the aggregate function:
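# Sum each numeric column for every distinct Date
byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
                   data=data,FUN=sum)
library(lubridate)  # provides month()
# Sum each numeric column for every month number returned by month(Date)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
                     data=data,FUN=sum)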
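To check what aggregate returns for the sample data, here is a minimal base-R sketch that rebuilds the same seven rows and sums them by day. It leaves out lubridate so it runs without extra packages; the totals in the comment were added up by hand from the table in the question.

```r
# Rebuild the sample data from the question.
# The header has one field fewer than the data rows, so the first column becomes the row names.
data <- read.table(text = " Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
  header = TRUE, stringsAsFactors = FALSE)

# One output row per distinct Date, with each numeric column summed over the hours
byday <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ Date,
                   data = data, FUN = sum)
print(byday)
# Expected for this sample: a single row for 2009-05-01 with
# Melbourne = 0, Southern = 496, Flagstaff = 715
```

Because every sample row falls on the same day, the result has exactly one row. With the full year of data you would get one row per day.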
Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier), the arguments do the following:
- FUN is the function that should be used for the aggregation. I use sum to add up the values, but it could also be mean, max or some function you wrote yourself.
- data indicates the data frame that I want to aggregate.
- The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate; if there is more than one, they are combined with cbind. On the right-hand side is the variable by which the data should be split. Putting Date there means that aggregate will sum up the variables for each distinct value of Date.
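Since FUN accepts any function and the right-hand side of ~ can list more than one variable joined with +, the same call adapts to other summaries and to grouping by month and day together. The sketch below is only an illustration: the month and day helper columns are hypothetical names derived with base R format(), and max plus a small self-written range function stand in for sum.

```r
# Same sample data as in the question
data <- read.table(text = " Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
  header = TRUE, stringsAsFactors = FALSE)

data$Date <- as.Date(data$Date)
# Hypothetical helper columns: month number and day of month
data$month <- as.numeric(format(data$Date, "%m"))
data$day   <- as.numeric(format(data$Date, "%d"))

# Largest hourly value for each month/day combination instead of the total
bymax <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ month + day,
                   data = data, FUN = max)

# A self-written function works too: here the spread (max minus min)
byrange <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ month + day,
                     data = data, FUN = function(x) max(x) - min(x))

print(bymax)
print(byrange)
```

Each variable named on the right of ~ becomes its own grouping column in the result, so you get one row per month/day pair that actually occurs in the data.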
For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month of a given date. You may first need to install the package with install.packages("lubridate").
If you prefer not to use lubridate, you could do the following instead:
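# Add a numeric month column (1-12) taken from the Date strings
data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
# Sum each numeric column for every month
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
                     data=data,FUN=sum)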
Here I added a new column to data that contains the month and then aggregated by that column.
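One caveat about grouping on the month number alone: the same month from different years lands in the same group. A single year starting on 2009-05-01 should not produce a collision, but if the data ever cover more than twelve months, a year-month key keeps them apart. This is a sketch using base R format() with "%Y-%m"; the ym column name is a hypothetical choice.

```r
# Same sample data as in the question
data <- read.table(text = " Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
  header = TRUE, stringsAsFactors = FALSE)

# Hypothetical key such as "2009-05", so May 2009 and May 2010 would stay separate
data$ym <- format(as.Date(data$Date), "%Y-%m")

# Sum each numeric column for every year-month
bymonth <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ ym,
                     data = data, FUN = sum)
print(bymonth)
```

The "%Y-%m" strings also sort chronologically, so the result comes out in calendar order.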