问题描述
我有一个带有2个变量的数据框。第一列 X表示日期和时间,格式为dd / mm / yyyy hh:mm,第二列 Y中的值是电表读数,每5分钟后获取一次。现在,我想添加每个半小时的值。例如
I have a data frame with 2 variables. the first column "X" represents date and time with format dd/mm/yyyy hh:mm, the values in the second column "Y" are the electricity meter reading which are taken each after 5 minutes. Now I want to add the values of each half an hour. For instance
X Y
13/12/2014 12:00 1
13/12/2014 12:05 2
13/12/2014 12:10 1
13/12/2014 12:15 2
13/12/2014 12:20 2
13/12/2014 12:25 1
最后,我想将结果表示为:
At the end i want to present a result as:
13/12/2014 12:00 9
13/12/2014 12:30 12
等等...
推荐答案
这是另一种方法,实际上需要计算 X
(根据)。
Here's an alternative approach which actually takes X
in count (as per OP comment).
首先,我们将确保 X
是正确的 POSIXct
格式,以便我们可以正确地操作它(为了方便起见,我在这里使用 data.table
包)
First, we will make sure X
is of proper POSIXct
format so we could manipulate it correctly (I'm using the data.table
package here for convenience)
library(data.table)
setDT(df)[, X := as.POSIXct(X, format = "%d/%m/%Y %R")]
然后,我们将汇总 00 $
X
中的c $ c>或 30
,同时对 Y
求和并提取每个组的 X
的第一个值。我制作了一个更复杂的数据集,以说明更复杂的情况(见下文)
Then, we will aggregate per cumulative minutes instances of 00
or 30
within X
while summing Y
and extracting the first value of X
per each group. I've made a more complicated data set in order illustrate more complicated scenarios (see below)
df[order(X), .(X = X[1L], Y = sum(Y)), by = cumsum(format(X, "%M") %in% c("00", "30"))]
# cumsum X Y
# 1: 0 2014-12-13 12:10:00 6
# 2: 1 2014-12-13 12:30:00 6
# 3: 2 2014-12-13 13:00:00 3
数据
df <- read.table(text = "X Y
'13/12/2014 12:10' 1
'13/12/2014 12:15' 2
'13/12/2014 12:20' 2
'13/12/2014 12:25' 1
'13/12/2014 12:30' 1
'13/12/2014 12:35' 1
'13/12/2014 12:40' 1
'13/12/2014 12:45' 1
'13/12/2014 12:50' 1
'13/12/2014 12:55' 1
'13/12/2014 13:00' 1
'13/12/2014 13:05' 1
'13/12/2014 13:10' 1", header = TRUE)
一些解释
-
by
expr ession:
-
format(X,%M)
获取分钟数$ c> X (请参阅?strptime
) - 下一步是检查它们是否与<$ c匹配$ c> 00 或
30
(使用%in%
) -
cumsum
将这些匹配的值分成单独的组,我们通过将该表达式放入by
语句(请参见?data.table
)
- The
by
expression:format(X, "%M")
gets the minutes out ofX
(see?strptime
)- Next step is check if they match
00
or30
(using%in%
) cumsum
separates these matched values into separate groups which we aggregate by by putting this expression into theby
statement (see?data.table
)
-
(X = X [1L],Y = sum(Y) )
只是获得每个组的X
的第一个值和Y
的总和每个组。
(X = X[1L], Y = sum(Y))
is simply getting the first value ofX
per each group and the sum ofY
per each group.
- 我添加了
order(X)
,以确保数据集按日期正确排序(我将X
转换为正确的POSIXct
格式的主要原因)
- I've added
order(X)
in order to make sure the data set is properly ordered by date (one of the main reasons I've convertedX
to properPOSIXct
format)
Fo若要更好地了解
data.table
的工作原理,请参见一些教程For a better understanding on how
data.table
works, see some tutorials here这篇关于根据第一列的日期和时间添加第二列的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!
-