Problem description
I am loading a data.table from a CSV file that has date, orders, amount, and similar fields.
The input file occasionally does not have data for all dates. For example, as shown below:
> NADayWiseOrders
         date orders  amount guests
1: 2013-01-01     50 2272.55    149
2: 2013-01-02      3   64.04      4
3: 2013-01-04      1   18.81      0
4: 2013-01-05      2   77.62      0
5: 2013-01-07      2   35.82      2
In the above, 03-Jan and 06-Jan do not have any entries.
I would like to fill the missing entries with default values (say, zero for orders, amount, etc.), or carry the last value forward (e.g., 03-Jan would reuse the 02-Jan values and 06-Jan would reuse the 05-Jan values).
What is the best/optimal way to fill in such gaps of missing dates with default values?
The answer here suggests using allow.cartesian = TRUE and expand.grid for missing weekdays. That may work for weekdays (since there are only 7 of them), but I am not sure it is the right way to handle dates as well, especially with multi-year data.
Recommended answer
Not sure if it's the fastest, but it'll work if there are no NAs in the data:
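library(data.table)
# just in case these aren't Dates.
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates: one row per day between the first and last observed date.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge; all=TRUE keeps every date, so days absent from the input get NA in the other columns.
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
# na.locf comes from zoo, which is loaded along with xts.
require(xts)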
na.locf(dt)
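
Both options from the question (zero fill and carrying the last value forward) can also be sketched with data.table alone. This is a minimal sketch and not part of the original answer. It assumes a data.table version that provides setnafill (1.12.4 or later, as far as I know), and it rebuilds the sample data by hand with numeric value columns. The names filled, value_cols, zero_filled and locf_filled are made up for illustration.

```r
library(data.table)

# Sample data mirroring the question (03-Jan and 06-Jan are absent).
NADayWiseOrders <- data.table(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04",
                     "2013-01-05", "2013-01-07")),
  orders = c(50, 3, 1, 2, 2),
  amount = c(2272.55, 64.04, 18.81, 77.62, 35.82),
  guests = c(149, 4, 0, 0, 2)
)

# One row per calendar day across the observed range.
alldates <- data.table(
  date = seq(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by = "day")
)

# Join on date: every row of alldates is kept, and days that are
# missing from the input get NA in the value columns.
filled <- NADayWiseOrders[alldates, on = "date"]

# Hypothetical helper vector: the columns whose gaps should be filled.
value_cols <- c("orders", "amount", "guests")

# Option 1: replace the gaps with a default value (zero).
zero_filled <- copy(filled)
setnafill(zero_filled, type = "const", fill = 0, cols = value_cols)

# Option 2: carry the last observation forward.
locf_filled <- copy(filled)
setnafill(locf_filled, type = "locf", cols = value_cols)

print(zero_filled)
print(locf_filled)
```

Since seq() generates the full date range directly, no expand.grid or allow.cartesian = TRUE is needed, and the join stays one row per day even for multi-year data. setnafill modifies its input by reference, which is why each option works on a copy().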