R ts with missing values

Question

I have a data frame, read from a CSV file, that contains daily observations:

Date        Value
2010-01-04  23.4
2010-01-05  12.7
2010-01-06  20.1
2010-01-07  18.2

PROBLEM: Missing data. The forecast package expects a plain ts object that contains no missing data, while my dataset has missing data on most weekends and at other random points.

A direct conversion to ts will not work here:

ts(values, start = c(1997, 1), frequency = 1)

The only solution I can think of is to aggregate the daily data into weekly data, but I am new to R and there may be better solutions.
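For reference, the weekly aggregation idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the sample data frame `daily`, its values, and the choice of `frequency = 52` are all hypothetical and not taken from the question. It averages whatever observations exist in each Monday-based week, so weekend gaps disappear at the cost of losing daily resolution.

```r
# Hypothetical sample data: weekdays only, mimicking the weekend gaps
dates <- seq(as.Date("2010-01-04"), as.Date("2010-01-29"), by = "day")
# %u is the weekday number (1 = Monday ... 7 = Sunday), independent of locale
dates <- dates[as.integer(format(dates, "%u")) <= 5]
daily <- data.frame(Date = dates, Value = seq_along(dates))

# Label every row with the Monday that starts its week
daily$Week <- as.Date(cut(daily$Date, breaks = "week"))

# Average the observations that are available in each week
weekly <- aggregate(Value ~ Week, data = daily, FUN = mean)

# Assumption: roughly 52 weekly observations per year
weeklyTs <- ts(weekly$Value, frequency = 52)
print(weekly)
```

A week with no observations at all would still be missing from `weekly`, so this approach only helps when the gaps are shorter than a week.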

Recommended answer

One option is to expand your date index to include the missing observations, and then use na.approx from the zoo package to fill in the missing values by linear interpolation.

allDates <- seq.Date(
  min(values$Date),
  max(values$Date),
  "day")
##
allValues <- merge(
  x=data.frame(Date=allDates),
  y=values,
  all.x=TRUE)
R> head(allValues,7)
        Date      Value
1 2010-01-05 -0.6041787
2 2010-01-06  0.2274668
3 2010-01-07 -1.2751761
4 2010-01-08 -0.8696818
5 2010-01-09         NA
6 2010-01-10         NA
7 2010-01-11 -0.3486378
##
zooValues <- zoo(allValues$Value,allValues$Date)
R> head(zooValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11
-0.6041787  0.2274668 -1.2751761 -0.8696818         NA         NA -0.3486378
##
approxValues <- na.approx(zooValues)
R> head(approxValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11
-0.6041787  0.2274668 -1.2751761 -0.8696818 -0.6960005 -0.5223192 -0.3486378

Even with missing values, zooValues is still a legitimate zoo object; for example, plot(zooValues) will work (with discontinuities at the missing values). If you plan on fitting some sort of model to the data, though, you will most likely be better off using na.approx to replace the missing values.
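Since the question asks for a plain ts object, the interpolated zoo series can then be converted. The following is a minimal sketch, not part of the original answer: the small series below is a hypothetical stand-in for approxValues, and `frequency = 7` is an assumption that treats one week as the seasonal period.

```r
library(zoo)

# Hypothetical stand-in data: a daily series with two weekend gaps
dates <- seq(as.Date("2010-01-05"), as.Date("2010-01-18"), by = "day")
vals  <- c(1, 2, 3, 4, NA, NA, 7, 8, 9, 10, 11, NA, NA, 14)
zooValues <- zoo(vals, dates)

# Linear interpolation across the gaps, as in the answer above
approxValues <- na.approx(zooValues)

# Keep only the numeric values and drop the Date index;
# frequency = 7 is an assumption (weekly seasonality in daily data)
tsValues <- ts(coredata(approxValues), frequency = 7)
print(tsValues)
```

Note that na.approx only interpolates between observed points. With its default na.rm = TRUE, missing values at the very start or end of the series are dropped rather than filled.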

Data (run this first to create values):

library(zoo)
library(lubridate)
##
t0 <- "2010-01-04"
Dates <- as.Date(ymd(t0))+1:120
# Drop weekends; weekdays() returns locale-dependent names (English locale assumed)
weekDays <- Dates[!(weekdays(Dates) %in% c("Saturday","Sunday"))]
##
set.seed(123)
values <- data.frame(Date=weekDays,Value=rnorm(length(weekDays)))

