Question
I am new to time series analysis. I have monthly sales data for 60 months, from January 2009 to December 2013, and I am trying to forecast sales for the next 6 months with an ARIMA model. I read the data and convert it into a time series object as follows:
data <- read.csv(file="monthlySalesData.csv", header=TRUE)
dataInTimeSeries <- ts(data, frequency = 12, start=c(2009,1), end=c(2013,12))
When I draw the acf() plot to find the lag after which the autocorrelation dies down to zero, the lag scale on the X-axis is in decimals. I do not have enough privileges to post an image, but the lag values on the X-axis are decimal, with a maximum lag of 1.5. The values returned by acf with plot=FALSE also look strange (they do not show the lags for which the autocorrelation was calculated). I cannot interpret this, and I cannot find the number of lags after which the autocorrelation dies down to zero.
acf(dataInTimeSeries, plot=FALSE)
Autocorrelations of series ‘dataInTimeSeries’, by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
1.000 0.642 0.588 0.490 0.401 0.320 0.311 0.269 0.178 0.198 0.229
0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167
0.271 0.358 0.240 0.210 0.092 0.135 0.098
What is the issue? Is there a problem with the R settings, the data import, or the ts() function? And if this is how the acf plot looks for monthly data, how should I interpret it?
Thanks in advance!
Recommended answer
The decimals you see are lags expressed in years, e.g. 0.0833 = 1/12 year, 0.1667 = 2/12 year, and so on.
To get the ACF plot with lags in months, you can try something like:
## Lacking reproducible example, I use simulated monthly data
tt <- ts(arima.sim(list(order=c(1,0,0), ar=0.4),60), start=2001, deltat=1/12)
## Calculate, but not plot, acf
acfpl <- acf(tt, plot=FALSE)
## Transform the lags from years to months
acfpl$lag <- acfpl$lag * 12
## Plot the acf
plot(acfpl, xlab="Lag (months)")
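As a quick check of the rescaling above, here is a minimal sketch, again on simulated data since the original sales file is not available. It confirms that the lags returned by acf() are multiples of 1/frequency and uses lag.max to cover two full years. The names sim_sales and acf_months are illustrative only.

```r
set.seed(1)
# Simulated monthly series standing in for the real sales data (assumption)
sim_sales <- ts(arima.sim(list(order = c(1, 0, 0), ar = 0.4), n = 60),
                start = c(2009, 1), frequency = 12)

# lag.max is given in number of observations, so 24 means 24 months
acf_months <- acf(sim_sales, lag.max = 24, plot = FALSE)

# Lags are stored in units of years: 0, 1/12, 2/12, ...
head(as.vector(acf_months$lag))

# Multiply by the frequency to recover whole-month lags 0, 1, 2, ..., 24
acf_months$lag <- acf_months$lag * frequency(sim_sales)
plot(acf_months, xlab = "Lag (months)")
```

Multiplying by frequency(sim_sales) rather than a hard-coded 12 keeps the same lines usable for quarterly or weekly series.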
As I understand it, the problem you are dealing with is identifying the orders of an ARMA model. To do that you need both the ACF and PACF plots. When you say "dying down to zero", you should not expect the values to be exactly zero after some lag. Values inside the 95% confidence band (dashed blue lines) are not statistically significant (see also the notes in ?plot.acf).
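To make the significance band concrete, the sketch below computes the approximate 95% bound under a white-noise null, qnorm(0.975)/sqrt(n), and lists the lags whose sample autocorrelation falls outside it. The series x is simulated and the variable names are hypothetical.

```r
set.seed(42)
# Simulated AR(1) series standing in for the real data (assumption)
x <- arima.sim(list(order = c(1, 0, 0), ar = 0.6), n = 60)

a <- acf(x, plot = FALSE)

# Approximate 95% bound under the white-noise null: +/- qnorm(0.975) / sqrt(n)
bound <- qnorm(0.975) / sqrt(a$n.used)

# Lags (excluding lag 0) whose autocorrelation lies outside the band
lags <- as.vector(a$lag)[-1]
vals <- as.vector(a$acf)[-1]
significant_lags <- lags[abs(vals) > bound]
significant_lags
```

With 60 observations the bound is roughly 0.25, so with 20 or so lags tested an isolated bar just outside the band can easily appear by chance.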
Identifying the order of an ARIMA model can be tricky, but there are some rules you can follow. For example, an AR(p) process has an ACF that looks like a damped exponential or sine function and a PACF with p significant lags. For an MA(q) process it is the other way round: the ACF has q significant lags and the PACF decays.
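These rules can be checked against the theoretical (not sample) autocorrelations, which base R exposes through ARMAacf() in the stats package. The sketch below uses a coefficient of 0.9 purely as an illustrative choice.

```r
# Theoretical ACF and PACF of an AR(1) with coefficient 0.9
ar_acf  <- ARMAacf(ar = 0.9, lag.max = 5)               # lags 0..5
ar_pacf <- ARMAacf(ar = 0.9, lag.max = 5, pacf = TRUE)  # lags 1..5

# Theoretical ACF and PACF of an MA(1) with coefficient 0.9
ma_acf  <- ARMAacf(ma = 0.9, lag.max = 5)
ma_pacf <- ARMAacf(ma = 0.9, lag.max = 5, pacf = TRUE)

round(ar_acf, 3)   # geometric decay: 0.9^k
round(ar_pacf, 3)  # only lag 1 is non-zero
round(ma_acf, 3)   # only lag 1 is non-zero (after lag 0)
round(ma_pacf, 3)  # decays with alternating sign
```

Sample ACF and PACF plots from a finite series are noisy versions of these patterns, which is why the cut-offs are judged against the significance band and not against exact zeros.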
Just to show what this looks like for these two simple cases, I use arima.sim to simulate two time series, ARIMA(1,0,0) and ARIMA(0,0,1).
set.seed(1234)
arima100 <- arima.sim(list(order=c(1,0,0), ar=0.9), n=500)
arima001 <- arima.sim(list(order=c(0,0,1), ma=0.9), n=500)
par(mfcol=c(2,2))  # fill the 2x2 grid column by column: ACFs on the left, PACFs on the right
acf(arima100); acf(arima001)
pacf(arima100); pacf(arima001)
This produces the following plots:
ARIMA(1,0,0): the ACF decays towards zero, and the PACF has one significant lag.
ARIMA(0,0,1): the ACF has one significant lag (after lag 0, which is always 1), and the PACF looks like a damped sine function.
Now, just by looking at your ACF, I would dare to say two things:
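- Your process probably has an AR term (you must check the PACF as well)
- Your data may have seasonality, since there is a peak at lag 12, i.e. one year (you can check this by looking at a plot of the data)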
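The seasonality check mentioned in the second point could look like the sketch below. It uses a simulated series with a built-in yearly cycle, not the real sales data, and the names seasonal_sales and acf_lag12 are hypothetical.

```r
set.seed(7)
# Simulated monthly series with a yearly cycle (assumption, not the real sales data)
month_index <- 1:60
seasonal_sales <- ts(100 + 10 * sin(2 * pi * month_index / 12) + rnorm(60),
                     start = c(2009, 1), frequency = 12)

plot(seasonal_sales)        # the yearly pattern should be visible in the raw series
monthplot(seasonal_sales)   # one sub-series per calendar month

# Autocorrelation at a lag of 12 months (element 1 is lag 0)
a <- acf(seasonal_sales, lag.max = 24, plot = FALSE)
acf_lag12 <- a$acf[13]
bound <- qnorm(0.975) / sqrt(a$n.used)
acf_lag12 > bound
```

In the ACF posted in the question, the value at lag 1.0000 (12 months) is 0.358, which is larger than its neighbours and above the approximate bound of 0.25 for 60 observations.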
Some steps you can follow:
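- Take differences if a trend is obvious in your data
- Take differences at lag 12 if you have yearly seasonality
- Plot the ACF and PACF for both the undifferenced and the differenced data
- Fit a model with arima and check the residuals
- If you have several candidate models, compare their AIC or BIC values.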
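The steps above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated with a trend and a yearly cycle, the two candidate orders are arbitrary, and method = "ML" is chosen here simply to keep the fits robust on a short series. None of this is a recommendation for the real sales data.

```r
set.seed(123)
# Simulated monthly data with trend and yearly seasonality (assumption)
n <- 60
t_idx <- 1:n
y <- ts(50 + 0.5 * t_idx + 8 * sin(2 * pi * t_idx / 12) +
          as.numeric(arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = n)),
        start = c(2009, 1), frequency = 12)

# Regular and seasonal (lag 12) differencing
y_diff <- diff(diff(y), lag = 12)

# ACF and PACF of the original and the differenced series
par(mfrow = c(2, 2))
acf(y); pacf(y)
acf(y_diff); pacf(y_diff)

# Two candidate models; arima() does the differencing itself via d and D
fit1 <- arima(y, order = c(1, 1, 0),
              seasonal = list(order = c(0, 1, 0), period = 12), method = "ML")
fit2 <- arima(y, order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 0), period = 12), method = "ML")

# Residuals of an adequate model should look like white noise
Box.test(residuals(fit1), lag = 12, type = "Ljung-Box", fitdf = 1)

# Compare information criteria (lower is better)
AIC(fit1, fit2)
BIC(fit1, fit2)

# Forecast the next 6 months from one of the candidates
fc <- predict(fit1, n.ahead = 6)
fc$pred
```

predict() also returns fc$se, so approximate 95% forecast limits can be formed as fc$pred plus or minus 1.96 * fc$se.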
Reading a good book (I used Time Series Analysis by Henrik Madsen) or lecture notes (these look good) can also help you a lot.