


I am using auto.arima from the forecast package in R to determine the optimal number of Fourier terms (K).


After that, I want to calculate the seasonality and plug that single seasonality variable into a multiple regression model.


Using the gas dataset from the forecast package, I was able to extract the optimal number of Fourier terms:


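##Public dataset (gas) from the forecast package
library(forecast)

##Choose Optimal Amount of K-Terms
##Increase K until the AICc stops improving
bestfit <- list(aicc=Inf)
for(i in 1:6)
{
  fit <- auto.arima(gas, xreg=fourier(gas, K=i), seasonal=FALSE)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break;
  ##Only reached when K=i improved the fit
  optimal_k_value <- max(i)
}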

##Extract Fourier Terms
seasonality<-data.frame(fourier(gas, K=optimal_k_value))

##Convert Gas TS Data to Dataframe
gas_df <- data.frame(gas, year = trunc(time(gas)),
                 month = month.abb[cycle(gas)])

##Extract True Seasonality by Taking Sum of Rows
seasonality$total<- rowSums(seasonality)

##Combine Seasonality to Month and Year
final_df<-cbind(gas_df, seasonality$total)


Would the seasonality$total column be considered my "seasonality variable" for later modelling, or do I need to add coefficients to it?


No, seasonality$total is not the seasonality variable. To see that, note that each column of fourier(gas, K = optimal_k_value) is a raw seasonal component oscillating between -1 and 1, that is, a plain sin(...) or cos(...) term without any coefficient. Different seasonal components generally need different coefficients, so you shouldn't just sum them up.
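As a rough sketch of what a coefficient-weighted seasonality variable could look like, the snippet below multiplies each Fourier column by the coefficient estimated for it in the ARIMA fit. The value K = 2 and the names X, beta and seasonal_component are placeholders chosen for illustration, not part of the original code; in practice K would be the value selected by the AICc search.

```r
library(forecast)

K <- 2                      # placeholder; use the K chosen by the AICc search
X <- fourier(gas, K = K)    # matrix of sin/cos columns, no coefficients yet
fit <- auto.arima(gas, xreg = X, seasonal = FALSE)

# Pick out the coefficients estimated for the Fourier regressors
# (the fit may also contain AR/MA and drift terms, which are skipped here)
beta <- coef(fit)[colnames(X)]

# Weighted sum: each seasonal component gets its own coefficient
seasonal_component <- as.numeric(X %*% beta)
```

Alternatively, the individual Fourier columns could be passed to the regression as separate regressors, so that the regression estimates their coefficients itself.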

Side comment 1: since i is always just a single number, there is no point in using max(i); optimal_k_value <- i is enough.

Side comment 2: I would suggest checking

plot(resid(auto.arima(gas, xreg = fourier(gas, K = optimal_k_value), seasonal = FALSE)))


For one, there may be seasonality at a lower than yearly frequency (it seems that fourier doesn't allow for that), although perhaps you are going to model it separately as a trend. Also, it may be a good idea to split the data into, say, before and after 1970.
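A minimal sketch of such a split, assuming the cut is made at the start of 1970 and reusing a placeholder K = 2 for both parts (the names gas_early, gas_late, fit_early and fit_late are illustrative only):

```r
library(forecast)

# Split the monthly gas series at the start of 1970 (cut point is an assumption)
gas_early <- window(gas, end = c(1969, 12))
gas_late  <- window(gas, start = c(1970, 1))

# Fit each part separately; K = 2 is a placeholder, not a selected value
fit_early <- auto.arima(gas_early, xreg = fourier(gas_early, K = 2), seasonal = FALSE)
fit_late  <- auto.arima(gas_late,  xreg = fourier(gas_late,  K = 2), seasonal = FALSE)

# Residual plots for the two periods can then be compared
plot(resid(fit_early))
plot(resid(fit_late))
```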

