Tutorial 3: Forecasting Methods and Routines

In this tutorial, we introduce the ‘for loop’ and illustrate its use by generating a time series and a sequence of one-step-ahead forecasts. We also perform forecast error diagnostics. To run the code, the data.table and ggplot2 packages need to be installed and loaded.

Let’s generate a random walk process, such that \(y_{t} = y_{t-1}+e_{t}\), where \(e_{t} \sim N(0,1)\), and where \(y_{0}=0\), for \(t=1,\ldots,120\).[1]

# Load the packages used in this tutorial
library(data.table)
library(ggplot2)

n <- 120

set.seed(1)
r <- rnorm(n)

y <- rep(NA,n)

y[1] <- r[1]

for(i in 2:n){
  y[i] <- y[i-1] + r[i]
}
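As a quick sanity check, the loop can be compared with the vectorized `cumsum()` alternative. The sketch below uses base R only; the names `y_loop` and `y_vec` are illustrative and not part of the tutorial's code.

```r
# Sketch: the 'for loop' and cumsum() should produce the same random walk.
n <- 120

set.seed(1)
r <- rnorm(n)

# Loop version, mirroring the tutorial
y_loop <- rep(NA_real_, n)
y_loop[1] <- r[1]            # y_1 = y_0 + e_1 with y_0 = 0
for (i in 2:n) {
  y_loop[i] <- y_loop[i - 1] + r[i]
}

# Vectorized version: cumulative sum of the same shocks
y_vec <- cumsum(r)

# TRUE if the two series agree up to floating-point tolerance
isTRUE(all.equal(y_loop, y_vec))
```

The loop is still worth knowing: it generalizes to processes where each value depends on the previous one in ways that `cumsum()` cannot express.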

Store \(y\) in a data.table, and add some arbitrary dates to the data (e.g., suppose we deal with a monthly series beginning in January 2011).

dt <- data.table(y)

dt$date <- seq(as.Date("2011-01-01"),by="month",along.with=y)

Plot the realized time series using the ggplot function.

ggplot(dt,aes(x=date,y=y))+
  geom_line(linewidth=1)+
  labs(x="Year",y="Random Walk")+
  theme_classic()

Generate a sequence of one-step-ahead forecasts from January 2017 onward by simply averaging the observed time series up to the period when the forecast is made.[2]

dt$f <- NA_real_

R <- which(dt$date==as.Date("2017-01-01"))-1
P <- n-R
for(i in 1:P){
  dt$f[R+i] <- mean(dt$y[1:(R+i-1)])
}
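The same expanding-window means can also be computed without a loop, which is a convenient way to check the indexing. This is a minimal sketch in base R, assuming the same seed and dates as above; the names `f_loop`, `f_vec`, and `running_mean` are illustrative.

```r
# Sketch: expanding-window mean forecasts, loop versus vectorized.
n <- 120
set.seed(1)
y <- cumsum(rnorm(n))

date <- seq(as.Date("2011-01-01"), by = "month", length.out = n)
R <- which(date == as.Date("2017-01-01")) - 1  # last period before forecasting starts
P <- n - R                                     # number of forecasts

# Loop version, mirroring the tutorial
f_loop <- rep(NA_real_, n)
for (i in 1:P) {
  f_loop[R + i] <- mean(y[1:(R + i - 1)])
}

# Vectorized version: the running mean through t-1 is the forecast for t
running_mean <- cumsum(y) / seq_along(y)
f_vec <- c(NA, head(running_mean, -1))
f_vec[1:R] <- NA                               # no forecasts before January 2017

# TRUE if both approaches give the same forecasts
isTRUE(all.equal(f_loop, f_vec))
```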

Obtain the RMSFE measure of the forecast.

dt$e <- dt$y-dt$f

rmsfe <- sqrt(mean(dt$e^2,na.rm=TRUE))

rmsfe
## [1] 5.665095
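To make the calculation concrete, here is a small worked example with two made-up observations. The helper `rmsfe_of()` is a hypothetical convenience function, not something defined in the tutorial.

```r
# Sketch: RMSFE as the square root of the mean squared forecast error.
# rmsfe_of() is a hypothetical helper introduced for illustration only.
rmsfe_of <- function(actual, forecast) {
  e <- actual - forecast          # forecast errors
  sqrt(mean(e^2, na.rm = TRUE))   # periods without a forecast (NA) are skipped
}

# Toy data: the errors are 3 and -4, so RMSFE = sqrt((9 + 16) / 2) = sqrt(12.5)
actual   <- c(10, 6)
forecast <- c(7, 10)

rmsfe_of(actual, forecast)
```

Because squaring removes the sign, the RMSFE measures the typical size of the errors but says nothing about their direction; that is what the diagnostics address.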

Perform forecast error diagnostics.

Zero mean of the forecast errors: \(E(e_{t+1|t})=0\). We test this hypothesis by regressing the forecast error on the constant, and checking whether the coefficient is statistically significantly different from zero.

summary(lm(e~1,data=dt))$coefficients
##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 5.421519  0.2396999 22.61794 6.604031e-27

We reject the null. The mean forecast error is positive, which suggests that the one-step-ahead forecasts consistently underestimate the realized values (in this case).
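Regressing the forecast errors on a constant is equivalent to a one-sample t-test of their mean. The sketch below illustrates this in base R, assuming the same seed and forecasting scheme as above. Here `R <- 72` is hard-coded as the position of December 2016, and the plain vectors `y`, `f`, and `e` stand in for the data.table columns.

```r
# Sketch: the intercept-only regression matches a one-sample t-test.
n <- 120
set.seed(1)
y <- cumsum(rnorm(n))

R <- 72                           # assumed: December 2016 in the tutorial's dating
f <- rep(NA_real_, n)
for (t in (R + 1):n) {
  f[t] <- mean(y[1:(t - 1)])      # expanding-window mean forecast
}
e <- y - f                        # forecast errors (NA before January 2017)

# t statistic on the intercept; lm() drops the NA rows by default
t_lm <- summary(lm(e ~ 1))$coefficients["(Intercept)", "t value"]

# t statistic from the one-sample t-test of H0: mean(e) = 0
t_tt <- unname(t.test(e)$statistic)

# TRUE if the two statistics coincide
isTRUE(all.equal(t_lm, t_tt))
```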

No correlation of the forecast errors with the forecasts: \(Cov(e_{t+1|t},y_{t+1|t})=0\). We perform this test by regressing the forecast error on the forecast, and checking whether the slope coefficient is statistically significantly different from zero.

summary(lm(e~f,data=dt))$coefficients
##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) -0.1437252  1.5861275 -0.0906139 0.9281928187
## f            0.9995059  0.2822414  3.5413157 0.0009247749

We reject the null, which suggests that there is some information in the data that we do not use well enough.[3]

No serial correlation in one-step-ahead forecast errors: \(Cov(e_{t+1|t},e_{t|t-1})=0\). We perform this test by regressing the forecast error on its lag, and checking whether the slope coefficient is statistically significantly different from zero.[4]

dt[,`:=`(e1=shift(e))]

summary(lm(e~e1,data=dt))$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 0.7057751 0.40681156  1.734894 8.960477e-02
## e1          0.8661231 0.07198491 12.032009 1.164311e-15

We reject the null, which again suggests that the forecasting method we have chosen is far from ideal.
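For comparison, one can repeat the exercise with a forecast that puts all the weight on the most recent observation, \(y_{t+1|t}=y_{t}\), which is the natural choice when the data generating process is a random walk. The following is a sketch in base R under the same seed; `R <- 72` is hard-coded as the position of December 2016, and all the `*_naive` and `*_mean` names are illustrative. By construction, the errors of this forecast equal the shocks \(e_{t}\), so one would expect the diagnostics to be much less likely to reject.

```r
# Sketch: expanding-mean forecast versus the 'last observation' forecast.
n <- 120
set.seed(1)
r <- rnorm(n)
y <- cumsum(r)

R <- 72                              # assumed: December 2016 in the tutorial's dating
f_mean  <- rep(NA_real_, n)
f_naive <- rep(NA_real_, n)
for (t in (R + 1):n) {
  f_mean[t]  <- mean(y[1:(t - 1)])   # equal weights on all past observations
  f_naive[t] <- y[t - 1]             # all weight on the most recent observation
}

e_mean  <- y - f_mean
e_naive <- y - f_naive

# Accuracy comparison
rmsfe_mean  <- sqrt(mean(e_mean^2,  na.rm = TRUE))
rmsfe_naive <- sqrt(mean(e_naive^2, na.rm = TRUE))
c(mean = rmsfe_mean, naive = rmsfe_naive)

# The same three diagnostics, applied to the naive forecast errors
e1_naive <- c(NA, head(e_naive, -1))             # lagged errors, base R version of shift()
summary(lm(e_naive ~ 1))$coefficients            # zero mean
summary(lm(e_naive ~ f_naive))$coefficients      # no correlation with the forecast
summary(lm(e_naive ~ e1_naive))$coefficients     # no serial correlation
```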


  1. The code is deliberately written inefficiently to illustrate the use of the ‘for loop’. For reference, a much more efficient version, after setting the seed, would be y <- cumsum(rnorm(n)).

  2. This is equivalent to the expanding window scheme for generating forecasts.

  3. Because, as we know, the true data generating process is a random walk, a better use of information would involve assigning all the weight to the most recent observation rather than spreading it evenly across all observations in the estimation window.

  4. Note: we first need to generate the lagged forecast errors.