Tutorial 4: Trends

In this tutorial, we will generate trending series that, moreover, also exhibit some seasonal pattern. We will apply an information criterion to select the most suitable model, and we will obtain and evaluate one-step-ahead forecasts using a rolling window procedure. To run the code, the data.table, ggplot2 and fastDummies packages need to be installed and loaded.

Let’s generate a quarterly time series that follows: \(y_{t} = 0.05t+0.0002t^2+0.7d_{1,t}+e_{t}\), where \(d_{1,t}=1\) if \(t\) is in quarter \(1\), and \(d_{1,t}=0\) otherwise; \(e_{t} \sim iid~\text{N}(0,1)\), for \(t=1,\ldots,80\).

n <- 80
s <- 4

set.seed(1)
e <- rnorm(n,0,1)

trend <- c(1:n)
dum <- rep(c(1,0,0,0),n/s)

y <- 0.05*trend-0.0002*trend^2+.7*dum+e

Store \(y\) and \(trend\) in a data.table, call it ‘dt’. Add some arbitrary dates to the data (e.g., suppose we deal with a series beginning from quarter 1 of 2001).

dt <- data.table(y,trend)

dt$date <- seq(as.Date("2001-01-01"),by="quarter",along.with=y)

Plot the realized time series using the ggplot function.

ggplot(dt,aes(x=date,y=y))+
  geom_line(size=1)+
  labs(x="Year",y="Simulated Series")+
  theme_classic()

Add quarterly dummy variables to the data, and calculate Akaike Information Criteria (AIC) for linear and quadratic trend models with and without quarterly seasonal dummy variables (so, four models in total). Observe that the linear trend model with the seasonal component offers the best in-sample fit, based on AIC.

dt$q <- quarter(dt$date)

dt <- dummy_cols(dt,select_columns="q")

reg1 <- lm(y~trend,data=dt)
reg2 <- lm(y~trend+I(trend^2),data=dt)
reg3 <- lm(y~trend+q_1+q_2+q_3,data=dt)
reg4 <- lm(y~trend+I(trend^2)+q_1+q_2+q_3,data=dt)

aic1 <- log(crossprod(reg1$residuals))+2*length(reg1$coefficients)/n
aic2 <- log(crossprod(reg2$residuals))+2*length(reg2$coefficients)/n
aic3 <- log(crossprod(reg3$residuals))+2*length(reg3$coefficients)/n
aic4 <- log(crossprod(reg4$residuals))+2*length(reg4$coefficients)/n

Generate a sequence of one-step-ahead forecasts from the linear trend model with seasonal dummy variables, using the rolling window scheme, where the first rolling window ranges from period 1 to period 60 (i.e., 75% of the data).

dt$f <- NA

R <- 60
P <- n-R
for(i in 1:P){
  reg3 <- lm(y~trend+q_1+q_2+q_3,data=dt[i:(R-1+i)])
  dt$f[R+i] <- reg3$coef[1]+reg3$coef[2]*dt$trend[R+i]+reg3$coef[3]*dt$q_1[R+i]+reg3$coef[4]*dt$q_2[R+i]+reg3$coef[5]*dt$q_3[R+i]
}

Plot the original series and the one-step-ahead forecasts on the same graph. Note, for convenience we will first ‘melt’ the data into the ‘long table’ format, then plot the data.

dl <- melt(dt[,.(date,y,f)],id.vars="date")

ggplot(dl,aes(x=date,y=value,color=variable,linetype=variable))+
  geom_line(size=1,na.rm=T)+
  scale_color_manual(values=c("darkgray","coral"))+
  scale_linetype_manual(values=c(1,5))+
  labs(x="Year",y="Trending Seasonal Series and Forecasts")+
  theme_classic()+
  theme(legend.title=element_blank(),legend.position=c(.15,.85))

Test the hypotheses of forecast unbiasedness, efficiency, and no serial correlation of forecast errors.

dt[,`:=`(f.e=y-f)]

summary(lm(e~1,data=dt))$coefficients

##              Estimate Std. Error  t value  Pr(>|t|)
## (Intercept) 0.1061465  0.1007144 1.053936 0.2951249

summary(lm(e~f,data=dt))$coefficients

##               Estimate Std. Error   t value  Pr(>|t|)
## (Intercept)  2.1966728  1.8297135  1.200556 0.2454854
## f           -0.6922787  0.5997203 -1.154336 0.2634587

dt[,`:=`(e1=shift(e))]
summary(lm(e~e1,data=dt))$coefficients

##                Estimate Std. Error    t value  Pr(>|t|)
## (Intercept)  0.11994115  0.1029770  1.1647377 0.2477210
## e1          -0.03933125  0.1138245 -0.3455429 0.7306285