class: center, middle, inverse, title-slide # Forecasting for Economics and Business ## Lecture 4: Trends and Seasonality ### David Ubilava ### University of Sydney --- # Time Series Components Some general features of the economic time series can be classified within three broad categories: - Trends - Seasonality - Cycles A time series can exhibit one or several of these features. --- # Trends Trend is a smooth, typically unidirectional pattern in the data that arises from the accumulation of information over time. The simplest (and perhaps most frequently applied) model to account for the trending time series is a *linear* trend model: `$$y_t = \alpha + \beta t$$` --- # Trends Other trend specifications are *polynomial* (e.g. quadratic, cubic, etc.), *exponential*, and *shifting* (or *switching*) trend models, respectively given by: `$$\begin{aligned} y_t &= \alpha + \beta_1 t + \beta_2 t^2 + \ldots + \beta_p t^p \\ y_t &= e^{\alpha + \beta t}\;~~\mbox{or}\;~~\ln{y_t} = \alpha + \beta t \\ y_t &= \alpha + \beta_1 t + \beta_2 (t-\tau)I(t>\tau),\;~~\tau\in\mathsf{T} \end{aligned}$$` --- # Trends Trend models are (relatively) easy to fit and forecast. Caution is needed with (higher order) polynomial trends, as they may fit well in-sample, but cause major problems out-of-sample. Exponential trends are suitable when a time series is characterized with a stable relative/percentage change over time. --- # Trends An exponential trend is equivalent to a linear trend fitted to natural logarithm of the series. For a time series `\(\{y_t: t=1,\ldots,T\}\)`, the natural logarithm is: `\(z_t = \ln{y_t}\)`. Some of the benefits of such a transformation are that: - they are easier to interpret (relative/percentage change). - they homogenizes the variance of the time series. - they may result in improved forecasting accuracy. The fitted trend can be reverse-transformed to fit the original series: `$$\hat{y}_{t} = e^{\hat{z}_{t}+\hat{\sigma}_{\varepsilon}^2/2}$$` The second term in the exponential function comes from the implied assumption that the error term in the log-transformed model is normally distributed. --- <!-- # Fitting and Forecasting Trends --> <!-- Consider a generic representation of a trend model with an additive error term: `$$y_t = g\big(t;\theta\big) + \varepsilon_t$$` We estimate parameters, `\(\theta\)`, by fitting the trend model to a time series using the least-squares regression: `$$\hat{\theta} = \operatorname*{argmin}_{\theta} \sum_{t=1}^{T}\big(y_t - g(t;\theta)\big)^2.$$` Fitted values are then given by: `$$\hat{y}_t = g\big(t;\hat{\theta}\big)$$` --> <!-- --- --> # Fitting and Forecasting Trends ## Linear trend model Suppose we assume a random vairable follows a linear trend model: `$$y_t = \alpha + \beta t + \varepsilon_t$$` Any future realization of the random variable is also assumed to follow a linear trend model: `$$y_{t+h} = \alpha + \beta (t+h) + \varepsilon_{t+h}.$$` --- # Fitting and Forecasting Trends An optimal forecast of `\(y_{t+h}\)` is given by: `$$y_{t+h|t} = E(y_{t+h}|\Omega_t) = E[\alpha + \beta (t+h) + \varepsilon_{t+h}] = \alpha + \beta (t+h).$$` The forecast error is given by: `$$e_{t+h|t} = y_{t+h} - y_{t+h|t} = \varepsilon_{t+h}$$` The forecast variance, thus, is given by: `$$\sigma_{t+h|t}^2 = E(e_{t+h|t}^2) = E(\varepsilon_{t+h}^2) = \sigma^2,\;~~\forall\;h$$` --- # Fitting and Forecasting Trends A few features of trend forecasts to note: - they tend to understate uncertainty (at long horizons). - short-term trend forecasts can perform poorly; long-term trend forecasts typically perform poorly. - sometimes it may be beneficial to forecast growth rates, and reconstruct level forecasts from growth. --- # Seasonality Seasonality is a self-repeating fluctuating pattern within a year that may arise from links of technologies, preferences, and institutions to the calendar. Seasonality is typically modeled as monthly or quarterly pattern, but can also be modeled as a higher frequency pattern (e.g. weekly). Some examples of time series with apparent seasonal patterns are: - Agricultural production. - Sales of energy products. - Airfare. --- # Seasonality One way to deal with the seasonality in data is to remove it prior to use of the series (work with a seasonally adjusted time series). Indeed, some economic time series are only/also available in a seasonally-adjusted form. Otherwise, and perhaps more interestingly, we can directly model seasonality in a regression setting by incorporating seasonal dummy variables. --- # Fitting and Forecasting Seasonality A seasonal model is given by: `$$y_t = \sum_{i=1}^{s}\gamma_i d_{it} + \varepsilon_t,$$` where `\(s\)` denotes the frequency of the data, and `\(d_{it}\)` takes the value of 1 repeatedly after every `\(s\)` periods, and such that `\(\sum_{i} d_{it} = 1\)`, `\(\forall t\)`. In matrix notation (easier to visualize): `$$\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{T} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots &\vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \gamma_{1}\\ \gamma_{2}\\ \vdots \\ \gamma_{s} \end{bmatrix}+ \begin{bmatrix} \varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{T} \end{bmatrix}$$` --- # Fitting and Forecasting Seasonality Alternatively the seasonal model can be rewritten as: `$$y_t = \alpha + \sum_{i=1}^{s-1}\delta_i d_{it} + \varepsilon_t,$$` in which case `\(\alpha\)` is an intercept of an omitted season, and `\(\delta_i\)` represents a deviation from it during the `\(i^{th}\)` season. Both approaches result in an identical fit and forecasts. --- # Fitting and Forecasting Seasonality Any future realization of a random variable that is assumed to follow a seasonal model is: `$$y_{t+h} = \alpha + \sum_{i=1}^{s-1}\delta_i d_{i,t+h} + \varepsilon_{t+h}.$$` --- # Fitting and Forecasting Seasonality The optimal forecast of `\(y_{t+h}\)` is given by: `$$y_{t+h|t} = E(y_{t+h}|\Omega_t) = \alpha + \sum_{i=1}^{s-1}\delta_i d_{i,t+h}$$` The forecast error is given by: `$$e_{t+h|t} = y_{t+h} - y_{t+h|t} = \varepsilon_{t+h}$$` The forecast variance is given by: `$$\sigma_{t+h|t}^2 = E(e_{t+h|t}^2) = E(\varepsilon_{t+h}^2) = \sigma^2,\;~~\forall\;h$$` --- # Model Selection Recall the most frequently used (and often abused) R-squared: `$$R^2 = 1-\frac{\sum_{t=1}^{T}\hat{e}_t^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$` Adjusted R-squared accounts for the loss in degrees of freedom: `$$\bar{R}^2 = 1-\frac{\sum_{t=1}^{T}\hat{e}_t^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}\left(\frac{T-1}{T-k}\right),$$` where `\(k\)` denotes the number of estimated parameters including the constant term. Such an adjustment might not be 'enough' so select a 'good' forecasting model, however. --- # Model Selection Information criteria penalize for the loss in degrees of freedom more 'harshly' than the adjusted R-squared: `$$\begin{aligned} AIC & = \ln{\left(\sum_{t=1}^{T}{\hat{e}_t^2}\right)} + 2\frac{k}{T} \\ SIC & = \ln{\left(\sum_{t=1}^{T}{\hat{e}_t^2}\right)} + \ln{T}\frac{k}{T} \end{aligned}$$` --- # Model Selection Things to remember about the information criteria: - Less is better. - Relative (not absolute) values of the criteria matter. - SIC selects a more parsimonious model than AIC. - The measures are used to compare fit across different models, given that the same data are used in those models. - Asymptotically, minimizing AIC is equivalent to minimizing the one-step-ahead out-of-sample mean square forecast error. - Even so, models selected using SIC may perform better. --- # Readings Hyndman & Athanasopoulos, [Sections from Chapter 2](https://otexts.com/fpp3/toolbox.html) Gonzalez-Rivera, Chapter 10