class: center, middle, inverse, title-slide

# Forecasting for Economics and Business
## Lecture 5: Autoregression
### David Ubilava
### University of Sydney

---

# Cycles

A cycle, in a time series, is a pattern of periodic fluctuations. Unlike seasonality, cycles are not contained within a calendar year.

Cycles can be *deterministic* or *stochastic*. Economic time series are better characterized by stochastic cycles. A cycle is stochastic when it is generated by random variables.

In general terms, the process is given by:

`$$Y_t = f(Y_{t-1},Y_{t-2},\ldots;\mathbf{\theta})+\varepsilon_t,\;~~t=1,\ldots,T$$`

---

# Autoregressive Models

An autoregressive process (or, simply, an autoregression) is a regression in which the dependent variable and the regressors belong to the same stochastic process.

An autoregressive model of order `\(p\)`, referred to as `\(AR(p)\)`, has the following functional form:

`$$y_t = \alpha + \beta_1 y_{t-1}+\beta_2 y_{t-2}+ \cdots + \beta_p y_{t-p}+\varepsilon_t$$`

The sum of the autoregressive parameters, `\(\beta_1,\ldots,\beta_p\)`, depicts the persistence of the series. The larger the persistence (i.e., the closer it is to one), the longer it takes for the effect of a shock to dissipate.

---

# Autoregressive Models

The autocorrelation, `\(\rho\)`, and partial autocorrelation, `\(\pi\)`, functions of the covariance-stationary `\(AR(p)\)` process have the following distinctive features:

- `\(\rho_1 = \pi_1\)`, and `\(\pi_p = \beta_p\)`.
- The autocorrelation function decreases toward zero, in a fashion that depends on the values of `\(\beta_1,\ldots,\beta_p\)`. Nonetheless, the decay is faster when the persistence measure is smaller.
- The partial autocorrelation function is characterized by the first `\(p\)` spikes `\(\pi_1 \neq 0,\ldots,\pi_p \neq 0\)`, and the remaining `\(\pi_k = 0\)`, `\(\forall k > p\)`.

---

# AR(1) Modelling

Consider the first-order autoregression:

`$$y_t = \alpha + \beta_1 y_{t-1} + \varepsilon_t,$$`

where `\(\alpha\)` is a constant term; `\(\beta_1\)` is the *persistence* parameter; and `\(\varepsilon_t\)` is a white noise process.

A necessary and sufficient condition for an `\(AR(1)\)` process to be covariance stationary is that `\(|\beta_1| < 1\)`.

---

# AR(1) Modelling

Recursively substitute the lagged dependent variables:

$$
`\begin{align}
y_t &= \alpha + \beta_1 y_{t-1} + \varepsilon_t \notag \\
y_t &= \alpha + \beta_1 (\alpha + \beta_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \notag \\
&= \alpha(1+\beta_1) + \beta_1^2 (\alpha + \beta_1 y_{t-3} + \varepsilon_{t-2}) + \beta_1\varepsilon_{t-1} + \varepsilon_t \notag \\
&\vdots \notag \\
&= \alpha\sum_{i=0}^{k-1}\beta_1^i + \beta_1^k y_{t-k} + \sum_{i=0}^{k-1}\beta_1^i\varepsilon_{t-i}
\end{align}`
$$

The end result is a general linear process with geometrically declining coefficients. The condition `\(|\beta_1| < 1\)` is required for convergence.
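---

# AR(1) Modelling

To make the recursion concrete, here is a minimal simulation sketch (the parameter values `\(\alpha = 0.5\)`, `\(\beta_1 = 0.8\)`, the sample size, and the truncation `\(k = 50\)` are illustrative assumptions, not part of the lecture). It simulates an AR(1) recursively and then rebuilds the latest observation from the truncated general linear process above; the two values coincide up to floating-point error, and with `\(|\beta_1| < 1\)` the weight `\(\beta_1^k\)` on the distant starting value becomes negligible as `\(k\)` grows.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta1, sigma = 0.5, 0.8, 1.0   # assumed illustrative parameters
T, k = 200, 50

# simulate the AR(1) recursively: y_t = alpha + beta1*y_{t-1} + e_t
eps = rng.normal(0, sigma, T)
y = np.zeros(T)
y[0] = alpha / (1 - beta1)            # start at the unconditional mean
for t in range(1, T):
    y[t] = alpha + beta1 * y[t - 1] + eps[t]

# rebuild the last observation from the truncated linear process:
# y_t = alpha*sum_i beta1^i + beta1^k*y_{t-k} + sum_i beta1^i*e_{t-i}
i = np.arange(k)
y_rebuilt = (alpha * np.sum(beta1**i)
             + beta1**k * y[T - 1 - k]
             + np.sum(beta1**i * eps[T - 1 - i]))

print(y[T - 1], y_rebuilt)            # near-identical values
```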
---

# AR(1) Modelling

Assuming `\(|\beta_1| < 1\)` and letting `\(k \to \infty\)`:

`$$y_t = \frac{\alpha}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i\varepsilon_{t-i}$$`

*Unconditional mean*:

`$$\mu = E\left(y_t\right) = E\left(\frac{\alpha}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i\varepsilon_{t-i}\right) = \frac{\alpha}{1-\beta_1}$$`

*Unconditional variance*:

`$$\gamma_0 = Var\left(y_t\right) = Var\left(\frac{\alpha}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i\varepsilon_{t-i}\right) = \frac{\sigma_{\varepsilon}^2}{1-\beta_1^2}$$`

---

# AR(1) Modelling

*Autocovariance*:

`$$\gamma_k = Cov(y_t,y_{t-k}) = E[(y_t - \mu)(y_{t-k} - \mu)] = E(y_t y_{t-k}) - \mu^2$$`

Multiply both sides of the `\(AR(1)\)` equation by `\(y_{t-k}\)` (for `\(k \geq 1\)`) and take the expectation:

`$$E(y_t y_{t-k}) = \alpha \mu + \beta_1 E(y_{t-1}y_{t-k})$$`

Some rearrangement and algebraic manipulation will yield:

`$$\gamma_k = \beta_1\gamma_{k-1}$$`

---

# AR(1) Modelling

*Autocorrelation* (recall, `\(\rho_k = \gamma_k/\gamma_0\)`):

`$$\rho_{k} = \beta_1\rho_{k-1}$$`

It then follows that:

$$
`\begin{align}
\rho_1 &= \beta_1\rho_0 = \beta_1 \notag \\
\rho_2 &= \beta_1\rho_1 = \beta_1^2 \notag \\
&\vdots \notag \\
\rho_k &= \beta_1\rho_{k-1} = \beta_1^k
\end{align}`
$$

If `\(|\beta_1| < 1\)`, the autocorrelation function of the AR(1) decays geometrically. The smaller `\(|\beta_1|\)` is, the more rapid the decay.

---

# AR(1) Modelling

Under certain parameter restrictions, the AR(1) reduces to models we have already encountered:

- If `\(\beta_1 = 0\)`, `\(y_t\)` is a white noise process.
- If `\(\beta_1 = 1\)` and `\(\alpha = 0\)`, `\(y_t\)` is a random walk.
- If `\(\beta_1 = 1\)` and `\(\alpha \neq 0\)`, `\(y_t\)` is a random walk with drift.

In general, the smaller the persistence parameter, the quicker the adjustment to the *unconditional mean* of the process, and vice versa.

---

# AR(1) Modelling

The autocorrelation and partial autocorrelation functions of the AR(1) process have three distinctive features:

- `\(\rho_1 = \pi_1 = \beta_1\)`. That is, the persistence parameter is also the autocorrelation and the partial autocorrelation coefficient.
- The autocorrelation function decreases exponentially toward zero, and the decay is faster when the persistence parameter is smaller.
- The partial autocorrelation function is characterized by only one spike `\(\pi_1 \neq 0\)`, and the remaining `\(\pi_k = 0\)`, `\(\forall k > 1\)`.
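---

# AR(1) Modelling

These features are easy to verify by simulation. The sketch below is illustrative only: the parameter values are assumptions, and it relies on the `acf` and `pacf` functions from `statsmodels`. It simulates a stationary AR(1) and compares the sample autocorrelations with the theoretical `\(\rho_k = \beta_1^k\)`; the sample partial autocorrelations should show a single spike at lag one.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(42)

alpha, beta1 = 0.5, 0.8                           # assumed illustrative parameters
T = 5000

# simulate the AR(1): y_t = alpha + beta1*y_{t-1} + e_t
eps = rng.normal(size=T)
y = np.zeros(T)
y[0] = alpha / (1 - beta1)
for t in range(1, T):
    y[t] = alpha + beta1 * y[t - 1] + eps[t]

lags = np.arange(1, 6)
print("theoretical acf:", beta1**lags)            # rho_k = beta1^k
print("sample acf:     ", acf(y, nlags=5)[1:])    # close to beta1^k
print("sample pacf:    ", pacf(y, nlags=5)[1:])   # spike at lag 1, ~0 afterwards
```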
---

# AR(1) Forecasting

The optimal one-step-ahead forecast:

`$$y_{t+1|t} = E(y_{t+1}|\Omega_t) = E(\alpha + \beta_1 y_{t} + \varepsilon_{t+1}) = \alpha + \beta_1 y_{t}$$`

The one-step-ahead forecast error:

`$$e_{t+1|t} = y_{t+1} - y_{t+1|t} = \alpha + \beta_1 y_t + \varepsilon_{t+1} - (\alpha + \beta_1 y_t) = \varepsilon_{t+1}$$`

The one-step-ahead forecast variance:

`$$\sigma_{t+1|t}^2 = Var(y_{t+1}|\Omega_t) = E(e_{t+1|t}^2) = E(\varepsilon_{t+1}^2) = \sigma_{\varepsilon}^2$$`

The one-step-ahead (95%) interval forecast:

`$$y_{t+1|t} \pm z_{.025}\sigma_{t+1|t} = y_{t+1|t} \pm 1.96\sigma_{\varepsilon}$$`

---

# AR(1) Forecasting

The optimal two-step-ahead forecast:

`$$y_{t+2|t} = E(y_{t+2}|\Omega_t) = E(\alpha + \beta_1 y_{t+1} + \varepsilon_{t+2}) = \alpha(1+\beta_1) + \beta_1^2 y_t$$`

The two-step-ahead forecast error:

`$$\begin{align} e_{t+2|t} &= y_{t+2} - y_{t+2|t} \\ &= \alpha(1+\beta_1) + \beta_1^2 y_t + \beta_1\varepsilon_{t+1} + \varepsilon_{t+2} - [\alpha(1+\beta_1) + \beta_1^2 y_t] \\ &= \beta_1\varepsilon_{t+1} + \varepsilon_{t+2} \end{align}$$`

---

# AR(1) Forecasting

The two-step-ahead forecast variance:

`$$\begin{align} \sigma_{t+2|t}^2 &= Var(y_{t+2}|\Omega_t) \\ &= E(e_{t+2|t}^2) = E(\beta_1\varepsilon_{t+1} + \varepsilon_{t+2})^2 = \sigma_{\varepsilon}^2(1+\beta_1^2) \end{align}$$`

The two-step-ahead (95%) interval forecast:

`$$y_{t+2|t} \pm z_{.025}\sigma_{t+2|t} = y_{t+2|t} \pm 1.96\sigma_{\varepsilon}\sqrt{1+\beta_1^2}$$`

---

# AR(1) Forecasting

The optimal h-step-ahead forecast:

`$$y_{t+h|t} = E(y_{t+h}|\Omega_t) = E(\alpha + \beta_1 y_{t+h-1} + \varepsilon_{t+h}) = \alpha\textstyle\sum_{j=0}^{h-1}\beta_1^j + \beta_1^h y_t$$`

The h-step-ahead forecast error:

`$$e_{t+h|t} = y_{t+h} - y_{t+h|t} = \textstyle\sum_{j=0}^{h-1}\beta_1^j\varepsilon_{t+h-j}$$`

The h-step-ahead forecast variance:

`$$\sigma_{t+h|t}^2 = Var(y_{t+h}|\Omega_t) = E(e_{t+h|t}^2) = \sigma_{\varepsilon}^2\textstyle\sum_{j=0}^{h-1}\beta_1^{2j}$$`

The h-step-ahead (95%) interval forecast:

`$$y_{t+h|t} \pm z_{.025}\sigma_{t+h|t} = y_{t+h|t} \pm 1.96\sigma_{\varepsilon}\sqrt{\textstyle\sum_{j=0}^{h-1}\beta_1^{2j}}$$`

---

# AR(1) Forecasting

In a covariance-stationary process, i.e., when `\(|\beta_1| < 1\)`, as `\(h \to \infty\)`:

The optimal point forecast:

`$$y_{t+h|t} = \frac{\alpha}{1-\beta_1}$$`

The forecast variance:

`$$\sigma_{t+h|t}^2 = \frac{\sigma_{\varepsilon}^2}{1-\beta_1^2}$$`

The (95%) interval forecast:

`$$y_{t+h|t} \pm z_{.025}\sigma_{t+h|t} = \frac{\alpha}{1-\beta_1} \pm 1.96\frac{\sigma_{\varepsilon}}{\sqrt{1-\beta_1^2}}$$`

---

# AR(2) Modeling and Forecasting

Consider the second-order autoregression:

`$$y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t$$`

where `\(\alpha\)` is a constant term; `\(\beta_1+\beta_2\)` is the persistence measure; and `\(\varepsilon_t\)` is a white noise process.

Conditions (1) and (2) below are necessary, and conditions (1), (3), and (4), taken together, are necessary and sufficient for an `\(AR(2)\)` process to be covariance stationary:

1. `\(|\beta_2| < 1\)`
2. `\(|\beta_1| < 2\)`
3. `\(\beta_1 + \beta_2 < 1\)`
4. `\(\beta_2 - \beta_1 < 1\)`

---

# AR(2) Modeling and Forecasting

The autocorrelation and partial autocorrelation functions of the AR(2) process have the following distinctive features:

- `\(\rho_1 = \pi_1\)` (which is true for any `\(AR(p)\)` process), and `\(\pi_2 = \beta_2\)`.
- The autocorrelation function decreases toward zero. The path, however, varies depending on the values of `\(\beta_1\)` and `\(\beta_2\)`. Nonetheless, the decay is faster when the persistence measure is smaller.
- The partial autocorrelation function is characterized by only two spikes `\(\pi_1 \neq 0\)` and `\(\pi_2 \neq 0\)`, and the remaining `\(\pi_k = 0\)`, `\(\forall k > 2\)`.
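---

# AR(2) Modeling and Forecasting

As before, these features can be checked by simulation. The sketch below uses assumed illustrative parameter values (which satisfy the stationarity conditions above) and the `acf` and `pacf` functions from `statsmodels`; it is a sketch, not part of the lecture material. The sample partial autocorrelations should show two spikes, with `\(\pi_2 \approx \beta_2\)` and near-zero values beyond lag two.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(7)

alpha, beta1, beta2 = 0.3, 0.5, 0.2    # assumed values; satisfy the conditions above
T = 5000

# simulate the AR(2): y_t = alpha + beta1*y_{t-1} + beta2*y_{t-2} + e_t
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = alpha + beta1 * y[t - 1] + beta2 * y[t - 2] + eps[t]

print("sample acf: ", acf(y, nlags=5)[1:])   # gradual decay toward zero
print("sample pacf:", pacf(y, nlags=5)[1:])  # spikes at lags 1 and 2; pi_2 ~ beta2
```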
---

# AR(2) Modeling and Forecasting

The optimal one-step-ahead forecast:

`$$\begin{align} y_{t+1|t} &= E(y_{t+1}|\Omega_t) \\ &= E(\alpha + \beta_1 y_{t} + \beta_2 y_{t-1} + \varepsilon_{t+1}) = \alpha + \beta_1 y_{t} + \beta_2 y_{t-1} \end{align}$$`

The one-step-ahead forecast error:

`$$\begin{align} e_{t+1|t} &= y_{t+1} - y_{t+1|t} \\ &= \alpha + \beta_1 y_t + \beta_2 y_{t-1} + \varepsilon_{t+1} - (\alpha + \beta_1 y_t + \beta_2 y_{t-1}) = \varepsilon_{t+1} \end{align}$$`

---

# AR(2) Modeling and Forecasting

The one-step-ahead forecast variance:

`$$\sigma_{t+1|t}^2 = Var(y_{t+1}|\Omega_t) = E(e_{t+1|t}^2) = E(\varepsilon_{t+1}^2) = \sigma_{\varepsilon}^2$$`

The one-step-ahead (95%) interval forecast:

`$$y_{t+1|t} \pm z_{.025}\sigma_{t+1|t} = y_{t+1|t} \pm 1.96\sigma_{\varepsilon}$$`

---

# AR(2) Modeling and Forecasting

The optimal two-step-ahead forecast:

`$$\begin{align} y_{t+2|t} = E(y_{t+2}|\Omega_t) &= E(\alpha + \beta_1 y_{t+1} + \beta_2 y_{t} + \varepsilon_{t+2}) \\ &= \alpha(1+\beta_1) + (\beta_1^2+\beta_2) y_{t} + \beta_1\beta_2 y_{t-1} \end{align}$$`

The two-step-ahead forecast error:

`$$\begin{align} e_{t+2|t} = y_{t+2} - y_{t+2|t} =& \alpha + \beta_1 y_{t+1} + \beta_2 y_{t} + \varepsilon_{t+2} \\ &- (\alpha + \beta_1 y_{t+1|t} + \beta_2 y_{t}) = \beta_1\varepsilon_{t+1} + \varepsilon_{t+2} \end{align}$$`

---

# AR(2) Modeling and Forecasting

The two-step-ahead forecast variance:

`$$\sigma_{t+2|t}^2 = Var(y_{t+2}|\Omega_t) = E(e_{t+2|t}^2) = E(\beta_1\varepsilon_{t+1} + \varepsilon_{t+2})^2 = \sigma_{\varepsilon}^2(1+\beta_1^2)$$`

The two-step-ahead (95%) interval forecast:

`$$y_{t+2|t} \pm z_{.025}\sigma_{t+2|t} = y_{t+2|t} \pm 1.96\sigma_{\varepsilon}\sqrt{1+\beta_1^2}$$`

---

# AR(2) Modeling and Forecasting

The optimal h-step-ahead forecast (iterated method):

`$$\begin{align} y_{t+1|t} &= \alpha + \beta_1 y_t + \beta_2 y_{t-1} \\ y_{t+2|t} &= \alpha + \beta_1 y_{t+1|t} + \beta_2 y_{t} \\ y_{t+3|t} &= \alpha + \beta_1 y_{t+2|t} + \beta_2 y_{t+1|t} \\ &\vdots \\ y_{t+h|t} &= \alpha + \beta_1 y_{t+h-1|t} + \beta_2 y_{t+h-2|t} \end{align}$$`

The h-step-ahead forecast error:

`$$e_{t+h|t} = y_{t+h} - y_{t+h|t} = \varepsilon_{t+h}+\beta_1 e_{t+h-1|t}+\beta_2 e_{t+h-2|t}$$`

---

# AR(2) Modeling and Forecasting

The h-step-ahead forecast variance:

`$$\begin{align} \sigma_{t+h|t}^2 &= Var(y_{t+h}|\Omega_t) = E(e_{t+h|t}^2) \\ &= \sigma_{\varepsilon}^2+\beta_1^2 Var(e_{t+h-1|t})+\beta_2^2 Var(e_{t+h-2|t}) \\ &+2\beta_1\beta_2Cov(e_{t+h-1|t},e_{t+h-2|t}) \end{align}$$`

Note that, in general, the formulas for `\(\sigma_{t+1|t}^2,\sigma_{t+2|t}^2,\ldots,\sigma_{t+h|t}^2\)` are the same for any `\(AR(p)\)` with `\(p \geq h-1\)`.

The h-step-ahead (95%) interval forecast:

`$$y_{t+h|t} \pm z_{.025}\sigma_{t+h|t} = y_{t+h|t} \pm 1.96\sigma_{t+h|t}$$`
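---

# AR(2) Modeling and Forecasting

The iterated forecasts and their intervals are straightforward to compute. Below is a minimal sketch with assumed illustrative parameter values and hypothetical last observations (none of these numbers come from the lecture). The point forecasts use the recursion above; the forecast-error variances are computed from the moving-average weights `\(\psi_j\)` of the process, which is an equivalent way of evaluating the forecast-error recursion on the previous slide.

```python
import numpy as np

# assumed AR(2) parameters: y_t = a + b1*y_{t-1} + b2*y_{t-2} + e_t
a, b1, b2, sigma = 0.3, 0.5, 0.2, 1.0
y_t, y_tm1 = 1.2, 0.9                   # hypothetical last two observations
H = 8                                   # forecast horizon

# iterated point forecasts: y_{t+h|t} = a + b1*y_{t+h-1|t} + b2*y_{t+h-2|t}
fc = [y_tm1, y_t]                       # seed the recursion with observed values
for h in range(H):
    fc.append(a + b1 * fc[-1] + b2 * fc[-2])
point = np.array(fc[2:])

# forecast-error variances: sigma2_h = sigma^2 * sum_{j<h} psi_j^2,
# where psi_0 = 1 and psi_j = b1*psi_{j-1} + b2*psi_{j-2}
psi = [1.0, b1]
for j in range(2, H):
    psi.append(b1 * psi[-1] + b2 * psi[-2])
var = sigma**2 * np.cumsum(np.square(psi[:H]))

# 95% interval forecasts: point forecast +/- 1.96 * forecast standard deviation
lower, upper = point - 1.96 * np.sqrt(var), point + 1.96 * np.sqrt(var)
for h in range(H):
    print(f"h={h+1}: {point[h]:.3f} [{lower[h]:.3f}, {upper[h]:.3f}]")
```

At `\(h=1\)` and `\(h=2\)`, the computed variances reduce to `\(\sigma_{\varepsilon}^2\)` and `\(\sigma_{\varepsilon}^2(1+\beta_1^2)\)`, matching the formulas on the earlier slides.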
---

# AR(p) Forecasting

The optimal h-step-ahead forecast:

`$$y_{t+h|t} = E(y_{t+h}|\Omega_t) = \alpha + \beta_1 y_{t+h-1|t} + \beta_2 y_{t+h-2|t} + \cdots + \beta_p y_{t+h-p|t}$$`

The h-step-ahead forecast error:

`$$e_{t+h|t} = \varepsilon_{t+h} + \beta_1 e_{t+h-1|t} + \beta_2 e_{t+h-2|t} + \cdots + \beta_p e_{t+h-p|t}$$`

Here, the convention is that `\(y_{t+j|t} = y_{t+j}\)` and `\(e_{t+j|t} = 0\)` for `\(j \leq 0\)`.

---

# AR(p) Forecasting

The h-step-ahead forecast variance:

`$$\begin{align} \sigma_{t+h|t}^2 & = Var(y_{t+h}|\Omega_t) = E(e_{t+h|t}^2) \\ &= \sigma_{\varepsilon}^2 + \sum_{i=1}^{p}\beta_i^2 Var(e_{t+h-i|t}) + 2\sum_{i < j}\beta_i\beta_j Cov(e_{t+h-i|t},e_{t+h-j|t}) \end{align}$$`

The h-step-ahead (95%) interval forecast:

`$$y_{t+h|t} \pm z_{.025}\sigma_{t+h|t} = y_{t+h|t} \pm 1.96\sigma_{t+h|t}$$`

---

# Readings

Hyndman & Athanasopoulos, [Sections from Chapter 9](https://otexts.com/fpp3/toolbox.html)

Gonzalez-Rivera, Chapter 7
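---

# AR(p) Forecasting in Practice

In practice, estimation and forecasting are typically handled by software. The sketch below is purely illustrative: it simulates an AR(2) with assumed parameter values and relies on the `ARIMA` class from `statsmodels`, fitting an ARIMA(2,0,0), i.e., an AR(2) with a constant, to produce h-step-ahead point and 95% interval forecasts.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)

# simulate an AR(2) with assumed illustrative parameters
a, b1, b2 = 0.3, 0.5, 0.2
T = 500
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = a + b1 * y[t - 1] + b2 * y[t - 2] + eps[t]

# estimate an AR(2), i.e., an ARIMA(2,0,0) with a constant
res = ARIMA(y, order=(2, 0, 0)).fit()

# h-step-ahead point and 95% interval forecasts, h = 1,...,8
fc = res.get_forecast(steps=8)
print(fc.predicted_mean)        # y_{t+h|t}
print(fc.conf_int(alpha=0.05))  # lower and upper interval bounds
```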