class: center, middle, inverse, title-slide .title[ # Forecasting for Economics and Business ] .subtitle[ ## Lecture 10: Forecasting Inter-related Series ] .author[ ### David Ubilava ] .date[ ### University of Sydney ] --- # Economic variables are inter-related .pull-left[  ] .pull-right[ Economic variables are often inter-related; for example, - when incomes increase, people consume more; - when interest rates increase, people invest less. We can model these relationships by regressing one variable on the other(s). ] --- # Autoregressive distributed lag (ARDL) .right-column[ Because of possible delays in adjustment to a new equilibrium, we typically need to include lagged values in the regression. This leads to an autoregressive distributed lag (ARDL) model, as follows: `$$y_t=\alpha+\beta_1y_{t-1}+\ldots+\beta_py_{t-p}+\delta_0x_t+\ldots+\delta_qx_{t-q}+\varepsilon_t$$` Note that an autoregressive model and a distributed lag model are special cases of the ARDL. As usual, we can use an information criterion to select `\(p\)` and `\(q\)`. ] --- # A relationship between oil prices and inflation .right-column[  ] --- # ARDL lag length and estimated parameters .pull-left[ We obtain the AIC and SIC for lags 1 through 4: <table class="myTable"> <thead> <tr> <th style="text-align:center;"> Lags </th> <th style="text-align:center;"> AIC </th> <th style="text-align:center;"> SIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1.997 </td> <td style="text-align:center;"> 2.058 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 1.932 </td> <td style="text-align:center;"> 2.023 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 1.946 </td> <td style="text-align:center;"> 2.068 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 1.913 </td> <td style="text-align:center;"> 2.065 </td> </tr> </tbody> </table> ] .pull-right[ The AIC selects four lags while the SIC selects two. Let's go with the SIC. <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> `\(\alpha\)` </th> <th style="text-align:right;"> `\(\beta_{1}\)` </th> <th style="text-align:right;"> `\(\beta_{2}\)` </th> <th style="text-align:right;"> `\(\delta_{0}\)` </th> <th style="text-align:right;"> `\(\delta_{1}\)` </th> <th style="text-align:right;"> `\(\delta_{2}\)` </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> estimate </td> <td style="text-align:right;"> 0.251 </td> <td style="text-align:right;"> 0.986 </td> <td style="text-align:right;"> -0.098 </td> <td style="text-align:right;"> 0.016 </td> <td style="text-align:right;"> -0.010 </td> <td style="text-align:right;"> -0.007 </td> </tr> <tr> <td style="text-align:left;"> s.e. </td> <td style="text-align:right;"> 0.048 </td> <td style="text-align:right;"> 0.065 </td> <td style="text-align:right;"> 0.063 </td> <td style="text-align:right;"> 0.002 </td> <td style="text-align:right;"> 0.003 </td> <td style="text-align:right;"> 0.002 </td> </tr> </tbody> </table> ] --- # Forecasting with ARDL .right-column[ Point forecasts: `$$\begin{aligned} y_{t+1|t} &= \alpha + \beta_1y_{t} + \beta_2y_{t-1}+\delta_0\hat{x}_{t+1}+\delta_1x_{t}+\delta_2x_{t-1}\\ y_{t+2|t} &= \alpha + \beta_1y_{t+1|t} + \beta_2y_{t}+\delta_0\hat{x}_{t+2}+\delta_1\hat{x}_{t+1}+\delta_2x_{t}\\ y_{t+h|t} &= \alpha + \beta_1y_{t+h-1|t} + \beta_2y_{t+h-2|t}\\ &+\delta_0\hat{x}_{t+h}+\delta_1\hat{x}_{t+h-1}+\delta_2\hat{x}_{t+h-2} \end{aligned}$$` Note, and this is one of the caveats of using an ARDL model in forecasting, that we need to know the future realized values of `\(x\)`. Often, that involves (separately) forecasting those values, or fixing them at assumed values (e.g., scenario forecasting), as illustrated in the sketch on the next slide. ]
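--- # Forecasting with ARDL: a sketch in R .right-column[ A minimal sketch in base R of the iterated ARDL(2,2) forecasts above, assuming `y` and `x` are numeric vectors holding the inflation and oil price series; the series names, the horizon, and the constant-`x` scenario are illustrative assumptions:

```r
# fit an ARDL(2,2) by OLS; y and x are placeholder series
n <- length(y)
dat <- data.frame(y = y[3:n], y1 = y[2:(n-1)], y2 = y[1:(n-2)],
                  x0 = x[3:n], x1 = x[2:(n-1)], x2 = x[1:(n-2)])
fit <- lm(y ~ y1 + y2 + x0 + x1 + x2, data = dat)

# iterate point forecasts, fixing future x at its last observed value
h <- 12
xf <- rep(x[n], h)               # the assumed scenario for future x
yf <- numeric(h)
yl <- c(y[n], y[n-1]); xl <- c(x[n], x[n-1])
for (s in 1:h) {
  nd <- data.frame(y1 = yl[1], y2 = yl[2],
                   x0 = xf[s], x1 = xl[1], x2 = xl[2])
  yf[s] <- predict(fit, newdata = nd)
  yl <- c(yf[s], yl[1]); xl <- c(xf[s], xl[1])   # roll the lags forward
}
```

]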
--- # Forecasting with ARDL .right-column[ If we assume (somewhat unrealistically) that we know the future realized values of `\(x\)`, then the forecast errors and forecast error variances from the ARDL model will be the same as those from an AR model. This is, in a way, akin to working with an AR model that includes a trend or a seasonal component. Of course, obtaining direct multi-step forecasts is always an option. ] --- # If we knew future realized values of oil prices .right-column[  ] --- # If we set future realized values to the current oil price .right-column[  ] --- # Vector autoregression .right-column[ More generally, dynamic linkages between two (or more) economic variables can be modeled as a *system of equations*, better known as a vector autoregression (VAR). ] --- # Vector autoregression .right-column[ To begin, consider a bivariate first-order VAR. Let `\(\{X_{1t}\}\)` and `\(\{X_{2t}\}\)` be stationary stochastic processes. A bivariate VAR(1) is then given by: `$$\begin{aligned} x_{1t} &= \alpha_1 + \pi_{111}x_{1t-1} + \pi_{121}x_{2t-1} + \varepsilon_{1t} \\ x_{2t} &= \alpha_2 + \pi_{211}x_{1t-1} + \pi_{221}x_{2t-1} + \varepsilon_{2t} \end{aligned}$$` where `\(\varepsilon_{1t} \sim iid(0,\sigma_1^2)\)` and `\(\varepsilon_{2t} \sim iid(0,\sigma_2^2)\)`, and the two can be correlated, i.e., `\(Cov(\varepsilon_{1t},\varepsilon_{2t}) \neq 0\)`. ] --- # Vector autoregression .right-column[ An `\(n\)`-dimensional VAR of order `\(p\)`, VAR(p), can be presented in matrix notation as: `$$\mathbf{x}_t = \mathbf{\alpha} + \Pi_1 \mathbf{x}_{t-1} + \ldots + \Pi_p \mathbf{x}_{t-p} + \mathbf{\varepsilon}_t,$$` where `\(\mathbf{x}_t = (x_{1t},\ldots,x_{nt})'\)` is a vector of `\(n\)` (potentially) related variables; `\(\mathbf{\varepsilon}_t = (\varepsilon_{1t},\ldots,\varepsilon_{nt})'\)` is a vector of error terms, such that `\(\mathbb{E}\left(\mathbf{\varepsilon}_t\right) = \mathbf{0}\)`, `\(\mathbb{E}\left(\mathbf{\varepsilon}_t^{}\mathbf{\varepsilon}_t^{\prime}\right) = \Sigma_{\mathbf{\varepsilon}}\)`, and `\(\mathbb{E}\left(\mathbf{\varepsilon}_{t}^{}\mathbf{\varepsilon}_{s \neq t}^{\prime}\right) = 0\)`. ] --- # A parameter matrix of a vector autoregression .right-column[ `\(\Pi_1,\ldots,\Pi_p\)` are `\(n\)`-dimensional parameter matrices such that: `$$\Pi_j = \left[ \begin{array}{cccc} \pi_{11j} & \pi_{12j} & \cdots & \pi_{1nj} \\ \pi_{21j} & \pi_{22j} & \cdots & \pi_{2nj} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{n1j} & \pi_{n2j} & \cdots & \pi_{nnj} \end{array} \right],\;~~j=1,\ldots,p$$` ] --- # Features of vector autoregression .right-column[ General features of a (reduced-form) vector autoregression are that: - only the lagged (i.e., no contemporaneous) values of the dependent variables appear on the right-hand side of the equations. * Trends and seasonal variables can, however, also be included. - each equation has the same set of right-hand-side variables. * However, it is possible to impose a different lag structure across the equations, especially when `\(p\)` is relatively large. This is because the number of parameters increases very quickly with the number of lags or the number of variables in the system. - the autoregressive order, `\(p\)`, is the largest lag across all equations. ]
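--- # Simulating a vector autoregression .right-column[ To make the bivariate VAR(1) notation from a few slides back concrete, a minimal sketch that simulates such a system with correlated errors; all parameter values below are made up purely for illustration:

```r
set.seed(1)
TT <- 200
a  <- c(0.5, 0.2)                              # intercepts (alpha)
P1 <- matrix(c(0.6, 0.1,
               0.3, 0.4), 2, 2, byrow = TRUE)  # Pi_1 (illustrative)
S  <- matrix(c(1.0, 0.5,
               0.5, 1.0), 2, 2)                # Sigma_eps, correlated errors
L  <- chol(S)
x  <- matrix(0, TT, 2)
for (t in 2:TT) {
  e <- drop(rnorm(2) %*% L)                    # shocks with covariance S
  x[t, ] <- a + P1 %*% x[t-1, ] + e
}
```

]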
--- # Modeling vector autoregression .right-column[ The autoregressive order, `\(p\)`, can be determined using system-wide information criteria: `$$\begin{aligned} & AIC = \ln\left|\Sigma_{\mathbf{\varepsilon}}\right| + \frac{2}{T}(pn^2+n) \\ & SIC = \ln\left|\Sigma_{\mathbf{\varepsilon}}\right| + \frac{\ln{T}}{T}(pn^2+n) \end{aligned}$$` where `\(\left|\Sigma_{\mathbf{\varepsilon}}\right|\)` is the determinant of the residual covariance matrix, `\(n\)` is the number of equations, and `\(T\)` is the total number of observations. ] --- # Estimating vector autoregression .right-column[ When each equation of the VAR has the same set of regressors, OLS can be applied to each equation individually to estimate the regression parameters, i.e., the estimation can be carried out on an equation-by-equation basis. Indeed, taken separately, each equation is just an ARDL (albeit without any contemporaneous regressors on the right-hand side of the equation). When the processes are covariance-stationary, conventional t-tests and F-tests are applicable for hypothesis testing. ] --- # Testing in-sample Granger causality .right-column[ Consider a bivariate VAR(p): `$$\begin{aligned} x_{1t} &= \alpha_1 + \pi_{111} x_{1t-1} + \cdots + \pi_{11p} x_{1t-p} \\ &+ \pi_{121} x_{2t-1} + \cdots + \pi_{12p} x_{2t-p} +\varepsilon_{1t} \\ x_{2t} &= \alpha_2 + \pi_{211} x_{1t-1} + \cdots + \pi_{21p} x_{1t-p} \\ &+ \pi_{221} x_{2t-1} + \cdots + \pi_{22p} x_{2t-p} +\varepsilon_{2t} \end{aligned}$$` - `\(\{X_2\}\)` does not Granger cause `\(\{X_1\}\)` if `\(\pi_{121}=\cdots=\pi_{12p}=0\)` - `\(\{X_1\}\)` does not Granger cause `\(\{X_2\}\)` if `\(\pi_{211}=\cdots=\pi_{21p}=0\)` ] --- # Testing in-sample Granger causality .right-column[ In our example: `\(\text{Crude Oil}\xrightarrow{gc}\text{Inflation}\)`: the F statistic is 12.12 `\(\text{Inflation}\xrightarrow{gc}\text{Crude Oil}\)`: the F statistic is 1.98 The former exceeds the relevant critical value, while the latter does not. So, we have unidirectional Granger causality (from crude oil prices to 5-year expected inflation). ] --- # Forecasting with VAR models: one-step ahead .right-column[ To keep things simple, we illustrate using a bivariate VAR(1). Point forecasts: `$$\begin{aligned} x_{1t+1|t} &= \alpha_1 + \pi_{111} x_{1t} + \pi_{121} x_{2t} \\ x_{2t+1|t} &= \alpha_2 + \pi_{211} x_{1t} + \pi_{221} x_{2t} \end{aligned}$$` Forecast errors: `$$\begin{aligned} e_{1t+1} &= x_{1t+1} - x_{1t+1|t} = \varepsilon_{1t+1} \\ e_{2t+1} &= x_{2t+1} - x_{2t+1|t} = \varepsilon_{2t+1} \end{aligned}$$` ] --- # Forecasting with VAR models: one-step ahead .right-column[ Forecast variances: `$$\begin{aligned} \sigma_{1t+1}^2 &= \mathbb{E}\left(e_{1t+1}^2|\Omega_t\right) = \mathbb{E}(\varepsilon_{1t+1}^2) = \sigma_{1}^2 \\ \sigma_{2t+1}^2 &= \mathbb{E}\left(e_{2t+1}^2|\Omega_t\right) = \mathbb{E}(\varepsilon_{2t+1}^2) = \sigma_{2}^2 \end{aligned}$$` Assuming normality of forecast errors, interval forecasts are obtained in the usual way. ] --- # Forecasting with VAR models: multi-step ahead .right-column[ Point forecasts: `$$\begin{aligned} x_{1t+h|t} &= \alpha_1 + \pi_{111} x_{1t+h-1|t} + \pi_{121} x_{2t+h-1|t} \\ x_{2t+h|t} &= \alpha_2 + \pi_{211} x_{1t+h-1|t} + \pi_{221} x_{2t+h-1|t} \end{aligned}$$` Forecast errors: `$$\begin{aligned} e_{1t+h} &= \pi_{111} e_{1t+h-1} + \pi_{121} e_{2t+h-1} + \varepsilon_{1t+h} \\ e_{2t+h} &= \pi_{211} e_{1t+h-1} + \pi_{221} e_{2t+h-1} + \varepsilon_{2t+h} \end{aligned}$$` Forecast variances are functions of the error variances and covariances, and of the model parameters. ]
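--- # Estimating and forecasting a VAR in R .right-column[ A minimal sketch in base R of equation-by-equation OLS and iterated multi-step point forecasts for a bivariate VAR(1); `x` is assumed to be a matrix with the two series in its columns (dedicated packages, such as vars, automate these steps):

```r
TT <- nrow(x)
X  <- cbind(1, x[-TT, ])                  # constant and first lags
b1 <- lm.fit(X, x[-1, 1])$coefficients    # OLS, equation for x_1
b2 <- lm.fit(X, x[-1, 2])$coefficients    # OLS, equation for x_2

h  <- 12
fc <- matrix(0, h, 2)                     # point forecasts
xl <- x[TT, ]                             # most recent observation
for (s in 1:h) {
  fc[s, ] <- c(sum(c(1, xl) * b1), sum(c(1, xl) * b2))
  xl <- fc[s, ]                           # iterate the system forward
}
```

]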
--- # Forecasting oil prices and inflation .right-column[  ] --- # Direct multi-step ahead forecasts .right-column[ For a given horizon, `\(h\)`, the estimated model is: `$$\begin{aligned} x_{1t} &= \phi_1 + \psi_{111} x_{1t-h} + \psi_{121} x_{2t-h} + \upsilon_{1ht} \\ x_{2t} &= \phi_2 + \psi_{211} x_{1t-h} + \psi_{221} x_{2t-h} + \upsilon_{2ht} \end{aligned}$$` where `\(\upsilon_{1ht} \sim iid(0,\sigma_{1h}^2)\)` and `\(\upsilon_{2ht} \sim iid(0,\sigma_{2h}^2)\)`, and the two can be correlated, i.e., `\(Cov(\upsilon_{1ht},\upsilon_{2ht}) \neq 0\)`. Point forecasts: `$$\begin{aligned} x_{1t+h|t} &= \phi_1 + \psi_{111} x_{1t} + \psi_{121} x_{2t} \\ x_{2t+h|t} &= \phi_2 + \psi_{211} x_{1t} + \psi_{221} x_{2t} \end{aligned}$$` ] --- # Direct multi-step ahead forecasts .right-column[ Forecast errors: `$$\begin{aligned} e_{1t+h} &= \upsilon_{1h,t+h} \\ e_{2t+h} &= \upsilon_{2h,t+h} \end{aligned}$$` Forecast variances: `$$\begin{aligned} \sigma_{1t+h}^2 &= \mathbb{E}\left(e_{1t+h}^2|\Omega_t\right) = \mathbb{E}(\upsilon_{1h,t+h}^2) = \sigma_{1h}^2 \\ \sigma_{2t+h}^2 &= \mathbb{E}\left(e_{2t+h}^2|\Omega_t\right) = \mathbb{E}(\upsilon_{2h,t+h}^2) = \sigma_{2h}^2 \end{aligned}$$` Assuming normality, interval forecasts are obtained directly from these variances, in the usual way. ] --- # Out-of-Sample Granger Causality .right-column[ The previously discussed (in-sample) tests of causality in the Granger sense are frequently performed in practice, but the 'true spirit' of such a test is to assess the ability of one variable to help predict another in an out-of-sample setting. ] --- # Out-of-Sample Granger Causality .right-column[ Consider restricted and unrestricted information sets: `$$\begin{aligned} &\Omega_{t}^{(r)} \equiv \Omega_{t}(X_1) = \{x_{1t},x_{1t-1},\ldots\} \\ &\Omega_{t}^{(u)} \equiv \Omega_{t}(X_1,X_2) = \{x_{1t},x_{1t-1},\ldots,x_{2t},x_{2t-1},\ldots\} \end{aligned}$$` Following Granger's definition of causality: `\(\{X_2\}\)` is said to cause `\(\{X_1\}\)` if `\(\sigma_{x_1}^2\left(\Omega_{t}^{(u)}\right) < \sigma_{x_1}^2\left(\Omega_{t}^{(r)}\right)\)`, meaning that we can better predict `\(X_1\)` using all available information on `\(X_1\)` and `\(X_2\)`, rather than that on `\(X_1\)` only. ] --- # Out-of-Sample Granger Causality .right-column[ Let the forecasts based on each of the information sets be: `$$\begin{aligned} &x_{1t+h|t}^{(r)} = \mathbb{E}\left(x_{1t+h}|\Omega_{t}^{(r)}\right) \\ &x_{1t+h|t}^{(u)} = \mathbb{E}\left(x_{1t+h}|\Omega_{t}^{(u)}\right) \end{aligned}$$` ] --- # Out-of-Sample Granger Causality .right-column[ For these forecasts, the corresponding forecast errors are: `$$\begin{aligned} & e_{1t+h}^{(r)} = x_{1t+h} - x_{1t+h|t}^{(r)}\\ & e_{1t+h}^{(u)} = x_{1t+h} - x_{1t+h|t}^{(u)} \end{aligned}$$` The forecasts are then evaluated by comparing loss functions based on these forecast errors. The out-of-sample Granger causality test is, in effect, a test of relative forecast accuracy between the two models, as illustrated in the sketch on the next slide. ]
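--- # Out-of-sample Granger causality: a sketch in R .right-column[ A minimal sketch of recursive one-step forecasts of `\(x_{1}\)` from the restricted model (an AR(1)) and the unrestricted model (which adds lagged `\(x_{2}\)`); the data matrix `x` carries over from the earlier sketches, and the window size `R` is an illustrative assumption:

```r
TT <- nrow(x)
R  <- 120                                  # initial estimation window
e_r <- e_u <- f_r <- f_u <- numeric(TT - R)
for (t in R:(TT - 1)) {
  y1 <- x[2:t, 1]                          # dependent variable
  l1 <- x[1:(t-1), 1]; l2 <- x[1:(t-1), 2] # first lags
  br <- lm.fit(cbind(1, l1), y1)$coefficients      # restricted
  bu <- lm.fit(cbind(1, l1, l2), y1)$coefficients  # unrestricted
  i <- t - R + 1
  f_r[i] <- sum(c(1, x[t, 1]) * br)
  f_u[i] <- sum(c(1, x[t, 1], x[t, 2]) * bu)
  e_r[i] <- x[t+1, 1] - f_r[i]
  e_u[i] <- x[t+1, 1] - f_u[i]
}
mean(e_u^2) < mean(e_r^2)  # TRUE suggests x_2 helps predict x_1
```

]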
--- # Accuracy Tests for Nested Models .right-column[ In testing out-of-sample Granger causality, we compare nested models. For example, consider a bivariate VAR(1). A test of out-of-sample Granger causality involves comparing forecasts from these two models: `$$\begin{aligned} (A):~~x_{1t} &= \beta_0+\beta_{1}x_{1t-1}+\beta_{2}x_{2t-1}+\upsilon_t \\ (B):~~x_{1t} &= \alpha_0+\alpha_{1}x_{1t-1}+\varepsilon_t \end{aligned}$$` Obviously, here model (B) is nested in model (A). Under the null of Granger non-causality, the disturbances of the two models are identical. This leads to 'issues' in the usual tests, as the test statistics no longer have their standard distributions. ] --- # Accuracy Tests for Nested Models .right-column[ There is a way to circumvent these issues, which involves an adjustment of the loss differential. In particular, the loss differential now becomes: `$$d(e_{t+h,ij})=e_{t+h,i}^2-e_{t+h,j}^2+(y_{t+h|t,i}-y_{t+h|t,j})^2,$$` where model `\(i\)` is nested in model `\(j\)`. The testing procedure, sometimes referred to as the Clark-West test, is otherwise similar to the Diebold-Mariano test. ] --- # Key Takeaways .pull-left[  ] .pull-right[ - Many economic time series are inter-related. We can model such series as a system of equations, known as the Vector Autoregressive (VAR) model. - The Autoregressive Distributed Lag (ARDL) model is a special case of a VAR model (in effect, a single equation of the system). - We can use information on related variables to help better forecast a variable of interest. If this improves forecast accuracy, we have a case of Granger causality. ]
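--- # Appendix: the adjusted loss differential in R .right-column[ A minimal sketch, continuing the out-of-sample example from earlier: the restricted model plays the role of model `\(i\)` and the unrestricted model that of model `\(j\)`; the simple standard error below abstracts from any autocorrelation in the loss differential (relevant at horizons beyond one step):

```r
# adjusted loss differential for nested models,
# using the forecasts and errors from the earlier sketch
d <- e_r^2 - e_u^2 + (f_r - f_u)^2

# t-type statistic on the mean of d; a significantly positive value
# favors the unrestricted model, i.e., out-of-sample Granger causality
t_stat <- mean(d) / sqrt(var(d) / length(d))
```

]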