Chapter 10 – Vector Autoregression

10.1 Dynamic Feedbacks Among Economic Variables

Economic variables are interrelated. For example, changes in household income affect consumption levels, and changes in interest rates affect investment in the economy. Often (albeit not always) the relationship between variables goes in both directions. For example, growth in income results in higher prices (inflation), which in turn puts upward pressure on wages.¹

The foregoing implies that a shock to one variable may generate a dynamic response not only in that variable but also in related variables. The dynamic linkages between two (or more) economic variables can be modeled as a system of equations, represented by a vector autoregression (VAR).

There are three general forms of vector autoregressions: structural, recursive, and reduced-form. The structural VAR uses economic theory to impose a ‘structure’ on the correlations of the error terms in the system. The recursive VAR also imposes a structure, primarily by ordering the equations in the system so that the error terms in each equation are uncorrelated with those in the preceding equations. To the extent that these ‘identifying assumptions’ are satisfied, contemporaneous values of some of the other variables appear in the equation of a given variable.

The reduced-form VAR makes no claims of causality, at least not in the sense that this term is usually understood. The equations include only the lagged values of all the variables in the system. To the extent that these variables are, indeed, correlated with each other, the error terms of the reduced-form VAR (typically) are contemporaneously correlated. In what follows, VAR will refer to the reduced-form VAR, unless otherwise stated.

10.2 Modeling

The simplest form of the VAR is a bivariate (two-dimensional) VAR of order one, VAR(1).

Let \(\{X_{1,t}\}\) and \(\{X_{2,t}\}\) be stationary stochastic processes. A bivariate VAR(1) is then given by: \[\begin{aligned} x_{1t} &= \alpha_1 + \pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \varepsilon_{1t} \\ x_{2t} &= \alpha_2 + \pi_{21}x_{1t-1} + \pi_{22}x_{2t-1} + \varepsilon_{2t} \end{aligned}\]

where \(\varepsilon_{1,t} \sim iid(0,\sigma_1^2)\) and \(\varepsilon_{2,t} \sim iid(0,\sigma_2^2)\), and the two can be correlated, i.e., \(Cov(\varepsilon_{1,t},\varepsilon_{2,t}) \neq 0\).
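
To make the setup concrete, the following sketch simulates such a bivariate VAR(1) in Python. The intercepts, coefficient matrix, and error covariance are illustrative assumptions, chosen so that the system is stationary.

```python
import numpy as np

# Simulate a bivariate VAR(1) with illustrative (hypothetical) parameters.
rng = np.random.default_rng(42)

alpha = np.array([0.5, 1.0])            # intercepts (alpha_1, alpha_2)
Pi = np.array([[0.5, 0.2],              # [pi_11, pi_12]
               [0.1, 0.4]])             # [pi_21, pi_22]
Sigma_eps = np.array([[1.0, 0.3],       # contemporaneously correlated errors
                      [0.3, 0.5]])

# Stationarity check: eigenvalues of Pi must lie inside the unit circle.
assert np.all(np.abs(np.linalg.eigvals(Pi)) < 1)

T = 500
eps = rng.multivariate_normal(np.zeros(2), Sigma_eps, size=T)
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = alpha + Pi @ x[t - 1] + eps[t]   # x_t = alpha + Pi x_{t-1} + eps_t
```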

To generalize, consider a multivariate (\(n\)-dimensional) VAR of order \(p\), VAR(p), presented in matrix notation: \[\mathbf{x}_t = \mathbf{\alpha} + \Pi^{(1)} \mathbf{x}_{t-1} + \ldots + \Pi^{(p)} \mathbf{x}_{t-p} + \mathbf{\varepsilon}_t,\] where \(\mathbf{x}_t = (x_{1t},\ldots,x_{nt})'\) is a vector of \(n\) (potentially) related variables; \(\mathbf{\varepsilon}_t = (\varepsilon_{1t},\ldots,\varepsilon_{nt})'\) is a vector of error terms, such that \(E(\mathbf{\varepsilon}_t) = \mathbf{0}\), \(E(\mathbf{\varepsilon}_t^{}\mathbf{\varepsilon}_t^{\prime}) = \Sigma_{\mathbf{\varepsilon}}\), and \(E(\mathbf{\varepsilon}_{t}^{}\mathbf{\varepsilon}_{s \neq t}^{\prime}) = 0\). \(\Pi^{(1)},\ldots,\Pi^{(p)}\) are \(n\)-dimensional parameter matrices: \[\Pi^{(j)} = \left[ \begin{array}{cccc} \pi_{11}^{(j)} & \pi_{12}^{(j)} & \cdots & \pi_{1n}^{(j)} \\ \pi_{21}^{(j)} & \pi_{22}^{(j)} & \cdots & \pi_{2n}^{(j)} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{n1}^{(j)} & \pi_{n2}^{(j)} & \cdots & \pi_{nn}^{(j)} \end{array} \right],\;~~j=1,\ldots,p\]

When two or more variables are modeled in this way, the implied assumption is that these variables are endogenous to each other; that is, each of the variables affects, and is affected by, the other variables.

Some of the features of the VAR are that:

  • only lagged values of the dependent variables are considered as the right-hand-side variables (although trend and seasonal variables might also be included in higher-frequency data analysis);
  • each equation has the same set of right-hand-side variables (however, it is possible to impose a different lag structure across the equations, especially when \(p\) is relatively large, to preserve degrees of freedom, particularly when the sample size is relatively small and there are several variables in the system);
  • the autoregressive order of the VAR, \(p\), is determined by the maximum lag of a variable across all equations.

The order of a VAR, \(p\), can be determined using system-wide information criteria:

\[\begin{aligned} & AIC = \ln\left|\Sigma_{\varepsilon}\right| + \frac{2}{T}(pn^2+n) \\ & SIC = \ln\left|\Sigma_{\varepsilon}\right| + \frac{\ln{T}}{T}(pn^2+n) \end{aligned}\]

where \(\left|\Sigma_{\varepsilon}\right|\) is the determinant of the residual covariance matrix; \(n\) is the number of equations, and \(T\) is the effective sample size.
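
In practice, the order is often selected by computing such criteria for a range of candidate lags. As one possibility, the sketch below relies on the VAR class from the statsmodels package (an assumption: the package must be installed); its reported AIC and BIC correspond to the criteria above up to terms that do not affect the ranking of lag orders. The array `x` is the simulated data from the sketch above.

```python
from statsmodels.tsa.api import VAR

# System-wide lag-order selection; `x` is a (T x n) array of stationary series.
selection = VAR(x).select_order(maxlags=8)
print(selection.summary())          # AIC, BIC (SIC), and other criteria for p = 0,...,8
print(selection.selected_orders)    # lag order preferred by each criterion
```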

We can estimate each equation of the VAR individually using OLS.
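
Because every equation has the same right-hand-side variables, a single least-squares call recovers the equation-by-equation OLS estimates. The sketch below does this and evaluates the information criteria above; `fit_var` and `aic_sic` are illustrative helper names, and `x` is the simulated data from earlier.

```python
import numpy as np

def fit_var(x, p):
    """Equation-by-equation OLS for a VAR(p); x is a (T x n) array."""
    T, n = x.shape
    Y = x[p:]                                                   # left-hand sides
    Z = np.hstack([np.ones((T - p, 1))] +                       # intercept
                  [x[p - j:T - j] for j in range(1, p + 1)])    # lags 1,...,p
    B = np.linalg.lstsq(Z, Y, rcond=None)[0]                    # one column per equation
    resid = Y - Z @ B
    Sigma = resid.T @ resid / (T - p)                           # residual covariance
    return B, Sigma

def aic_sic(Sigma, T_eff, n, p):
    """System-wide AIC and SIC as defined above."""
    k = p * n ** 2 + n
    logdet = np.log(np.linalg.det(Sigma))
    return logdet + 2 * k / T_eff, logdet + np.log(T_eff) * k / T_eff

B, Sigma = fit_var(x, p=1)
print(aic_sic(Sigma, T_eff=x.shape[0] - 1, n=2, p=1))
```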

10.2.1 In-Sample Granger Causality

A test of joint significance of the parameters associated with all the lags of one variable in the equation of another variable is known as the ‘Granger causality’ test. The use of the term ‘causality’ in this context has been criticized: that one variable helps explain the movement of another does not necessarily mean that the former causes the latter, so the term can indeed be seen as misleading. Rather, causality in the Granger sense simply means that the former helps predict the latter.

To illustrate the testing framework, consider a bivariate VAR(p): \[\begin{aligned} x_{1t} &= \alpha_1 + \pi_{11}^{(1)} x_{1t-1} + \cdots + \pi_{11}^{(p)} x_{1t-p} \\ &+ \pi_{12}^{(1)} x_{2t-1} + \cdots + \pi_{12}^{(p)} x_{2t-p} +\varepsilon_{1t} \\ x_{2t} &= \alpha_2 + \pi_{21}^{(1)} x_{1t-1} + \cdots + \pi_{21}^{(p)} x_{1t-p} \\ &+ \pi_{22}^{(1)} x_{2t-1} + \cdots + \pi_{22}^{(p)} x_{2t-p} +\varepsilon_{2t} \end{aligned}\]

We say that:

  • \(\{X_2\}\) does not Granger cause \(\{X_1\}\) if \(\pi_{12}^{(1)}=\cdots=\pi_{12}^{(p)}=0\)
  • \(\{X_1\}\) does not Granger cause \(\{X_2\}\) if \(\pi_{21}^{(1)}=\cdots=\pi_{21}^{(p)}=0\)

So long as the variables of the system are covariance-stationary, we can test these hypotheses using an F test. If \(p=1\), we can also, equivalently, test the hypothesis using a t test.
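
A minimal sketch of the test in the bivariate case is given below, assuming `x` is a \(T \times 2\) array of stationary observations (for example, the simulated data from earlier); the helper name `granger_f_test` is illustrative.

```python
import numpy as np
from scipy import stats

def granger_f_test(x, p, caused=0, causing=1):
    """F test of 'x[:, causing] does not Granger cause x[:, caused]' in a bivariate VAR(p)."""
    T = x.shape[0]
    y = x[p:, caused]
    lags = lambda cols: np.hstack([x[p - j:T - j][:, cols] for j in range(1, p + 1)])
    Z_u = np.hstack([np.ones((T - p, 1)), lags([caused, causing])])  # unrestricted
    Z_r = np.hstack([np.ones((T - p, 1)), lags([caused])])           # restricted
    ssr = lambda Z: np.sum((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2)
    df1, df2 = p, T - p - Z_u.shape[1]
    F = ((ssr(Z_r) - ssr(Z_u)) / df1) / (ssr(Z_u) / df2)
    return F, stats.f.sf(F, df1, df2)

F, pval = granger_f_test(x, p=1)       # H0: X2 does not Granger cause X1
print(F, pval)
```

A small p-value rejects the null of no Granger causality; with \(p=1\) the same conclusion follows from the t statistic on the single restricted coefficient.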

10.3 Forecasting

Generating forecasts for each of the variables comprising the VAR(p) model is a straightforward exercise, so long as we have access to the relevant information set. As was the case with autoregressive models, we make one-step-ahead forecasts based on the readily available data, and we make multi-step-ahead forecasts iteratively, substituting forecasts for the data in periods that have not yet been observed.

10.3.1 One-step-ahead forecasts

In a VAR(p), the realization of a dependent variable \(i\) in period \(t+1\) is: \[x_{it+1} = \alpha_i + \sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} x_{kt+1-j} + \varepsilon_{it+1},\;~~i=1,\ldots,n.\]

Point forecast, ignoring parameter uncertainty, is: \[x_{it+1|t} = \alpha_i + \sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} x_{kt+1-j},\;~~i=1,\ldots,n.\]

Forecast error is: \[e_{it+1} = x_{it+1}-x_{it+1|t}=\varepsilon_{it+1},\;~~i=1,\ldots,n.\]

Forecast variance is: \[\sigma_{it+1}^2 = E(\varepsilon_{it+1}^2) = \sigma_{i}^2,\;~~i=1,\ldots,n,\] where \(\sigma_{i}^2\) is the i\(^{th}\) element of the main diagonal of the error variance-covariance matrix of the VAR(p).

The (95%) interval forecast is: \[x_{it+1|t}\pm1.96\sigma_{i},\;~~i=1,\ldots,n.\]
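
A sketch of these calculations, reusing the (hypothetical) `B` and `Sigma` objects from the earlier OLS sketch, where the rows of `B` are ordered as intercept, lag 1, ..., lag \(p\):

```python
import numpy as np

def forecast_one_step(x, B, p):
    """One-step-ahead point forecasts x_{t+1|t} for all n equations."""
    z = np.concatenate([[1.0]] + [x[-j] for j in range(1, p + 1)])  # (1, x_t', ..., x_{t+1-p}')
    return z @ B

point = forecast_one_step(x, B, p=1)
se = np.sqrt(np.diag(Sigma))                     # sigma_i for each equation
lower, upper = point - 1.96 * se, point + 1.96 * se
```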

10.3.2 Multi-step-ahead forecasts

The realization of a dependent variable \(i\) in period \(t+h\) is: \[x_{it+h} = \alpha_i + \sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} x_{kt+h-j} + \varepsilon_{it+h},\;~~i=1,\ldots,n.\]

Point forecast is: \[x_{it+h|t} = \alpha_i + \sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} x_{kt+h-j|t},\;~~i=1,\ldots,n,\] where the forecasts are generated iteratively, and where \(x_{kt+h-j|t}=x_{kt+h-j}\) if \(j\ge h\); that is, observed values are used whenever they are available.

Forecast error is: \[e_{it+h} = x_{it+h}-x_{it+h|t}=\sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} e_{kt+h-j} + \varepsilon_{it+h},\;~~i=1,\ldots,n,\] where \(e_{kt+h-j}=0\) for \(j \ge h\). Note that the forecast error reduces to \(\varepsilon_{it+1}\) when \(h=1\), which is what we saw above. More intriguingly, observe that the multi-step-ahead forecast error of a given variable is a function of the preceding forecast errors of all the variables in the system, each multiplied by some parameter of the model.

Forecast variance, \(\sigma_{it+h}^2\), is a function of the error variances and covariances and of the model parameters.
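
For example, for \(h=2\) the forecast error is \(e_{it+2} = \sum_{k=1}^{n}\pi_{ik}^{(1)}\varepsilon_{kt+1} + \varepsilon_{it+2}\), so that \[\sigma_{it+2}^2 = \sum_{k=1}^{n}\sum_{l=1}^{n}\pi_{ik}^{(1)}\pi_{il}^{(1)}\sigma_{kl} + \sigma_{i}^2,\] where \(\sigma_{kl}\) denotes the \((k,l)\) element of \(\Sigma_{\varepsilon}\). Additional terms accumulate as \(h\) grows, so the forecast variance does not decrease with the horizon.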

The (95%) interval forecast is: \[x_{it+h|t}\pm1.96\sigma_{it+h},\;~~i=1,\ldots,n.\]
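
The iterative scheme is easy to code. The sketch below reuses the (hypothetical) `x`, `B`, and `p` from the OLS sketch and produces \(h\)-step-ahead point forecasts for all variables:

```python
import numpy as np

def forecast_iterative(x, B, p, h):
    """Iterated point forecasts x_{t+1|t}, ..., x_{t+h|t} (an h x n array)."""
    hist = list(x[-p:])                         # last p observed vectors
    for _ in range(h):
        z = np.concatenate([[1.0]] + [hist[-j] for j in range(1, p + 1)])
        hist.append(z @ B)                      # forecasts replace unobserved data
    return np.array(hist[p:])

print(forecast_iterative(x, B, p=1, h=4))
```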

10.3.3 Out-of-Sample Granger Causality

The previously discussed (in-sample) test of causality in the Granger sense is frequently performed in practice. But, as noted above, the term ‘causality’ may be somewhat misleading. Indeed, the ‘true spirit’ of such a test is to assess the ability of one variable to help predict another in an out-of-sample setting.

Consider a set of variables, \(\{X_1\},\ldots,\{X_n\}\). Suppose there are two information sets: the unrestricted information set, \(\Omega_{t}^{(u)}\), and the restricted information set, \(\Omega_{t}^{(r)}\). The former consists of the current and lagged values of all the variables in the set. The latter consists of the current and lagged values of all but one of the variables in the set. Suppose the omitted variable is \(\{X_1\}\). Then, following Granger’s definition of causality, \(\{X_1\}\) is said to cause \(\{X_{i\ne 1}\}\) if \(\sigma_{i}^2\left(\Omega_{t}^{(u)}\right) < \sigma_{i}^2\left(\Omega_{t}^{(r)}\right)\), meaning that we can better predict \(X_i\) using the available information on \(\{X_1\}\) through \(\{X_n\}\) rather than that on \(\{X_2\}\) through \(\{X_n\}\) only.

In practice, the test involves generating two sets of (one-step-ahead) out-of-sample forecasts from the unrestricted and restricted equations of the vector autoregression. The former is simply the forecast as presented above, that is: \[x_{it+1|t}^{(u)} = \alpha_i + \sum_{j=1}^{p}\sum_{k=1}^{n}\pi_{ik}^{(j)} x_{kt+1-j},\;~~i=2,\ldots,n.\] The latter is the forecast that does not rely on information from the omitted variable (in our example, the first variable in the unrestricted system of equations): \[x_{it+1|t}^{(r)} = \tilde{\alpha}_i + \sum_{j=1}^{p}\sum_{k=2}^{n}\tilde{\pi}_{ik}^{(j)} x_{kt+1-j},\;~~i=2,\ldots,n.\]

For these forecasts, the corresponding forecast errors are: \[\begin{aligned} & e_{it+1}^{(u)} = x_{it+1} - x_{it+1|t}^{(u)}\\ & e_{it+1}^{(r)} = x_{it+1} - x_{it+1|t}^{(r)} \end{aligned}\]

The out-of-sample forecasts are then evaluated by comparing loss functions of the corresponding forecast errors. For example, assuming quadratic loss and \(P\) out-of-sample forecasts: \[\begin{aligned} RMSFE^{(u)} &= \sqrt{\frac{1}{P}\sum_{s=1}^{P}\left(e_{iR+s|R-1+s}^{(u)}\right)^2} \\ RMSFE^{(r)} &= \sqrt{\frac{1}{P}\sum_{s=1}^{P}\left(e_{iR+s|R-1+s}^{(r)}\right)^2} \end{aligned}\] where \(R\) is the size of the (first) estimation window.

\(\{X_1\}\) is said to cause \(\{X_{i\ne 1}\}\) in the Granger sense if \(RMSFE^{(u)} < RMSFE^{(r)}\).
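
A sketch of the procedure for a bivariate system is given below, using a rolling estimation window of size `R` and one-step-ahead forecasts of the first variable from the unrestricted and restricted equations; `rolling_rmsfe` is an illustrative helper name, and `x` is again a \(T \times 2\) array of stationary observations.

```python
import numpy as np

def rolling_rmsfe(x, p, R, caused=0, causing=1):
    """RMSFEs of one-step-ahead forecasts of x[:, caused], with and without lags of x[:, causing]."""
    T = x.shape[0]
    err_u, err_r = [], []
    for origin in range(R, T):                       # index of the forecast target
        w = x[origin - R:origin]                     # rolling estimation window
        y = w[p:, caused]
        lags = lambda cols: np.hstack([w[p - j:R - j][:, cols] for j in range(1, p + 1)])
        for cols, errs in (([caused, causing], err_u), ([caused], err_r)):
            Z = np.hstack([np.ones((R - p, 1)), lags(cols)])
            b = np.linalg.lstsq(Z, y, rcond=None)[0]
            z_new = np.concatenate([[1.0]] + [x[origin - j][cols] for j in range(1, p + 1)])
            errs.append(x[origin, caused] - z_new @ b)
    return tuple(np.sqrt(np.mean(np.square(e))) for e in (err_u, err_r))

rmsfe_u, rmsfe_r = rolling_rmsfe(x, p=1, R=200)
print(rmsfe_u, rmsfe_r)    # rmsfe_u < rmsfe_r suggests X2 Granger causes X1
```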


  1. The sequence of events in Australia, following the Covid-19 pandemic, is a notable case in point. Fearing an economic crisis, the Australian government issued a suite of stimulus packages. The resulting increase in demand became one of the contributing factors to the subsequent price inflation. At that point, the Reserve Bank of Australia was left with no option but to raise interest rates to slow down the economy and, thus, relieve the inflationary pressure.↩︎