Chapter 6 – Combining Forecasts

6.1 Benefits of Forecast Combination

Every model yields forecasts that are inaccurate in their own way. Taken together, two or more models can each contribute to a forecast that is accurate on average. As the joke goes: a mathematician, a physicist, and a statistician went hunting. They stumbled upon a deer. The mathematician took the first shot but missed the target to the left by a metre. The physicist took the next shot and missed the mark to the right by a metre. Having observed this, the statistician exclaimed: “We got it!”

The joke works because the first two scientists missed the target by the same distance and in opposite directions. Had the mathematician instead missed the target by a few metres, or had the physicist, like the mathematician, missed the target to the left, the statistician would have been less excited about the expected outcome.

In accord with this analogy, the strategy of combining forecasts to achieve superior accuracy works best when the models generate sufficiently different forecasts with relatively similar expected losses.

Intuitively, a combined forecast aggregates more information or more ways of processing the information.

Practically, forecast combination is akin to managing portfolio risk through diversification: forecast errors that are imperfectly correlated partly offset one another.

6.2 Optimal Weights for Forecast Combination

Forecast combination implies mixing the forecasts from different models with some proportions. In other words, a combined forecast is a weighted sum of the individual forecasts. The weights need to be selected in a way that maximizes the combined forecast accuracy, i.e., minimizes the combined forecast loss.

Consider two forecasting methods (or models), \(a\) and \(b\), each respectively yielding one-step-ahead forecasts \(y_{a,t+1|t}\) and \(y_{b,t+1|t}\), and the associated forecast errors \(e_{a,t+1} = y_{t+1}-y_{a,t+1|t}\) and \(e_{b,t+1} = y_{t+1}-y_{b,t+1|t}\).

A combined forecast, \(y_{c,t+1|t}\), is the weighted sum of these two forecasts: \[y_{c,t+1|t} = (1-w)y_{a,t+1|t} + w y_{b,t+1|t},\] where the usual assumption about the weights applies, i.e., \(0 \leq w \leq 1\).¹⁷

A combined forecast error is the weighted sum of the individual forecast errors: \[e_{c,t+1} = (1-w)e_{a,t+1} + w e_{b,t+1}\]
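The two identities above can be checked numerically. The following is a minimal sketch in Python; the function name `combine` and all numbers are made up for illustration and are not taken from the text.

```python
def combine(f_a, f_b, w):
    """Return (1 - w) * f_a + w * f_b element by element; w is the weight on model b."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return [(1.0 - w) * a + w * b for a, b in zip(f_a, f_b)]

# Made-up realized values and one-step-ahead forecasts (illustrative only).
y = [10.0, 12.0, 11.0]
f_a = [9.0, 12.5, 10.0]
f_b = [11.0, 11.5, 12.5]

w = 0.5
f_c = combine(f_a, f_b, w)

# Forecast errors: realized value minus forecast.
e_a = [yt - f for yt, f in zip(y, f_a)]
e_b = [yt - f for yt, f in zip(y, f_b)]
e_c = [yt - f for yt, f in zip(y, f_c)]

# The combined error is the same weighted sum of the individual errors.
e_c_from_errors = combine(e_a, e_b, w)

print(f_c)
print(e_c, e_c_from_errors)
```

In this toy sample the errors of the two models mostly have opposite signs, so the equally weighted combination misses by less than either model does in every period.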

The combined forecast error is, on average, zero (assuming that the individual forecasts are all unbiased): \[E\left(e_{c,t+1}\right) = E\left[(1-w)e_{a,t+1} + w e_{b,t+1}\right] = 0\]

The variance of a combined forecast error is: \[Var\left(e_{c,t+1}\right) = (1-w)^2 \sigma_a^2 + w^2 \sigma_b^2 + 2w(1-w)\sigma_{ab},\] where \(\sigma_a\) and \(\sigma_b\) are the standard deviations of the forecast errors from models \(a\) and \(b\), and \(\sigma_{ab}=\rho_{ab}\sigma_a\sigma_b\) is the covariance of the two forecast errors, expressed as the product of their correlation, \(\rho_{ab}\), and their standard deviations.
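To see how the correlation between the errors drives this variance, the formula can be evaluated directly. This is a sketch; the function name `combined_error_variance` and the parameter values are illustrative assumptions.

```python
def combined_error_variance(w, sigma_a, sigma_b, rho_ab):
    """Variance of (1 - w) * e_a + w * e_b for given standard deviations and correlation."""
    sigma_ab = rho_ab * sigma_a * sigma_b  # covariance of the two forecast errors
    return (
        (1.0 - w) ** 2 * sigma_a ** 2
        + w ** 2 * sigma_b ** 2
        + 2.0 * w * (1.0 - w) * sigma_ab
    )

# Equal weights and equal error variances, three different correlations.
results = {rho: combined_error_variance(0.5, 1.0, 1.0, rho) for rho in (-1.0, 0.0, 1.0)}
for rho, variance in results.items():
    print(rho, variance)
```

With equal variances and equal weights, perfectly negatively correlated errors cancel completely (the situation in the hunting joke), uncorrelated errors halve the variance, and perfectly positively correlated errors leave it unchanged.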

By taking the derivative of the combined forecast error variance with respect to \(w\), and equating it to zero, we obtain the optimal weight (which minimizes the combined forecast error variance): \[w^* = \frac{\sigma_a^2-\sigma_{ab}}{\sigma_a^2+\sigma_b^2-2\sigma_{ab}}\]
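The intermediate steps can be sketched as follows, using only the symbols already defined. The first-order condition is: \[\frac{\partial Var\left(e_{c,t+1}\right)}{\partial w} = -2(1-w)\sigma_a^2 + 2w\sigma_b^2 + 2(1-2w)\sigma_{ab} = 0,\] which, after collecting the terms in \(w\), gives: \[w\left(\sigma_a^2+\sigma_b^2-2\sigma_{ab}\right) = \sigma_a^2-\sigma_{ab}.\] The second derivative, \(2\left(\sigma_a^2+\sigma_b^2-2\sigma_{ab}\right) = 2Var\left(e_{a,t+1}-e_{b,t+1}\right)\), is positive whenever the two forecast errors are not identical, so the stationary point is indeed a minimum.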

A couple of features of interest become immediately apparent. First, when the two forecast errors are uncorrelated, the optimal weight assigned to each individual forecast is inversely proportional to its forecast error variance. Second, when the two forecast errors have the same variance, the optimal weight assigned to each individual forecast is equal to 0.5.
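Both features can be verified with a few lines of code. This is a sketch; the function name `optimal_weight` and the variances used are hypothetical.

```python
def optimal_weight(var_a, var_b, cov_ab):
    """Weight on model b that minimizes the combined forecast error variance."""
    denominator = var_a + var_b - 2.0 * cov_ab  # equals Var(e_a - e_b)
    if denominator <= 0.0:
        raise ValueError("the two forecast errors are identical; the weight is undefined")
    return (var_a - cov_ab) / denominator

# Feature 1: uncorrelated errors, model b four times as noisy as model a.
w_b = optimal_weight(var_a=1.0, var_b=4.0, cov_ab=0.0)
w_a = 1.0 - w_b
print(w_a, w_b)

# Feature 2: equal variances, some positive covariance.
w_equal = optimal_weight(var_a=2.0, var_b=2.0, cov_ab=0.5)
print(w_equal)
```

In the first case the ratio of the weights, `w_a / w_b`, equals the inverse ratio of the variances, `var_b / var_a`. In the second case the weight is one half regardless of the covariance, as long as the errors are not perfectly correlated.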

Substitute \(w^*\) in place of \(w\) in the formula for variance to obtain: \[Var\left[e_{c,t+1}(w^*)\right] = \sigma_c^2(w^*) = \frac{\sigma_a^2\sigma_b^2(1-\rho_{ab}^2)}{\sigma_a^2+\sigma_b^2-2\rho_{ab}\sigma_a\sigma_b}\]

It can be shown that \(\sigma_c^2(w^*) \leq \min\{\sigma_a^2,\sigma_b^2\}\). That is to say that by combining forecasts we are not making things worse (so long as we use optimal weights).
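The inequality can also be checked numerically over a grid of parameter values. The sketch below is a check rather than a proof; the function name `min_combined_variance` and the grid are illustrative assumptions.

```python
def min_combined_variance(sigma_a, sigma_b, rho_ab):
    """Combined forecast error variance evaluated at the optimal weight."""
    numerator = sigma_a ** 2 * sigma_b ** 2 * (1.0 - rho_ab ** 2)
    denominator = sigma_a ** 2 + sigma_b ** 2 - 2.0 * rho_ab * sigma_a * sigma_b
    return numerator / denominator

sigmas = (0.5, 1.0, 2.0)
rhos = (-0.9, -0.5, 0.0, 0.5, 0.9)

# Collect every parameter combination where combining would be worse
# than the better of the two individual forecasts.
violations = []
for s_a in sigmas:
    for s_b in sigmas:
        for rho in rhos:
            best_single = min(s_a ** 2, s_b ** 2)
            if min_combined_variance(s_a, s_b, rho) > best_single + 1e-12:
                violations.append((s_a, s_b, rho))

print(violations)
```

No combination on this grid violates the bound. The bound holds with equality in cases such as \(\sigma_a=1\), \(\sigma_b=2\), \(\rho_{ab}=0.5\), where the optimal weight on model \(b\) is zero and the combination gains nothing over model \(a\) alone.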

In practice, \(\sigma_a\), \(\sigma_b\), and \(\sigma_{ab}\) are unknown, so we estimate them using a pseudo-forecasting routine. Specifically, the sample estimator of \(w^*\) is: \[\hat{w}^* = \frac{\hat{\sigma}_a^2-\hat{\sigma}_{ab}}{\hat{\sigma}_a^2+\hat{\sigma}_b^2-2\hat{\sigma}_{ab}},\] where \(\hat{\sigma}_a^2 = \frac{1}{P-1}\sum_{t=R}^{R+P-1}{e_{a,t+1}^2}\) and \(\hat{\sigma}_b^2 = \frac{1}{P-1}\sum_{t=R}^{R+P-1}{e_{b,t+1}^2}\) are the sample forecast error variances, \(\hat{\sigma}_{ab}=\frac{1}{P-1}\sum_{t=R}^{R+P-1}{e_{a,t+1}e_{b,t+1}}\) is the sample forecast error covariance, \(R\) is the size of the (first) estimation window, and \(P\) is the number of out-of-sample forecasts.
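A minimal sketch of this estimator is given below. It assumes the out-of-sample forecast errors have already been collected into two lists; the function name `estimate_weight` and the error values are made up for illustration.

```python
def estimate_weight(e_a, e_b):
    """Sample estimate of the optimal weight on model b from P out-of-sample errors."""
    P = len(e_a)
    if P != len(e_b) or P < 2:
        raise ValueError("need two error series of equal length, with at least two errors")
    # Sample second moments of the errors (no demeaning, as the forecasts are assumed unbiased).
    var_a = sum(x * x for x in e_a) / (P - 1)
    var_b = sum(x * x for x in e_b) / (P - 1)
    cov_ab = sum(x * z for x, z in zip(e_a, e_b)) / (P - 1)
    return (var_a - cov_ab) / (var_a + var_b - 2.0 * cov_ab)

# Made-up out-of-sample forecast errors from the two models.
e_a = [1.0, -0.5, 1.0, -1.0]
e_b = [-1.0, 0.5, -1.5, 0.5]

w_hat = estimate_weight(e_a, e_b)
print(round(w_hat, 4))
```

The divisor \(P-1\) appears in every term and cancels in the ratio, so the estimated weight does not depend on the choice between \(P\) and \(P-1\).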

The optimal weight has a straightforward interpretation in a regression setting. Consider the combined forecast equation as: \[y_{t+1} = (1-w)y_{a,t+1|t} + w y_{b,t+1|t} + \varepsilon_{t+1},\] where \(\varepsilon_{t+1}\equiv e_{c,t+1}\). We can re-arrange the equation so that: \[e_{a,t+1} = w (y_{b,t+1|t}-y_{a,t+1|t}) + \varepsilon_{t+1},\] where \(w\) is obtained by estimating a linear regression with an intercept restricted to zero.
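The regression through the origin has a closed-form slope, which makes the equivalence easy to illustrate. The sketch below uses made-up data and a hypothetical helper, `weight_by_regression`.

```python
def weight_by_regression(y, f_a, f_b):
    """Slope from regressing e_a on (f_b - f_a) with the intercept restricted to zero."""
    e_a = [yt - a for yt, a in zip(y, f_a)]   # dependent variable
    x = [b - a for a, b in zip(f_a, f_b)]     # regressor: difference between the forecasts
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0.0:
        raise ValueError("the two forecasts are identical; the weight is not identified")
    return sum(xi * ei for xi, ei in zip(x, e_a)) / sum_xx

# Made-up realized values and forecasts.
y = [10.0, 12.0, 11.0, 9.0]
f_a = [9.0, 12.5, 10.0, 10.0]
f_b = [11.0, 11.5, 12.5, 8.5]

w_reg = weight_by_regression(y, f_a, f_b)
print(round(w_reg, 4))
```

Because \(y_{b,t+1|t}-y_{a,t+1|t} = e_{a,t+1}-e_{b,t+1}\), this slope coincides with the weight obtained from the sample variances and covariance of the two error series.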

Alternatively, we can estimate an unconstrained variant of the combined forecast equation: \[y_{t+1} = \alpha+\beta_a y_{a,t+1|t} + \beta_b y_{b,t+1|t} + \varepsilon_{t+1},\] which relaxes the assumption that the forecasts are unbiased, as well as that the weights need to add up to one. Indeed, this unconstrained variant for forecast combination allows for the possibility of negative weights.
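The unconstrained regression can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python; the helpers `solve` and `ols` are written for illustration only. The realized values are constructed without noise from known coefficients, so the output can be compared against them.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up forecasts; realized values built from alpha = 1, beta_a = 0.6, beta_b = 0.3.
f_a = [9.0, 12.5, 10.0, 10.0, 11.0]
f_b = [11.0, 11.5, 12.5, 8.5, 10.0]
y = [1.0 + 0.6 * a + 0.3 * b for a, b in zip(f_a, f_b)]

X = [[1.0, a, b] for a, b in zip(f_a, f_b)]  # constant, forecast a, forecast b
alpha, beta_a, beta_b = ols(X, y)
print(round(alpha, 6), round(beta_a, 6), round(beta_b, 6))
```

Here the estimated slopes sum to 0.9 rather than one and the intercept is non-zero, which is exactly the flexibility the unconstrained variant allows. For real data a numerical library would be a more robust choice than hand-rolled normal equations.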

6.3 Forecast Encompassing

A special case of forecast combination is when \(w=0\). Such an outcome (of the optimal weights) is known as forecast encompassing.

It is said that \(y_{a,t+1|t}\) encompasses \(y_{b,t+1|t}\) when, given that the former is available, the latter provides no additional useful information.

This is equivalent to testing the null hypothesis of \(w=0\) in the combined forecast error equation, which, after rearranging terms, yields the following regression: \[e_{a,t+1} = w\left(e_{a,t+1}-e_{b,t+1}\right)+\varepsilon_{t+1},\quad t=R,\ldots,R+P-1,\] where \(\varepsilon_{t+1}\equiv e_{c,t+1}\), \(R\) is the size of the (first) estimation window, and \(P\) is the number of out-of-sample forecasts generated.
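A sketch of this test is given below. It assumes the conventional OLS standard error, which relies on serially uncorrelated, homoskedastic regression errors; the text does not specify a variance estimator, so that choice is an assumption. The function name `encompassing_t_stat` and the error values are made up.

```python
import math

def encompassing_t_stat(e_a, e_b):
    """Regress e_a on (e_a - e_b) without an intercept; return (w_hat, t-statistic for w = 0)."""
    P = len(e_a)
    x = [u - v for u, v in zip(e_a, e_b)]
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0.0:
        raise ValueError("the two error series are identical")
    w_hat = sum(xi * ei for xi, ei in zip(x, e_a)) / sum_xx
    residuals = [ei - w_hat * xi for xi, ei in zip(x, e_a)]
    s2 = sum(r * r for r in residuals) / (P - 1)  # one estimated parameter
    std_err = math.sqrt(s2 / sum_xx)              # conventional OLS standard error (assumption)
    return w_hat, w_hat / std_err

# Made-up out-of-sample forecast errors.
e_a = [1.0, -0.5, 1.0, -1.0]
e_b = [-1.0, 0.5, -1.5, 0.5]

w_hat, t_stat = encompassing_t_stat(e_a, e_b)
print(round(w_hat, 4), round(t_stat, 2))
```

A large absolute t-statistic is evidence against the null that model \(a\) encompasses model \(b\). With only four made-up observations the number is purely illustrative.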

We can also test for forecast encompassing by regressing the realized value on the individual forecasts: \[y_{t+1} = \alpha + \beta_1 y_{a,t+1|t} + \beta_2 y_{b,t+1|t} + \varepsilon_{t+1},\] and testing the null hypothesis that \(\beta_2=0\), given that \(\beta_1=1\).
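One possible way to implement this, offered as a sketch rather than as the procedure the text has in mind, is to impose the unit coefficient on the first forecast, move that forecast to the left-hand side, and test the slope on the second forecast with a conventional t-statistic. The function name `slope_t_stat` and the data are hypothetical, and the conventional OLS standard error is an assumption.

```python
import math

def slope_t_stat(x, z):
    """OLS of z on a constant and x; return (intercept, slope, t-statistic of the slope)."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = s_xz / s_xx
    intercept = z_bar - slope * x_bar
    ssr = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    std_err = math.sqrt(ssr / (n - 2) / s_xx)  # conventional OLS standard error (assumption)
    return intercept, slope, slope / std_err

# Made-up realized values and forecasts.
y = [3.6, 4.4, 5.4, 6.6]
f_a = [5.0, 5.0, 5.0, 5.0]
f_b = [1.0, 2.0, 3.0, 4.0]

# With the coefficient on forecast a fixed at one, the dependent variable is e_a = y - f_a.
e_a = [yt - a for yt, a in zip(y, f_a)]
intercept, slope, t_stat = slope_t_stat(f_b, e_a)
print(round(intercept, 4), round(slope, 4), round(t_stat, 2))
```

In this toy example the second forecast explains much of the first model's error, so the slope is far from zero and the null of encompassing would be rejected.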

¹⁷ More broadly, if we have \(n\) forecasts that we intend to combine, the associated weights, \(w_i:i=1,\ldots,n\), will each be bounded by zero and one, and together they will add up to one.