What is a forecast if not a guess? An educated guess, nonetheless. A good forecast does not need to be precise; indeed, it almost never is, except by fluke. Even an imprecise forecast can be of immense value: that we were unable to predict an event exactly tells us something about the underlying processes that produced an outcome different from the one we predicted. Such conjecture can be useful. Forecasting, even if inaccurate, can be useful.1 George Box’s ‘all models are wrong but some are useful’ certainly applies to the study of forecasting.

The roots of forecasting extend to the very beginning of human history. In their desire to predict the future, people have attempted to make forecasts of their own or have used the services of others. This desire to guess what was to come has been motivated by the potential benefits such information could offer.

For many centuries, because the weather was the single most important factor affecting the livelihoods of people and, indeed, the fate of civilizations,2 A sequence of droughts toward the end of the ninth century is considered one of the key reasons for the collapse of the Classic Mayan Civilization (Hodell, Curtis, and Brenner 1995). much of forecasting revolved around weather forecasting. Early attempts at weather forecasting were rather simplistic. The Babylonians, for example, based their weather forecasts on the appearance of clouds. Over time, advances in physics and related fields, on the one hand, and the invention of measuring instruments such as the barometer and the thermometer, on the other, contributed to the development of meteorology as we know it. The birth of the modern weather forecast, however, is attributed to the invention of the telegraph, which made it possible for a weather forecast to arrive sooner than the weather itself.

Much as a better understanding of the laws of physics facilitated the inception of meteorological research, the development of econometrics allowed for the introduction of more rigorous forecasting methods. And as with the telegraph in the 19th century, the development of the modern computer in the 20th century facilitated the effective use of econometric methods for economic forecasting. Toward the end of the 20th century, and particularly from the beginning of the 21st, the evolution of the Internet and the massive increase in computing power allowed the storage and distribution of granular data, which has further aided the advancement of the methods and practices of forecasting.

All methods, primitive or complex, spurious or scientifically substantiated, have one thing in common: they all rely (or, at least, pretend to rely) upon *information*. Information is key in forecasting. It comes in many forms, and is condensed into *data*. When the data are organized and stored in a certain way – chronologically and at regular intervals – we end up with *time series* data.
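As a minimal sketch of this data structure (the dates and values below are hypothetical), a time series is simply a set of observations keyed by regularly spaced points in time and kept in chronological order:

```python
from datetime import date

# Hypothetical monthly observations: time series data are observations
# stored chronologically and at regular intervals -- here, monthly.
observations = {
    date(2021, 1, 1): 10.2,
    date(2021, 2, 1): 10.8,
    date(2021, 3, 1): 9.9,
    date(2021, 4, 1): 10.5,
}

# Chronological order is what distinguishes a time series from a plain sample.
dates = sorted(observations)
values = [observations[d] for d in dates]
print(values)  # [10.2, 10.8, 9.9, 10.5]
```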

A diverse set of forecasting methods rely on insights from econometric analysis of time series. In time series analysis, the implicit assumption is that the past tends to repeat itself, at least to some extent. So, if we study the past well, we may be able to forecast an event with some degree of accuracy.

A forecast is a random variable that has some distribution and, thus, moments. The simplest form of a forecast is a point forecast, which is usually the mean of the distribution.3 It can also be the median of the distribution, depending on the forecaster’s choice of loss function; more about this later.

Let \(\hat{y}_{t+h|t}=E(y_{t+h}|\Omega_{t};\hat{\theta})\) be a point forecast4 We use ‘hat’ to emphasize that the forecast is based on parameter estimates rather than true parameters of the model. for period \(t+h\) made in period \(t\), that is, using the information available at the time. The information set is denoted by \(\Omega_t\) and can contain any predictor, including lags of the dependent variable.
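To make this concrete, here is a minimal sketch of computing \(\hat{y}_{t+h|t}\), under assumptions made purely for illustration: the information set contains only past values of the series, and an AR(1) model \(y_t = c + \phi y_{t-1} + \varepsilon_t\) has already been estimated, with hypothetical estimates \(\hat{c}=1\), \(\hat{\phi}=0.5\):

```python
# A sketch of a point forecast ŷ_{t+h|t} = E(y_{t+h} | Ω_t; θ̂), assuming
# (hypothetically) an estimated AR(1) model y_t = c + φ y_{t-1} + ε_t.

def point_forecast(y_t, c_hat, phi_hat, h):
    """Iterate the estimated AR(1) forward h periods from the last observation y_t."""
    forecast = y_t
    for _ in range(h):
        # E(ε) = 0, so the future shock drops out of the conditional expectation.
        forecast = c_hat + phi_hat * forecast
    return forecast

# Hypothetical numbers: last observation y_t = 4, ĉ = 1, φ̂ = 0.5.
print(point_forecast(4.0, 1.0, 0.5, 1))  # one step ahead: 1 + 0.5*4 = 3.0
print(point_forecast(4.0, 1.0, 0.5, 2))  # two steps ahead: 1 + 0.5*3 = 2.5
```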

A point forecast is our ‘best guess’ about the future realization of the random variable. The difference between this forecast and the actual realization of the random variable is the forecast error. That is, \[e_{t+h|t} = \hat{y}_{t+h|t} - y_{t+h}\]5 Somewhat unconventionally, I define the forecast error by subtracting the actual realization of the random variable from its forecast. I do so for intuitive convenience: a positive forecast error here means that we overestimated, and a negative forecast error means that we underestimated.

The more accurate the forecast, the smaller the forecast error. There is no such thing as a perfect or errorless forecast. There may be instances, if we are fortuitous, when a forecast is spot-on, but such an instance will be just that: a fluke. More often than not, a forecast error will be different from zero. Indeed, roughly half the time forecast errors will be positive, and roughly half the time negative. And while on average the forecast error is expected to be zero, the forecast error variance is expected to be positive, implying uncertainty surrounding the point forecast.
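A small simulation illustrates these properties. The data-generating process below is an assumption made for illustration: draws from a normal distribution with known mean, forecast by that mean (the best possible point forecast here), so any remaining error is pure information uncertainty:

```python
import random

# Illustrative simulation: y ~ N(mu, sigma^2), forecast ŷ = mu (assumed known).
# Even this optimal forecast produces errors that are rarely exactly zero,
# average out to about zero, and have positive variance.
random.seed(42)
mu, sigma = 5.0, 2.0
errors = [mu - random.gauss(mu, sigma) for _ in range(100_000)]  # e = ŷ - y

mean_error = sum(errors) / len(errors)
var_error = sum(e**2 for e in errors) / len(errors) - mean_error**2
share_positive = sum(e > 0 for e in errors) / len(errors)

print(round(mean_error, 2))      # close to 0
print(round(var_error, 1))       # close to sigma^2 = 4
print(round(share_positive, 2))  # close to 0.5
```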

Accurate forecasting is difficult because of all the unknowns we deal with in the process.6 Donald Rumsfeld, the U.S. Secretary of Defense, once famously said: ‘*As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know.*’ In the context of model-based forecasting, these unknowns are: (i) we do not know the true model, so we assume one, which leads to *model uncertainty*; (ii) we do not know the true parameters of the model, so we estimate them, which leads to *parameter uncertainty*; and (iii) we do not know the future, and there is nothing we can do about it, which leads to *information uncertainty*.

To illustrate the foregoing, and consistent with the definition of the forecast error above, we can decompose the forecast error into its three components: \[\begin{aligned} e_{t+h|t} & = \big[E(y_{t+h}|\Omega_{t})-y_{t+h}\big]\;~~\text{(information uncertainty)} \\ & + \big[E(y_{t+h}|\Omega_{t};\theta) - E(y_{t+h}|\Omega_{t})\big]\;~~\text{(model uncertainty)} \\ & + \big[E(y_{t+h}|\Omega_{t};\hat{\theta})-E(y_{t+h}|\Omega_{t};\theta)\big]\;~~\text{(parameter uncertainty)} \end{aligned}\]

where \(E(y_{t+h}|\Omega_{t};\theta)\) indicates that the forecast is made using a parametric model, and \(E(y_{t+h}|\Omega_{t};\hat{\theta})\) indicates that the parameters of the model are estimated.
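The decomposition can be checked numerically. Everything in the sketch below is an assumption made for illustration: the true process is an AR(1), while the forecaster (wrongly) fits a constant-mean model and estimates the constant by the sample mean, so all three components are non-zero and sum to the forecast error \(e_{t+h|t}=\hat{y}_{t+h|t}-y_{t+h}\):

```python
import random

# Illustrative setup: true process y_t = phi*y_{t-1} + eps_t (AR(1)),
# but the assumed model is a constant mean, with pseudo-true constant 0
# (the unconditional mean) and estimated constant ȳ (the sample mean).
random.seed(1)
phi = 0.7
y = [0.0]
for _ in range(500):
    y.append(phi * y[-1] + random.gauss(0, 1))

y_t, y_next = y[-2], y[-1]        # last observation and the 'future' realization
sample = y[:-1]                   # information available at time t
y_bar = sum(sample) / len(sample) # estimated constant (θ̂)

parameter = y_bar - 0.0           # E(y|Ω;θ̂) - E(y|Ω;θ): estimation error
model = 0.0 - phi * y_t           # E(y|Ω;θ)  - E(y|Ω):   misspecification
information = phi * y_t - y_next  # E(y|Ω)    - y:        unknowable future shock

total_error = y_bar - y_next      # e = ŷ - y under the constant-mean forecast
print(abs((parameter + model + information) - total_error) < 1e-9)  # True
```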

Because information uncertainty cannot be avoided, we are bound to make a forecast error. The aim of a forecaster is to minimize the model and parameter uncertainties, that is, to find a model that closely approximates the true model and to estimate its parameters as efficiently as possible. A forecaster can achieve this by minimizing the expected *loss function*.

A loss function, which we denote by \(L(e_{t+h})\), is a transformation of the forecast error such that: (i) \(L(e_{t+h}) = 0\) when \(e_{t+h}=0\); (ii) \(L(e_{t+h}) \geq 0\) when \(e_{t+h} \neq 0\); and, when the loss function is symmetric about zero, (iii) \(L(e_{t+h}) > L(e_{s+h})\) whenever \(|e_{t+h}| > |e_{s+h}|\).

Two commonly used symmetric7 There are also asymmetric loss functions, which are relevant in instances when it makes sense to ‘penalize’ the forecast error more so in one direction than another. loss functions are an *absolute* loss function, \(L{(e_{t+h|t})} = |e_{t+h|t}|\), and a *quadratic* loss function, \(L{(e_{t+h|t})} = (e_{t+h|t})^2\).
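Both loss functions are one-liners, sketched below; the comparison illustrates the key practical difference, namely that the quadratic loss penalizes large errors disproportionately:

```python
# The two symmetric loss functions named above.
def absolute_loss(e):
    return abs(e)

def quadratic_loss(e):
    return e**2

# Both are zero at e = 0, positive otherwise, and symmetric about zero.
# For |e| > 1 the quadratic loss exceeds the absolute loss; for |e| < 1
# the opposite holds, so they rank large errors differently.
print(absolute_loss(-2), quadratic_loss(-2))    # 2 4
print(absolute_loss(0.5), quadratic_loss(0.5))  # 0.5 0.25
```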

The quadratic loss function is arguably the most popular among the loss functions, partly because it echoes the way we fit the data ‘in-sample’ (i.e. by minimizing the sum of squared residuals).

The optimal forecast is the one that minimizes the expected loss: \[\min_{\hat{y}_{t+h|t}} E\left[L\left(e_{t+h|t}\right)\right] = \min_{\hat{y}_{t+h|t}} E\left[L\left(\hat{y}_{t+h|t}-y_{t+h}\right)\right].\]

Assuming the quadratic loss function: \[\begin{aligned} E\left[L(e_{t+h|t})\right] & = E(e_{t+h|t}^2) = E(\hat{y}_{t+h|t}-y_{t+h})^2 \\ & = \hat{y}_{t+h|t}^2 - 2\hat{y}_{t+h|t}E(y_{t+h}) + E(y_{t+h}^2) \end{aligned}\]
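The optimization step can be made explicit: differentiating the expected loss with respect to \(\hat{y}_{t+h|t}\) and setting the derivative to zero gives the first-order condition \[\frac{\partial E\left[L(e_{t+h|t})\right]}{\partial \hat{y}_{t+h|t}} = 2\hat{y}_{t+h|t} - 2E(y_{t+h}) = 0.\]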

By solving the optimization problem it follows that: \[\hat{y}_{t+h|t} = E(y_{t+h})\]

If we assume that the conditional density of the forecast is a normal density, then \(E(y_{t+h}) = \mu_{t+h}\), the mean of that distribution.

Thus, the optimal point forecast under the quadratic loss is the *mean* of the forecast distribution (for reference, the optimal point forecast under the absolute loss is the *median* of the distribution).
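This claim can be verified numerically. The skewed sample below is assumed for illustration; a grid search over candidate point forecasts shows that average quadratic loss is minimized at the sample mean, and average absolute loss at the sample median:

```python
import statistics

# Illustrative (skewed) sample of realizations; the outlier 10 pulls the
# mean above the median, so the two optimal point forecasts differ.
sample = [1.0, 1.0, 2.0, 3.0, 10.0]

def avg_quadratic_loss(forecast):
    return sum((forecast - y)**2 for y in sample) / len(sample)

def avg_absolute_loss(forecast):
    return sum(abs(forecast - y) for y in sample) / len(sample)

# Grid of candidate point forecasts from 0.00 to 11.00 in steps of 0.01.
candidates = [x / 100 for x in range(0, 1101)]
best_quad = min(candidates, key=avg_quadratic_loss)
best_abs = min(candidates, key=avg_absolute_loss)

print(best_quad, statistics.mean(sample))   # 3.4 3.4
print(best_abs, statistics.median(sample))  # 2.0 2.0
```

Note how the outlier drags the quadratic-loss-optimal forecast (the mean) toward itself, while the absolute-loss-optimal forecast (the median) is unaffected; this robustness difference is one reason a forecaster might prefer one loss function over the other.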

But some methods can yield more accurate forecasts, on average, than others, and it is in search of such methods that the study of time series econometrics has evolved.

Page built: 2022-11-30 using R version 4.1.2 (2021-11-01)