class: center, middle, inverse, title-slide

.title[
# Forecasting for Economics and Business
]
.subtitle[
## Lecture 3: Forecasting Methods and Routines
]
.author[
### David Ubilava
]
.date[
### University of Sydney
]

---

# Forecast is a random variable

.pull-left[
![](Art/forecast.png)
]

.pull-right[
A forecast is a random variable that has a distribution and, thus, moments.

The simplest form of a forecast is a *point forecast*.

A more complex, but much more informative, form of a forecast is a *density forecast*.

A somewhat less complex form of a forecast is an *interval forecast*, which is given by the lower and upper percentiles of the forecast distribution.
]

---

# Forecast is a random variable

.right-column[
Consider a forecast made in period `\(t\)` for a future period `\(t+h\)`, where `\(h\)` is the forecast horizon.

A complete `\(h\)`-step-ahead forecast, `\(y_{t+h|t}\)`, can be fully summarized by the (conditional) distribution `\(F(y_{t+h}|\Omega_t)\)` or the density `\(f(y_{t+h}|\Omega_t)\)`.

Both `\(F(y_{t+h}|\Omega_t)\)` and `\(f(y_{t+h}|\Omega_t)\)` summarize all the knowns and unknowns about the potential values of `\(y\)` at time `\(t+h\)`, given the available knowledge at time `\(t\)`.
]

---

# Point forecast

.right-column[
A point forecast made in period `\(t\)` for horizon `\(h\)` is denoted by `\(y_{t+h|t}\)`. It is our 'best guess', made in period `\(t\)`, about the actual realization of the random variable in period `\(t+h\)`, denoted by `\(y_{t+h}\)`.

The difference between the two is the forecast error. That is,
`$$e_{t+h|t} = y_{t+h} - \hat{y}_{t+h|t}$$`

The 'hat' indicates that estimated parameters are used in generating the forecast.

The more accurate the forecast, the smaller the forecast error.
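
As a small numeric illustration of the forecast error defined above, the following Python sketch computes `\(e_{t+h|t}\)` for a handful of realized values and point forecasts. The numbers and the names `realized`, `forecast`, and `forecast_errors` are hypothetical and chosen only for illustration.

```python
# Hypothetical realized values y_{t+h} and point forecasts yhat_{t+h|t}.
realized = [2.1, 1.8, 2.5, 2.0]
forecast = [2.0, 2.0, 2.2, 2.3]


def forecast_errors(actual, predicted):
    """Return e = y - yhat for each pair of realized value and forecast."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [y - yhat for y, yhat in zip(actual, predicted)]


errors = forecast_errors(realized, forecast)

# A positive error means the forecast was too low; a negative one, too high.
for y, yhat, e in zip(realized, forecast, errors):
    print(f"realized={y:.1f}  forecast={yhat:.1f}  error={e:+.1f}")
```
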
]

---

# Forecast uncertainty

.right-column[
Three types of uncertainty contribute to the forecast error:
`$$\begin{aligned} e_{t+h|t} & = \big[y_{t+h}-E(y_{t+h}|\Omega_{t})\big]\;~~\text{(information uncertainty)} \\ & + \big[E(y_{t+h}|\Omega_{t}) - E(y_{t+h}|\Omega_{t};\theta)\big]\;~~\text{(model uncertainty)} \\ & + \big[E(y_{t+h}|\Omega_{t};\theta)-E(y_{t+h}|\Omega_{t};\hat{\theta})\big]\;~~\text{(parameter uncertainty)} \end{aligned}$$`

where `\(E(y_{t+h}|\Omega_{t};\hat{\theta})\equiv \hat{y}_{t+h|t}\)`.
]

---

# The loss function

.right-column[
Because uncertainty cannot be avoided, a forecaster is bound to commit forecast errors.

The goal of the forecaster is to minimize the 'cost' associated with the forecast errors. This is achieved by minimizing the expected loss function.
]

---

# The loss function

.right-column[
A loss function, `\(L(e_{t+h|t})\)`, can take many different forms, but it should satisfy the following properties:
`$$\begin{aligned} & L(e_{t+h|t}) = 0,\;~~\forall\;e_{t+h|t} = 0 \\ & L(e_{t+h|t}) \geq 0,\;~~\forall\;e_{t+h|t} \neq 0 \\ & L(e_{t+h|t}^{(i)}) > L(e_{t+h|t}^{(j)}),\;~~\forall\;|e_{t+h|t}^{(i)}| > |e_{t+h|t}^{(j)}| \end{aligned}$$`
]

---

# The loss function

.right-column[
Two commonly used symmetric loss functions are the *absolute* and *quadratic* loss functions:
`$$\begin{aligned} & L{(e_{t+h|t})} = |e_{t+h|t}|\;~~\text{(absolute loss function)} \\ & L{(e_{t+h|t})} = (e_{t+h|t})^2\;~~\text{(quadratic loss function)} \end{aligned}$$`

The quadratic loss function is popular, partly because we typically select models based on 'in-sample' quadratic loss (i.e., by minimizing the sum of squared residuals).
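
To make the two loss functions concrete, the sketch below evaluates both on a few forecast errors. The error values and the function names `absolute_loss` and `quadratic_loss` are assumptions made for illustration; only the formulas come from the slide.

```python
def absolute_loss(e):
    """Absolute loss: L(e) = |e|."""
    return abs(e)


def quadratic_loss(e):
    """Quadratic loss: L(e) = e^2."""
    return e ** 2


# Hypothetical forecast errors, symmetric around zero.
errors = [-2.0, -0.5, 0.0, 0.5, 2.0]

for e in errors:
    print(f"e={e:+.1f}  absolute={absolute_loss(e):.2f}  quadratic={quadratic_loss(e):.2f}")
```

Both functions are zero at `\(e = 0\)` and symmetric in the sign of the error. Relative to the absolute loss, the quadratic loss penalizes errors larger than one in absolute value more heavily, and errors smaller than one less heavily.
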
]

---

# An optimal forecast

.right-column[
An optimal forecast is the forecast that minimizes the expected loss:
`$$\min_{\hat{y}_{t+h|t}} E\left[L\left(e_{t+h|t}\right)\right] = \min_{\hat{y}_{t+h|t}} E\left[L\left(y_{t+h}-\hat{y}_{t+h|t}\right)\right]$$`

Under the assumption of the quadratic loss function:
`$$E\left[L(e_{t+h|t})\right] = E(y_{t+h}^2)-2E(y_{t+h})\hat{y}_{t+h|t} + \hat{y}_{t+h|t}^2$$`
]

---

# An optimal forecast

.right-column[
Solving the optimization problem, the first-order condition `\(-2E(y_{t+h}) + 2\hat{y}_{t+h|t} = 0\)` yields:
`$$\hat{y}_{t+h|t} = E(y_{t+h})$$`

Thus, under the quadratic loss, assuming that the forecast is normally distributed:
`$$\hat{y}_{t+h|t} = \mu_y$$`
]

---

# Generating forecasts

.right-column[
Forecast accuracy should only be determined by considering how well a model performs on data not used in estimation.

But to assess forecast accuracy, we need access to data, typically from future time periods, that were not used in estimation.

This leads to the so-called 'pseudo forecasting' routine.
]

---

# Pseudo forecasting routine

.right-column[
The pseudo forecasting routine involves splitting the available data into two segments, referred to as 'in-sample' and 'out-of-sample':

- The in-sample segment of a series is also known as the 'estimation set' or the 'training set.'
- The out-of-sample segment of a series is also known as the 'hold-out set' or the 'test set.'
]

---

# Pseudo forecasting routine

.right-column[
Thus, we make the so-called 'genuine' forecasts using only the information from the estimation set, and assess the accuracy of these forecasts in an out-of-sample setting.

Because forecasting is often performed in a time series context, the estimation set typically predates the hold-out set. In non-dynamic settings, however, such chronological ordering may not be necessary.
]

---

# Forecasting schemes

.right-column[
Three forecasting schemes are available to a forecaster: *recursive*, *rolling*, and *fixed*.
- The recursive forecasting environment uses a sequence of expanding windows to update model estimates and the information set.
- The rolling forecasting environment uses a sequence of rolling windows of the same size to update model estimates and the information set.
- The fixed forecasting environment uses one fixed window for model estimates, and only updates the information set.
]

---

# Out-of-sample evaluation

.right-column[
To evaluate forecasts of a time series, `\(\{y_t\}\)`, with a total of `\(T\)` observations, we divide the sample into two parts: the in-sample set with a total of `\(R\)` observations, such that `\(R < T\)` (typically, `\(R \approx 0.75T\)`), and the out-of-sample set.

If we are interested in one-step-ahead forecast assessment, this way we will produce a sequence of forecasts: `\(\{y_{R+1|R},y_{R+2|{R+1}},\ldots,y_{T|{T-1}}\}\)` for `\(\{y_{R+1},y_{R+2},\ldots,y_{T}\}\)`.

Forecast errors, `\(e_{R+j} = y_{R+j} - y_{R+j|{R+j-1}}\)`, can then be computed for `\(j = 1,\ldots,T-R\)`.
]

---

# Measures of forecast accuracy

.right-column[
The most commonly applied accuracy measures are the mean absolute forecast error (MAFE) and the root mean squared forecast error (RMSFE):
`$$\begin{aligned} \text{MAFE} = & \frac{1}{P}\sum_{i=1}^{P}|e_i|\\ \text{RMSFE} = & \sqrt{\frac{1}{P}\sum_{i=1}^{P}e_i^2} \end{aligned}$$`

where `\(P\)` is the total number of out-of-sample forecasts (for the one-step-ahead forecasts above, `\(P = T-R\)`).
]

---

# Measures of forecast accuracy

.right-column[
Note that a model that fits the data well may not necessarily forecast well.

While a perfect fit can always be achieved using a model with 'enough' parameters, such over-fitting is akin to failing to identify the systematic pattern in the data.
]

---

# Properties of a forecast error

.right-column[
Forecast errors of a 'good' forecasting method will have the following properties:

- zero mean; otherwise, the forecasts are biased.
- no correlation with the forecasts; otherwise, there is information left that should be used in computing forecasts.
- no serial correlation among one-step-ahead forecast errors. Note that `\(k\)`-step-ahead forecast errors, for `\(k>1\)`, can be, and usually are, serially correlated.

These properties, in effect, are testable hypotheses.
]

---

# Forecast error diagnostics

.right-column[
## Unbiasedness

Testing `\(E(e_{t+h|t})=0\)`. Set up a regression:
`$$e_{t+h|t} = \alpha+\upsilon_{t+h} \hspace{.5in} t = R,\ldots,T-h,$$`

where `\(R\)` is the estimation window size, `\(T\)` is the sample size, and `\(h\)` is the forecast horizon length.

The null of a zero-mean forecast error is equivalent to testing `\(H_0: \alpha = 0\)` in the OLS setting. For `\(h\)`-step-ahead forecast errors, when `\(h>1\)`, autocorrelation-consistent standard errors should be used.
]

---

# Forecast error diagnostics

.right-column[
## Efficiency

Testing `\(Cov(e_{t+h|t},y_{t+h|t})=0\)`. Set up a regression:
`$$e_{t+h|t} = \alpha + \beta y_{t+h|t} + \upsilon_{t+h} \hspace{.5in} t = R,\ldots,T-h.$$`

The null that the forecast error is unrelated to the information set (summarized here by the forecast) is equivalent to testing `\(H_0: \beta = 0\)` in the OLS setting. For `\(h\)`-step-ahead forecast errors, when `\(h>1\)`, autocorrelation-consistent standard errors should be used.
]

---

# Forecast error diagnostics

.right-column[
## No Autocorrelation

Testing `\(Cov(e_{t+1|t},e_{t|t-1})=0\)`. Set up a regression:
`$$e_{t+1|t} = \alpha + \gamma e_{t|t-1} + \upsilon_{t+1} \hspace{.5in} t = R+1,\ldots,T-1.$$`

The null of no forecast error autocorrelation is equivalent to testing `\(H_0: \gamma = 0\)` in the OLS setting.
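
The diagnostic regressions can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a definitive implementation: the forecast errors are made-up numbers, the helper names `mean_t_stat` and `slope_t_stat` are hypothetical, and the standard errors are the plain OLS ones, which is reasonable only for one-step-ahead errors (for `\(h>1\)`, autocorrelation-consistent standard errors would be needed instead).

```python
import math
import statistics


def mean_t_stat(errors):
    """Estimate alpha in e = alpha + v and the t-statistic for H0: alpha = 0."""
    n = len(errors)
    alpha_hat = statistics.fmean(errors)
    se = statistics.stdev(errors) / math.sqrt(n)  # plain OLS standard error
    return alpha_hat, alpha_hat / se


def slope_t_stat(y, x):
    """Estimate b in y = a + b*x + v and the t-statistic for H0: b = 0."""
    n = len(y)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ssr / (n - 2) / sxx)  # plain OLS standard error
    return b, b / se_b


# Hypothetical one-step-ahead forecast errors (illustrative numbers only).
e = [0.3, -0.1, 0.4, -0.5, 0.2, -0.3, 0.1, 0.0, -0.2, 0.1]

# Unbiasedness: regress the errors on a constant.
alpha_hat, t_alpha = mean_t_stat(e)

# No autocorrelation: regress e_{t+1|t} on e_{t|t-1}.
gamma_hat, t_gamma = slope_t_stat(e[1:], e[:-1])

print(f"alpha_hat={alpha_hat:.3f}, t={t_alpha:.2f}")
print(f"gamma_hat={gamma_hat:.3f}, t={t_gamma:.2f}")
```

The efficiency regression has the same form, so under the same assumptions it could be run as `slope_t_stat(errors, forecasts)`, with the forecasts as the regressor. In each case, a t-statistic that is large in absolute value is evidence against the corresponding null.
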
]

---

# Readings

.pull-left[
![](Art/todolist.png)
]

.pull-right[
Ubilava, [Chapter 3](https://davidubilava.com/forecasting/docs/features-of-time-series-data.html)

Gonzalez-Rivera, Chapter 4

Hyndman & Athanasopoulos, [1.7](https://otexts.com/fpp3/perspective.html), [5.4](https://otexts.com/fpp3/diagnostics.html), [5.8](https://otexts.com/fpp3/accuracy.html)
]