Linear Predictors

Forecasting

Suppose we are interested in forecasting a random variable \(\smash{Y_{t+1}}\) based on a set of variables \(\smash{\boldsymbol{X}_t}\).

  • \(\smash{\boldsymbol{X}_t}\) might consist of \(\smash{m}\) lags of \(\smash{Y_{t+1}}\): \(\smash{Y_t, Y_{t-1}, \ldots, Y_{t-m+1}}\).
  • We can denote \(\smash{Y^*_{t+1|t}}\) as the forecast of \(\smash{Y_{t+1}}\) based on \(\smash{\boldsymbol{X}_t}\).
  • We can choose \(\smash{Y^*_{t+1|t}}\) to minimize some loss function, \(\smash{L\left(Y^*_{t+1|t}\right)}\), which evaluates the quality of \(\smash{Y^*_{t+1|t}}\).
  • A common choice is the quadratic loss function:
\[\begin{align*} L\left(Y^*_{t+1|t}\right) & = \text{E}\left[\left(Y_{t+1} - Y^*_{t+1|t}\right)^2\right]. \end{align*}\]
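
A quick way to see a loss function in action is to approximate it by simulation. The sketch below is an added illustration (not part of the notes): it assumes an arbitrary AR(1)-style data-generating process and compares the quadratic loss of two candidate forecast rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DGP (an assumption for this sketch): Y_{t+1} = 0.8*Y_t + eps_{t+1}
T = 100_000
y = np.zeros(T + 1)
eps = rng.standard_normal(T + 1)
for t in range(T):
    y[t + 1] = 0.8 * y[t] + eps[t + 1]

y_next, y_curr = y[1:], y[:-1]

# Quadratic loss (MSE) of two candidate forecasts of Y_{t+1} given X_t = Y_t
loss_cond_mean = np.mean((y_next - 0.8 * y_curr) ** 2)  # forecast 0.8*Y_t
loss_naive = np.mean((y_next - y_curr) ** 2)            # forecast Y_t

print(f"MSE of 0.8*Y_t forecast: {loss_cond_mean:.3f}")  # ~1.0, the noise variance
print(f"MSE of Y_t forecast:     {loss_naive:.3f}")      # larger
```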

Mean Squared Error Loss

Quadratic loss is also known as mean squared error.

\[\begin{align*} MSE\left(Y^*_{t+1|t}\right) & = \text{E}\left[\left(Y_{t+1} - Y^*_{t+1|t}\right)^2\right]. \end{align*}\]
  • The conditional expectation, \(\smash{\text{E}\left[Y_{t+1}|\boldsymbol{X}_t \right]}\), minimizes \(\smash{MSE\left(Y^*_{t+1|t}\right)}\).

MSE Minimizer

Let \(\smash{Y^*_{t+1|t} = g(\boldsymbol{X}_t)}\). Then

\[\begin{split}\begin{align*} \text{E}\left[\left(Y_{t+1} - g(\boldsymbol{X}_t)\right)^2\right] & = \text{E}\Big[\big(Y_{t+1} - \text{E}[Y_{t+1}|\boldsymbol{X}_t] \\ & \hspace{1in} + \text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t) \big)^2\Big] \\ & = \text{E}\left[\left(Y_{t+1} - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right)^2\right] \\ & \hspace{0.25in} + 2\text{E}\Big[\big(Y_{t+1}-\text{E}[Y_{t+1}|\boldsymbol{X}_t]\big) \\ & \hspace{0.75in} \times \big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)\Big] \\ & \hspace{1in} + \text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)^2\right] \end{align*}\end{split}\]

MSE Minimizer

By the law of iterated expectations

\[\begin{split}\begin{align*} \text{E}\Big[&\big(Y_{t+1}-\text{E}[Y_{t+1}|\boldsymbol{X}_t]\big) \big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)\Big] \\ & \hspace{0.25in} = \text{E}\Big[ \text{E}\big[\left(Y_{t+1} - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right) \big| \boldsymbol{X}_t\big] \big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big) \Big] \\ & \hspace{0.25in} = \text{E}\Big[ \left(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right) \big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big) \Big] \\ & \hspace{0.25in} = 0. \end{align*}\end{split}\]
  • This means that the cross-product (second) term in the expansion on the previous slide is zero.
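
As an added numerical check of this step (not part of the original derivation): with a data-generating process in which the conditional mean is known, the cross-product term is approximately zero in simulation for an arbitrary choice of \(\smash{g(\boldsymbol{X}_t)}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Illustrative DGP (an assumption): Y = 2X + eps, so E[Y|X] = 2X by construction
x = rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)

cond_mean = 2.0 * x   # the known conditional expectation
g = x ** 2            # an arbitrary alternative forecast rule

# Cross-product term E[(Y - E[Y|X])(E[Y|X] - g(X))]
cross_term = np.mean((y - cond_mean) * (cond_mean - g))
print(f"cross term: {cross_term:.4f}")   # close to 0
```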

MSE Minimizer

Substituting the previous result:

\[\begin{split}\begin{align*} \text{E}\left[\left(Y_{t+1} - g(\boldsymbol{X}_t)\right)^2\right] & = \text{E}\left[\left(Y_{t+1} - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right)^2\right] \\ & \hspace{0.5in} + \text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)^2\right] \end{align*}\end{split}\]
  • Clearly the \(\smash{MSE}\) is minimized when
\[\smash{\text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)^2\right] = 0.}\]
  • This occurs when \(\smash{\text{E}[Y_{t+1}|\boldsymbol{X}_t] = g(\boldsymbol{X}_t)}\).
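
The sketch below is an added illustration with a made-up nonlinear DGP: the conditional mean attains a lower simulated MSE than other candidate forecast rules.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Illustrative DGP (an assumption): Y = sin(X) + noise, so E[Y|X] = sin(X)
x = rng.standard_normal(n)
y = np.sin(x) + 0.5 * rng.standard_normal(n)

candidates = {
    "conditional mean sin(X)": np.sin(x),
    "linear rule 0.8*X": 0.8 * x,
    "constant 0": np.zeros(n),
}
for name, forecast in candidates.items():
    print(f"{name:>26}: MSE = {np.mean((y - forecast) ** 2):.4f}")
# The conditional mean attains the smallest MSE (about 0.25, the noise variance).
```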

Linear Projection

We can restrict our forecast to be a linear function of \(\smash{\boldsymbol{X}_t}\):

\[\begin{align*} Y^*_{t+1|t} & = \boldsymbol{X}'_t \boldsymbol{\beta}. \end{align*}\]
  • Let \(\smash{\boldsymbol{\beta}^*}\) be the value of \(\smash{\boldsymbol{\beta}}\) such that the forecast error is orthogonal to (i.e., uncorrelated with) \(\smash{\boldsymbol{X}_t}\):
\[\begin{align*} \text{E}\Big[\boldsymbol{X}_t \underbrace{\left(Y_{t+1} - \boldsymbol{X}'_t \boldsymbol{\beta}^*\right)}_{\text{forecast error}} \Big] & = \boldsymbol{0}. \end{align*}\]
  • This is a system of equations, one for each element of \(\smash{\boldsymbol{X}_t}\).
  • \(\smash{\boldsymbol{\beta}^*}\) minimizes the \(\smash{MSE}\).
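
A minimal added sketch, assuming a simple simulated DGP: estimate \(\smash{\boldsymbol{\beta}^*}\) by least squares and check that the sample average of \(\smash{\boldsymbol{X}_t}\) times the forecast error is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200_000

# X_t contains a constant and one regressor; the DGP is an arbitrary illustration
x = np.column_stack([np.ones(T), rng.standard_normal(T)])
y_next = 1.0 + 2.0 * x[:, 1] + rng.standard_normal(T)

beta_star, *_ = np.linalg.lstsq(x, y_next, rcond=None)  # sample analogue of beta*
errors = y_next - x @ beta_star

# Each entry of the sample moment E[X_t * error] is approximately zero
print(np.mean(x * errors[:, None], axis=0))
```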

Linear Projection

We can use the same steps as before to show that \(\smash{\boldsymbol{\beta}^*}\) minimizes the \(\smash{MSE}\) among all linear forecast rules.

  • Begin with an arbitrary linear forecasting rule, \(\smash{Y^*_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\gamma}}\).
  • Show that
\[\begin{split}\begin{align*} MSE\left( Y^*_{t+1|t} \right) & = \text{E}\left[\left(Y_{t+1} - \boldsymbol{X}'_t \boldsymbol{\gamma} \right)^2\right] \\ & = \text{E}\left[\left(Y_{t+1} - \boldsymbol{X}'_t \boldsymbol{\beta}^* + \boldsymbol{X}'_t \boldsymbol{\beta}^* - \boldsymbol{X}'_t \boldsymbol{\gamma} \right)^2\right] \\ & = \text{E}\Big[\big(Y_{t+1} - \boldsymbol{X}'_t \boldsymbol{\beta}^*\big)^2\Big] + \text{E}\left[\left(\boldsymbol{X}'_t \boldsymbol{\beta}^* - \boldsymbol{X}'_t \boldsymbol{\gamma}\right)^2\right]. \end{align*}\end{split}\]
  • The cross-product term vanishes by the orthogonality condition defining \(\smash{\boldsymbol{\beta}^*}\), so the \(\smash{MSE}\) is minimized when \(\smash{\boldsymbol{\gamma} = \boldsymbol{\beta}^*}\).
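
The decomposition can be verified numerically. In the added sketch below (illustrative DGP and an arbitrary \(\smash{\boldsymbol{\gamma}}\)), the sample MSE at \(\smash{\boldsymbol{\gamma}}\) equals the sample MSE at \(\smash{\boldsymbol{\beta}^*}\) plus the average of \(\smash{(\boldsymbol{X}'_t \boldsymbol{\beta}^* - \boldsymbol{X}'_t \boldsymbol{\gamma})^2}\).

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200_000

# Illustrative DGP (an assumption for this sketch)
x = np.column_stack([np.ones(T), rng.standard_normal(T)])
y_next = 0.5 + 1.5 * x[:, 1] + rng.standard_normal(T)

beta_star, *_ = np.linalg.lstsq(x, y_next, rcond=None)
gamma = beta_star + np.array([0.3, -0.2])   # an arbitrary alternative coefficient

mse_beta = np.mean((y_next - x @ beta_star) ** 2)
mse_gamma = np.mean((y_next - x @ gamma) ** 2)
penalty = np.mean((x @ beta_star - x @ gamma) ** 2)

print(f"MSE(beta*):            {mse_beta:.4f}")
print(f"MSE(gamma):            {mse_gamma:.4f}")
print(f"MSE(beta*) + penalty:  {mse_beta + penalty:.4f}")  # matches MSE(gamma)
```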

Linear Projection

\(\smash{Y^*_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\beta}^*}\) is referred to as the linear projection of \(\smash{Y_{t+1}}\) on \(\smash{\boldsymbol{X}_t}\).

  • We will denote the linear projection as
\[\begin{equation*} \hat{P}(Y_{t+1}|\boldsymbol{X}_t) = \boldsymbol{X}'_t \boldsymbol{\beta}^* \,\,\,\,\,\,\, \text{or} \,\,\,\,\,\,\, \hat{Y}_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\beta}^*. \end{equation*}\]
  • Since the conditional expectation minimizes the \(\smash{MSE}\) over all forecasts based on \(\smash{\boldsymbol{X}_t}\),
\[\begin{equation*} MSE\left(\hat{P}(Y_{t+1}|\boldsymbol{X}_t)\right) \geq MSE\left(\text{E}[Y_{t+1}|\boldsymbol{X}_t]\right). \end{equation*}\]
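
The inequality is strict when the conditional mean is nonlinear in \(\smash{\boldsymbol{X}_t}\). The added sketch below uses a made-up quadratic DGP to illustrate.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# Illustrative DGP (an assumption): E[Y|X] = X^2, which is nonlinear in X
x1 = rng.standard_normal(n)
y = x1 ** 2 + 0.5 * rng.standard_normal(n)

x = np.column_stack([np.ones(n), x1])
beta_star, *_ = np.linalg.lstsq(x, y, rcond=None)

mse_proj = np.mean((y - x @ beta_star) ** 2)  # linear projection
mse_cond = np.mean((y - x1 ** 2) ** 2)        # conditional expectation

print(f"MSE of linear projection: {mse_proj:.3f}")  # noticeably larger
print(f"MSE of conditional mean:  {mse_cond:.3f}")  # ~0.25, the noise variance
```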

Linear Projection Solution

Using the orthogonality condition:

\[\begin{align*} \boldsymbol{\beta}^* & = \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right]. \end{align*}\]
  • Ordinary least squares is the sample analogue of the equation above: it replaces the population moments with sample averages.
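
A minimal sketch of the sample analogue, added here with an assumed DGP: replacing the population moments by sample averages reproduces the ordinary least squares coefficient.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 100_000

# Illustrative DGP (an assumption for this sketch)
x = np.column_stack([np.ones(T), rng.standard_normal(T), rng.standard_normal(T)])
y_next = x @ np.array([0.3, 1.0, -0.5]) + rng.standard_normal(T)

# Sample analogue of beta* = E[X X']^{-1} E[X Y]
Sxx = x.T @ x / T
Sxy = x.T @ y_next / T
beta_moments = np.linalg.solve(Sxx, Sxy)

# Ordinary least squares gives the same answer
beta_ols, *_ = np.linalg.lstsq(x, y_next, rcond=None)

print(beta_moments)
print(beta_ols)   # identical up to numerical precision
```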

Linear Projection MSE

Using our solution for \(\smash{\boldsymbol{\beta}^*}\), we can solve for the MSE of the linear projection:

\[\begin{split}\begin{align*} MSE(Y^*_{t+1|t}) & = \text{E}\left[\left(Y_{t+1}- \boldsymbol{X}'_t \boldsymbol{\beta}^*\right)^2\right] \\ & = \text{E}[Y^2_{t+1}] - 2\text{E}[Y_{t+1} \boldsymbol{X}'_t \boldsymbol{\beta}^*] + \text{E}[\boldsymbol{\beta}^{*'} \boldsymbol{X}_t \boldsymbol{X}'_t \boldsymbol{\beta}^*] \\ & = \text{E}[Y^2_{t+1}] - 2\text{E}[Y_{t+1} \boldsymbol{X}'_t] \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right] \\ & \hspace{0.85in} + \text{E}\left[Y_{t+1} \boldsymbol{X}'_t\right] \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}[\boldsymbol{X}_t \boldsymbol{X}'_t] \\ & \hspace{1.6in} \times \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right] \\ & = \text{E}[Y^2_{t+1}] - \text{E}[Y_{t+1} \boldsymbol{X}'_t] \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right]. \end{align*}\end{split}\]
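
Using sample analogues of the moments, the closed-form expression for the MSE matches the average squared forecast error; the added sketch below (illustrative DGP) checks this.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 200_000

# Illustrative DGP (an assumption for this sketch)
x = np.column_stack([np.ones(T), rng.standard_normal(T)])
y_next = 0.2 + 0.9 * x[:, 1] + rng.standard_normal(T)

Sxx = x.T @ x / T
Sxy = x.T @ y_next / T
beta_star = np.linalg.solve(Sxx, Sxy)

# Direct average of squared forecast errors
mse_direct = np.mean((y_next - x @ beta_star) ** 2)
# Closed-form: E[Y^2] - E[Y X'] E[X X']^{-1} E[X Y], with sample moments
mse_formula = np.mean(y_next ** 2) - Sxy @ np.linalg.solve(Sxx, Sxy)

print(f"direct:  {mse_direct:.4f}")
print(f"formula: {mse_formula:.4f}")   # the two agree
```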

Vector Linear Projection

Let \(\smash{\boldsymbol{Y}_{t+1}}\) be an \(\smash{(n \times 1)}\) vector and \(\smash{\boldsymbol{X}_t}\) an \(\smash{(m \times 1)}\) vector.

  • The linear projection of \(\smash{\boldsymbol{Y}_{t+1}}\) on \(\smash{\boldsymbol{X}_t}\) is
\[\begin{align*} \hat{P}(\boldsymbol{Y}'_{t+1}|\boldsymbol{X}_t) = \hat{\boldsymbol{Y}}'_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\beta}^*. \end{align*}\]

where \(\smash{\boldsymbol{\beta}^*}\) is the \(\smash{(m \times n)}\) matrix such that

\[\begin{align*} \text{E}\Big[\boldsymbol{X}_t \left(\boldsymbol{Y}'_{t+1} - \boldsymbol{X}'_t \boldsymbol{\beta}^*\right)\Big] & = \boldsymbol{0}. \end{align*}\]
  • As in the univariate case,
\[\begin{align*} \boldsymbol{\beta}^* & = \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t \boldsymbol{Y}'_{t+1}\right]. \end{align*}\]
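
For the vector case, the same sample-analogue computation gives the \(\smash{(m \times n)}\) matrix \(\smash{\boldsymbol{\beta}^*}\) columnwise. The sketch below is an added illustration with assumed dimensions and DGP.

```python
import numpy as np

rng = np.random.default_rng(8)
T, m, n = 100_000, 3, 2   # m regressors (including a constant), n forecast variables

# Illustrative DGP (an assumption for this sketch)
x = np.column_stack([np.ones(T), rng.standard_normal((T, m - 1))])
B_true = np.array([[0.1, -0.3],
                   [1.0,  0.5],
                   [-0.5, 0.2]])                 # (m x n)
y_next = x @ B_true + rng.standard_normal((T, n))

# beta* = E[X X']^{-1} E[X Y'], estimated with sample moments
Sxx = x.T @ x / T                                # (m x m)
Sxy = x.T @ y_next / T                           # (m x n)
beta_star = np.linalg.solve(Sxx, Sxy)            # (m x n)

errors = y_next - x @ beta_star
print(np.abs(x.T @ errors / T).max())            # orthogonality: approximately 0
print(beta_star)                                 # close to B_true
```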