Linear Predictors
Forecasting
Suppose we are interested in forecasting a random variable
\(\smash{Y_{t+1}}\) based on a set of variables
\(\smash{\boldsymbol{X}_t}\).
- \(\smash{\boldsymbol{X}_t}\) might consist of
\(\smash{m}\) lags of \(\smash{Y_{t+1}}\):
\(\smash{Y_t, Y_{t-1}, \ldots, Y_{t-m+1}}\).
- Let \(\smash{Y^*_{t+1|t}}\) denote the forecast of
\(\smash{Y_{t+1}}\) based on \(\smash{\boldsymbol{X}_t}\).
- We can choose \(\smash{Y^*_{t+1|t}}\) to minimize some loss
function, \(\smash{L\left(Y^*_{t+1|t}\right)}\), which evaluates
the quality of \(\smash{Y^*_{t+1|t}}\).
- A common choice is the quadratic loss function:
\[\begin{align*}
L\left(Y^*_{t+1|t}\right) & = \text{E}\left[\left(Y_{t+1} -
Y^*_{t+1|t}\right)^2\right].
\end{align*}\]
Mean Squared Error Loss
Quadratic loss is also known as mean squared error.
\[\begin{align*}
MSE\left(Y^*_{t+1|t}\right) & = \text{E}\left[\left(Y_{t+1} -
Y^*_{t+1|t}\right)^2\right].
\end{align*}\]
- The conditional expectation
\(\smash{\text{E}\left[Y_{t+1}|\boldsymbol{X}_t \right]}\)
minimizes \(\smash{MSE\left(Y^*_{t+1|t}\right)}\).
MSE Minimizer
Let \(\smash{Y^*_{t+1|t} = g(\boldsymbol{X}_t)}\). Then
\[\begin{split}\begin{align*}
\text{E}\left[\left(Y_{t+1} -
g(\boldsymbol{X}_t)\right)^2\right] & = \text{E}\Big[\big(Y_{t+1} -
\text{E}[Y_{t+1}|\boldsymbol{X}_t] \\
& \hspace{1in} + \text{E}[Y_{t+1}|\boldsymbol{X}_t] -
g(\boldsymbol{X}_t) \big)^2\Big] \\
& = \text{E}\left[\left(Y_{t+1} -
\text{E}[Y_{t+1}|\boldsymbol{X}_t]\right)^2\right] \\
& \hspace{0.25in} +
2\text{E}\Big[\big(Y_{t+1}-\text{E}[Y_{t+1}|\boldsymbol{X}_t]\big) \\
& \hspace{0.75in} \times
\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)\Big] \\
& \hspace{1in} +
\text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] -
g(\boldsymbol{X}_t)\big)^2\right]
\end{align*}\end{split}\]
MSE Minimizer
By the law of iterated expectations,
\[\begin{split}\begin{align*}
\text{E}\Big[&\big(Y_{t+1}-\text{E}[Y_{t+1}|\boldsymbol{X}_t]\big)
\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big)\Big] \\
& \hspace{0.25in} = \text{E}\Big[ \text{E}\big[\left(Y_{t+1} -
\text{E}[Y_{t+1}|\boldsymbol{X}_t]\right) \big| \boldsymbol{X}_t\big]
\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big) \Big] \\
& \hspace{0.25in} = \text{E}\Big[
\left(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right)
\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] - g(\boldsymbol{X}_t)\big) \Big] \\
& \hspace{0.25in} = 0.
\end{align*}\end{split}\]
- This means that the cross term (the second term) in the expansion
on the previous slide is zero.
MSE Minimizer
Substituting the previous result:
\[\begin{split}\begin{align*}
\text{E}\left[\left(Y_{t+1} - g(\boldsymbol{X}_t)\right)^2\right] & =
\text{E}\left[\left(Y_{t+1} - \text{E}[Y_{t+1}|\boldsymbol{X}_t]\right)^2\right]
\\
& \hspace{0.5in} +
\text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] -
g(\boldsymbol{X}_t)\big)^2\right]
\end{align*}\end{split}\]
- Clearly the \(\smash{MSE}\) is minimized when
\[\smash{\text{E}\left[\big(\text{E}[Y_{t+1}|\boldsymbol{X}_t] -
g(\boldsymbol{X}_t)\big)^2\right] = 0.}\]
- This occurs when \(\smash{\text{E}[Y_{t+1}|\boldsymbol{X}_t] =
g(\boldsymbol{X}_t)}\).
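As an illustration (not part of the original derivation), here is a minimal
Python sketch: it assumes a made-up data-generating process
\(\smash{Y_{t+1} = X_t^2 + \varepsilon_{t+1}}\), so the conditional mean is
\(\smash{X_t^2}\), and compares the sample MSE of several candidate forecast
rules.
```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed DGP (for illustration only): Y_{t+1} = X_t^2 + eps_{t+1},
# so the conditional mean is E[Y_{t+1} | X_t] = X_t^2.
n = 100_000
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)

def mse(forecast):
    """Sample analogue of E[(Y_{t+1} - Y*_{t+1|t})^2]."""
    return np.mean((y - forecast) ** 2)

candidates = {
    "g(X) = X^2 (conditional mean)": x**2,
    "g(X) = X": x,
    "g(X) = |X|": np.abs(x),
    "g(X) = 1 (unconditional mean)": np.ones(n),
}
for name, g in candidates.items():
    print(f"{name:31s} MSE = {mse(g):.3f}")
```
The conditional mean comes in near the noise variance (about 1.0 here);
every other rule does worse, as the decomposition above predicts.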
Linear Projection
We can restrict our forecast to be a linear function of
\(\smash{\boldsymbol{X}_t}\):
\[\begin{align*}
Y^*_{t+1|t} & = \boldsymbol{X}'_t \boldsymbol{\beta}.
\end{align*}\]
- Let \(\smash{\boldsymbol{\beta}^*}\) be the value of
\(\smash{\boldsymbol{\beta}}\) such that the forecast error is
orthogonal to or uncorrelated with
\(\smash{\boldsymbol{X}_t}\):
\[\begin{align*}
\text{E}\Big[\boldsymbol{X}_t \underbrace{\left(Y_{t+1} - \boldsymbol{X}'_t
\boldsymbol{\beta}^*\right)}_{\text{forecast error}} \Big] & =
\boldsymbol{0}.
\end{align*}\]
- This is a system of \(\smash{m}\) equations, one for each element of
\(\smash{\boldsymbol{X}_t}\).
- \(\smash{\boldsymbol{\beta}^*}\) minimizes the
\(\smash{MSE}\) among all linear forecasts.
Linear Projection
We can use the same steps as before to show that
\(\smash{\boldsymbol{\beta}^*}\) minimizes \(\smash{MSE}\).
- Begin with an arbitrary linear forecasting rule,
\(\smash{Y^*_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\gamma}}\).
\[\begin{split}\begin{align*}
MSE\left( Y^*_{t+1|t} \right) & =
\text{E}\left[\left(Y_{t+1} - \boldsymbol{X}'_t \boldsymbol{\gamma}
\right)^2\right] \\
& = \text{E}\left[\left(Y_{t+1} - \boldsymbol{X}'_t
\boldsymbol{\beta}^* + \boldsymbol{X}'_t \boldsymbol{\beta}^* -
\boldsymbol{X}'_t \boldsymbol{\gamma} \right)^2\right] \\
& = \text{E}\Big[\big(Y_{t+1} - \boldsymbol{X}'_t
\boldsymbol{\beta}^*\big)^2\Big] + \text{E}\left[\left(\boldsymbol{X}'_t
\boldsymbol{\beta}^* - \boldsymbol{X}'_t
\boldsymbol{\gamma}\right)^2\right].
\end{align*}\end{split}\]
- The cross term,
\(\smash{2\left(\boldsymbol{\beta}^* - \boldsymbol{\gamma}\right)'
\text{E}\big[\boldsymbol{X}_t \big(Y_{t+1} - \boldsymbol{X}'_t
\boldsymbol{\beta}^*\big)\big]}\), vanishes by the orthogonality condition.
- Hence, \(\smash{MSE}\) is minimized when
\(\smash{\boldsymbol{\gamma} = \boldsymbol{\beta}^*}\).
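The decomposition can be checked numerically. The sketch below uses simulated
data with made-up coefficients, computes \(\smash{\boldsymbol{\beta}^*}\) from
the sample analogue of the orthogonality condition, and confirms that the MSE
of an arbitrary linear rule equals the MSE at \(\smash{\boldsymbol{\beta}^*}\)
plus the second term.
```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated sample with made-up coefficients (illustration only).
n = 50_000
X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
y = X @ np.array([0.5, 0.8, -0.3]) + rng.standard_normal(n)

# beta* from the sample analogue of E[X_t (Y_{t+1} - X_t' beta)] = 0.
beta_star = np.linalg.solve(X.T @ X, X.T @ y)

def mse(b):
    return np.mean((y - X @ b) ** 2)

# For an arbitrary gamma, MSE(gamma) = MSE(beta*) + mean((X'(beta* - gamma))^2).
gamma = beta_star + rng.standard_normal(3)
lhs = mse(gamma)
rhs = mse(beta_star) + np.mean((X @ (beta_star - gamma)) ** 2)
print(lhs, rhs, np.isclose(lhs, rhs))   # the two sides agree
```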
Linear Projection
\(\smash{Y^*_{t+1|t} = \boldsymbol{X}'_t \boldsymbol{\beta}^*}\)
is referred to as the linear projection of \(\smash{Y_{t+1}}\)
on \(\smash{\boldsymbol{X}_t}\).
- We will denote the linear projection as
\[\begin{equation*}
\hat{P}(Y_{t+1}|\boldsymbol{X}_t) = \boldsymbol{X}'_t \boldsymbol{\beta}^*
\,\,\,\,\,\,\, \text{or} \,\,\,\,\,\,\, \hat{Y}_{t+1|t} =
\boldsymbol{X}'_t \boldsymbol{\beta}^*.
\end{equation*}\]
- Because the conditional expectation minimizes the \(\smash{MSE}\)
over all forecasts, linear or not,
\[\begin{equation*}
MSE\left(\hat{P}(Y_{t+1}|\boldsymbol{X}_t)\right) \geq
MSE\left(\text{E}[Y_{t+1}|\boldsymbol{X}_t]\right).
\end{equation*}\]
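A quick numerical illustration of the inequality (again with the assumed
nonlinear process \(\smash{Y_{t+1} = X_t^2 + \varepsilon_{t+1}}\)): the best
linear forecast based on \(\smash{(1, X_t)}\) cannot do better than the
conditional mean \(\smash{X_t^2}\).
```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed DGP (illustration only): E[Y_{t+1} | X_t] = X_t^2.
n = 100_000
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])            # linear forecast uses (1, X_t)
beta_star = np.linalg.lstsq(X, y, rcond=None)[0]

mse_linear = np.mean((y - X @ beta_star) ** 2)  # ~3: Var(X^2) + noise variance
mse_cond = np.mean((y - x**2) ** 2)             # ~1: noise variance only
print(mse_linear, mse_cond, mse_linear >= mse_cond)
```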
Linear Projection Solution
Using the orthogonality condition,
\(\smash{\text{E}\left[\boldsymbol{X}_t Y_{t+1}\right] =
\text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]
\boldsymbol{\beta}^*}\), so (provided
\(\smash{\text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]}\) is
nonsingular)
\[\begin{align*}
\boldsymbol{\beta}^* & = \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1}
\text{E}\left[\boldsymbol{X}_t Y_{t+1}\right].
\end{align*}\]
- Least-squares projection is the sample analogue of the equation above,
with sample averages in place of the population moments (see the sketch
below).
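A sketch of the sample analogue (the data, an AR(2)-style series with made-up
coefficients, are purely illustrative): replacing the population moments with
sample averages gives the least-squares coefficients, which numpy's
least-squares routine reproduces.
```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up AR(2)-style series for illustration.
T = 20_000
y_series = np.zeros(T)
for t in range(2, T):
    y_series[t] = 0.5 * y_series[t - 1] - 0.2 * y_series[t - 2] + rng.standard_normal()

# X_t = (1, Y_t, Y_{t-1}) and the target is Y_{t+1}.
X = np.column_stack([np.ones(T - 2), y_series[1:T - 1], y_series[0:T - 2]])
y_next = y_series[2:T]

# Sample analogue of beta* = E[X_t X_t']^{-1} E[X_t Y_{t+1}].
Sxx = X.T @ X / len(y_next)
Sxy = X.T @ y_next / len(y_next)
beta_hat = np.linalg.solve(Sxx, Sxy)

# Ordinary least squares gives the same coefficients.
beta_ols = np.linalg.lstsq(X, y_next, rcond=None)[0]
print(beta_hat)
print(beta_ols)
```
Both vectors should be close to the coefficients used to simulate the series
(roughly \(\smash{(0, 0.5, -0.2)}\) here).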
Linear Projection MSE
Using our solution for \(\smash{\boldsymbol{\beta}^*}\), we can
solve for the MSE of the linear projection:
\[\begin{split}\begin{align*}
MSE(Y^*_{t+1|t}) & = \text{E}\left[\left(Y_{t+1}- \boldsymbol{X}'_t
\boldsymbol{\beta}^*\right)^2\right] \\
& = \text{E}[Y^2_{t+1}] - 2\text{E}[Y_{t+1} \boldsymbol{X}'_t \boldsymbol{\beta}^*] +
\text{E}[\boldsymbol{\beta}^{*'} \boldsymbol{X}_t \boldsymbol{X}'_t \boldsymbol{\beta}^*] \\
& = \text{E}[Y^2_{t+1}] - 2\text{E}[Y_{t+1} \boldsymbol{X}'_t] \text{E}\left[\boldsymbol{X}_t
\boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right] \\
& \hspace{0.85in} +
\text{E}\left[Y_{t+1} \boldsymbol{X}'_t\right] \text{E}\left[\boldsymbol{X}_t
\boldsymbol{X}'_t\right]^{-1} \text{E}[\boldsymbol{X}_t \boldsymbol{X}'_t] \\
& \hspace{1.6in} \times
\text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t
Y_{t+1}\right] \\
& = \text{E}[Y^2_{t+1}] - \text{E}[Y_{t+1} \boldsymbol{X}'_t] \text{E}\left[\boldsymbol{X}_t
\boldsymbol{X}'_t\right]^{-1} \text{E}\left[\boldsymbol{X}_t Y_{t+1}\right].
\end{align*}\end{split}\]
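The closed-form expression can be checked against the direct sample mean
squared forecast error; the sketch below uses simulated data with made-up
coefficients and sample moments in place of the population moments.
```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data with made-up coefficients (illustration only).
n = 200_000
X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 0.6, -0.4]) + rng.standard_normal(n)

# Sample moments standing in for E[X X'], E[X Y_{t+1}], and E[Y_{t+1}^2].
Exx = X.T @ X / n
Exy = X.T @ y / n
Eyy = np.mean(y**2)

beta_star = np.linalg.solve(Exx, Exy)

mse_direct = np.mean((y - X @ beta_star) ** 2)        # E[(Y - X'beta*)^2]
mse_formula = Eyy - Exy @ np.linalg.solve(Exx, Exy)   # E[Y^2] - E[YX']E[XX']^{-1}E[XY]
print(mse_direct, mse_formula)   # both ~1, the innovation variance
```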
Vector Linear Projection
Let \(\smash{\boldsymbol{Y}_{t+1}}\) be an \(\smash{(n \times
1)}\) vector and \(\smash{\boldsymbol{X}_t}\) an \(\smash{(m
\times 1)}\) vector.
- The linear projection of \(\smash{\boldsymbol{Y}_{t+1}}\) on
\(\smash{\boldsymbol{X}_t}\) is
\[\begin{align*}
\hat{P}(\boldsymbol{Y}'_{t+1}|\boldsymbol{X}_t) = \hat{\boldsymbol{Y}}'_{t+1|t}
= \boldsymbol{X}'_t \boldsymbol{\beta}^*,
\end{align*}\]
where \(\smash{\boldsymbol{\beta}^*}\) is the \(\smash{(m \times n)}\) matrix such that
\[\begin{align*}
\text{E}\Big[\boldsymbol{X}_t \left(\boldsymbol{Y}'_{t+1} - \boldsymbol{X}'_t
\boldsymbol{\beta}^*\right)\Big] & = \boldsymbol{0}.
\end{align*}\]
- As in the univariate case,
\[\begin{align*}
\boldsymbol{\beta}^* & = \text{E}\left[\boldsymbol{X}_t \boldsymbol{X}'_t\right]^{-1}
\text{E}\left[\boldsymbol{X}_t \boldsymbol{Y}'_{t+1}\right].
\end{align*}\]