Forecasting ARMA Models
Forecasting with Infinite Data
Consider an \(\smash{ARMA}\) process with
\(\smash{MA(\infty)}\) representation:
\[\begin{align*}
Y_t - \mu & = \psi(L) \varepsilon_t, \,\,\,\, \varepsilon_t
\stackrel{i.i.d.}{\sim}
WN(0,\sigma^2)
\end{align*}\]
where
\[\begin{split}\begin{gather*}
\psi(L) = \sum_{j=0}^{\infty} \psi_{j}L^{j} \\
\sum_{j=0}^{\infty}|\psi_{j}| < \infty \\
\psi_{0} = 1.
\end{gather*}\end{split}\]
Forecasting with Infinite Data
Suppose
- we observe an infinite history of
\(\{\smash{\varepsilon_{t}}\}\) up to date \(\smash{t}\):
\(\smash{\{\varepsilon_{t},\varepsilon_{t-1},\varepsilon_{t-2},...\}}\).
- we know the \(\smash{MA}\) parameters
\(\smash{\mu, \sigma, \{\psi_{j}\}_{j=0}^{\infty}}\).
Then
\[\begin{align*}
Y_{t+s} & = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} +
\ldots + \psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t} +
\psi_{s+1}\varepsilon_{t-1} + \ldots
\end{align*}\]
Optimal Forecast
The optimal forecast of \(\smash{Y_{t+s}}\), in the sense of minimizing MSE, is:
\[\begin{align*}
E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] = \mu +
\psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots
\end{align*}\]
Note that this is different from
\[\begin{align*}
Y_{t} & = \mu + \psi_{0}\varepsilon_{t} +
\psi_{1}\varepsilon_{t-1} + \ldots
\end{align*}\]
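A minimal Python sketch of the optimal forecast formula, assuming a truncated
\(\smash{MA(\infty)}\) with hypothetical weights \(\smash{\psi_{j} = 0.8^{j}}\)
and a simulated shock history (all values are illustrative):
```python
import numpy as np

# Hypothetical MA(infinity) truncated at J lags, with psi_j = 0.8**j (absolutely summable).
rng = np.random.default_rng(0)
mu, sigma, J, s = 2.0, 1.0, 200, 3
psi = 0.8 ** np.arange(J)                  # psi_0 = 1, psi_1, ..., psi_{J-1}
eps_hist = rng.normal(0.0, sigma, size=J)  # eps_t, eps_{t-1}, ..., eps_{t-J+1}

# E[Y_{t+s} | eps_t, eps_{t-1}, ...] = mu + psi_s*eps_t + psi_{s+1}*eps_{t-1} + ...
forecast = mu + psi[s:] @ eps_hist[:J - s]
print(forecast)
```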
Forecast Error
The forecast error is:
\[\begin{split}\begin{align*}
Y_{t+s} & - E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] \\
& = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots +
\psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t} +
\psi_{s+1}\varepsilon_{t-1} + \ldots \\
& \hspace{2in} - \mu - \psi_{s}\varepsilon_{t} -
\psi_{s+1}\varepsilon_{t-1} - \ldots \\
& = \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots +
\psi_{s-1}\varepsilon_{t+1}.
\end{align*}\end{split}\]
Since \(\smash{E[Y_{t+s} |
\varepsilon_{t},\varepsilon_{t-1},\ldots]}\) is linear in
\(\smash{\{\varepsilon_{\tau}\}_{\tau=-\infty}^{t}}\)
it is both the optimal forecast and optimal linear forecast.
Forecast as Linear Projection
Hamilton refers to optimal linear forecasts as
\(\smash{\hat{E}[Y_{t+s} |
\varepsilon_{t},\varepsilon_{t-1},\ldots]}\).
Since the optimal forecast here is linear,
\[\begin{split}\begin{gather*}
E[Y_{t+s}|\varepsilon_{t},\ldots] =
\hat{E}[Y_{t+s}|\varepsilon_{t},\ldots] \\
\implies Y_{t+s|t}^{*} = \hat{Y}_{t+s|t},
\end{gather*}\end{split}\]
and the forecast is also the linear projection
\(\smash{\hat{p}(Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots)}\).
- Clearly, the linear projection condition is satisfied for
\(\smash{j = t, t-1, \ldots}\)
\[\begin{split}\begin{align*}
E[(Y_{t+s} & -
E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])\varepsilon_{j}]
\\
& \hspace{1in} = E[(\varepsilon_{t+s} +
\psi_{1}\varepsilon_{t+s-1} + \ldots +
\psi_{s-1}\varepsilon_{t+1})\varepsilon_{j}] = 0.
\end{align*}\end{split}\]
Forecast MSE
The forecast MSE is:
\[\begin{split}\begin{align*}
E[(Y_{t+s} & -
E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])^{2}] \\
& \hspace{1in} =
E[(\varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots +
\psi_{s-1}\varepsilon_{t+1})^{2}] \\
& \hspace{1in} =
\sigma^{2}\sum_{j=0}^{s-1}\psi_{j}^{2}.
\end{align*}\end{split}\]
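A short sketch of this MSE formula, again with the hypothetical weights
\(\smash{\psi_{j} = 0.8^{j}}\); the Monte Carlo check simulates the unforecastable
shocks \(\smash{\varepsilon_{t+1},\ldots,\varepsilon_{t+s}}\) directly:
```python
import numpy as np

# Analytical s-step forecast MSE: sigma^2 * sum_{j=0}^{s-1} psi_j^2.
sigma, s = 1.0, 3
psi = 0.8 ** np.arange(s)                  # psi_0, ..., psi_{s-1}
mse = sigma**2 * np.sum(psi**2)
print(mse)                                 # 1 + 0.64 + 0.4096 = 2.0496

# Monte Carlo check: forecast error is eps_{t+s} + psi_1*eps_{t+s-1} + ... + psi_{s-1}*eps_{t+1}.
rng = np.random.default_rng(1)
errors = rng.normal(0.0, sigma, size=(100_000, s)) @ psi
print(errors.var())                        # close to the analytical MSE
```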
Forecasting Conditional on Lagged \(\smash{Y_t}\)
Suppose we don’t observe the full history of
\(\smash{\varepsilon_{t}}\).
- Instead, we observe the full history of \(\smash{y_{t}:
y_{t},y_{t-1},y_{t-2},\ldots}\).
- We have an \(\smash{ARMA}\) process with the same
\(\smash{MA}\) representation as before.
If the \(\smash{MA(\infty)}\) representation is invertible, we can
write it as an \(\smash{AR(\infty)}\):
\[\begin{align*}
\eta(L)(Y_{t}-\mu) = \varepsilon_{t},
\end{align*}\]
where \(\smash{\eta(L) = \psi^{-1}(L)}\).
Computing Historical Values
The history of \(\smash{\varepsilon_{t}}\) can be constructed from
the history of \(\smash{y_{t}}\):
\[\begin{split}\begin{align*}
\varepsilon_{t} & = \eta(L)(y_{t}-\mu) \\
\varepsilon_{t-1} & = \eta(L)(y_{t-1}-\mu) \\
\varepsilon_{t-2} & = \eta(L)(y_{t-2}-\mu) \\
& \vdots
\end{align*}\end{split}\]
\[\begin{split}\begin{align*}
\implies E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] & =
E[Y_{t+s}|y_{t},y_{t-1},\ldots] \\
& = \mu + (\psi_{s} + \psi_{s+1}L +
\psi_{s+2}L^{2}+\ldots)\varepsilon_{t} \\
& = \mu + (\psi_{s} + \psi_{s+1}L +
\psi_{s+2}L^{2}+\ldots)\eta(L)(y_{t}-\mu).
\end{align*}\end{split}\]
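Numerically, the \(\smash{\eta_{j}}\) weights can be obtained by inverting the power
series \(\smash{\psi(L)}\) term by term, using \(\smash{\psi_{0} = 1}\). A sketch, with
\(\smash{ARMA(1,1)}\)-style weights chosen purely for illustration:
```python
import numpy as np

def invert_ma(psi, J):
    """Truncated coefficients of eta(L) = psi(L)^{-1}, assuming psi[0] == 1."""
    eta = np.zeros(J)
    eta[0] = 1.0
    for j in range(1, J):
        # psi(L)*eta(L) = 1  =>  sum_{k=0}^{j} psi_k * eta_{j-k} = 0 for j >= 1
        kmax = min(j, len(psi) - 1)
        eta[j] = -sum(psi[k] * eta[j - k] for k in range(1, kmax + 1))
    return eta

# Illustrative weights psi_j = phi**j + phi**(j-1)*theta (the ARMA(1,1) case derived below).
phi, theta, J = 0.7, 0.4, 20
psi = np.r_[1.0, phi ** np.arange(1, J) + phi ** np.arange(0, J - 1) * theta]
eta = invert_ma(psi, J)

# eps_t is then reconstructed as eta(L)(y_t - mu) from the observed history of y.
print(np.round(np.convolve(psi, eta)[:J], 10))  # ~ [1, 0, 0, ...], i.e. psi(L)*eta(L) = 1
```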
Example: \(\smash{AR(1)}\)
For an \(\smash{AR(1)}\) with \(\smash{|\phi| < 1}\):
\[\begin{align*}
Y_{t} - \mu & = \psi(L)\varepsilon_{t},
\end{align*}\]
where
\[\begin{align*}
\psi(L) & = (1 + \phi L + \phi^{2}L^{2}+ \ldots) = (1 + \psi_{1}
L + \psi_{2} L^2 + \ldots).
\end{align*}\]
Example: \(\smash{AR(1)}\)
The optimal forecast \(\smash{s}\) periods ahead is
\[\begin{split}\begin{align*}
E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] & = \mu +
\psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \\
& = \mu + \phi^{s}\varepsilon_{t} + \phi^{s+1}\varepsilon_{t-1} +
\phi^{s+2}\varepsilon_{t-2} + \ldots \\
& = \mu + \phi^{s}(\varepsilon_{t} + \phi\varepsilon_{t-1} +
\phi^{2}\varepsilon_{t-2}+...) \\
& = \mu + \phi^{s}(y_{t} - \mu)
\end{align*}\end{split}\]
- The forecast decays toward \(\smash{\mu}\) as \(\smash{s}\)
increases.
- The MSE is \(\smash{\sigma^{2}\sum_{j=0}^{s-1}\phi^{2j}}\).
- As \(\smash{s\rightarrow \infty, MSE \rightarrow
\frac{\sigma^{2}}{1-\phi^{2}} = Var(Y_{t})}\).
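A small sketch of the \(\smash{AR(1)}\) forecast and its MSE, with illustrative
parameter values, showing the decay toward \(\smash{\mu}\) and the convergence of the
MSE to \(\smash{Var(Y_{t})}\):
```python
import numpy as np

# AR(1) s-step forecast and MSE, assuming illustrative values phi = 0.9, sigma = 1, mu = 0, y_t = 2.
phi, sigma, mu, y_t = 0.9, 1.0, 0.0, 2.0
for s in (1, 5, 20, 100):
    forecast = mu + phi**s * (y_t - mu)                 # decays toward mu
    mse = sigma**2 * np.sum(phi ** (2 * np.arange(s)))  # sigma^2 * sum_{j=0}^{s-1} phi^{2j}
    print(s, forecast, mse)

print(sigma**2 / (1 - phi**2))                          # limiting MSE = Var(Y_t)
```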
Forecasting with Finite Data
In reality, we don’t observe an infinite history of
\(\smash{y_{t},y_{t-1},y_{t-2},\ldots}\).
- Suppose we have only a finite set of \(\smash{m}\) past
observations of \(\smash{y_{t}:
y_{t},y_{t-1},\ldots,y_{t-m+1}}\).
- The optimal \(\smash{AR(p)}\) forecast only makes use of the
past \(\smash{p}\) observations if available
(i.e. \(\smash{p<m}\)).
- If we want to forecast an \(\smash{MA}\) or \(\smash{ARMA}\)
(of arbitrary order), we need an infinite history to construct an
optimal forecast.
Approximate Optimal Forecasts
Start by setting all \(\smash{\varepsilon}\)'s prior to time
\(\smash{t-m+1}\) equal to zero.
\[\begin{align*}
E[Y_{t+s}|y_{t},y_{t-1},\ldots] \approx
E[Y_{t+s}|y_{t},y_{t-1},\ldots,y_{t-m+1},\varepsilon_{t-m} = 0,
\varepsilon_{t-m-1} = 0, \ldots].
\end{align*}\]
Example \(\smash{MA(q)}\)
Start with
\[\begin{gather*}
\hat{\varepsilon}_{t-m} =
\hat{\varepsilon}_{t-m-1} = \ldots = \hat{\varepsilon}_{t-m-q+1} = 0.
\end{gather*}\]
Calculate forward recursively
\[\begin{split}\begin{align*}
\hat{\varepsilon}_{t-m+1} & = (y_{t-m+1} - \mu) \\
\hat{\varepsilon}_{t-m+2} & = (y_{t-m+2} - \mu) - \theta_{1}
\hat{\varepsilon}_{t-m+1} \\
\hat{\varepsilon}_{t-m+3} & = (y_{t-m+3} - \mu) - \theta_{1}
\hat{\varepsilon}_{t-m+2} - \theta_{2}\hat{\varepsilon}_{t-m+1} \\
& \vdots
\end{align*}\end{split}\]
Example \(\smash{MA(q)}\)
With
\(\smash{\hat{\varepsilon}_{t},\hat{\varepsilon}_{t-1},\ldots,\hat{\varepsilon}_{t-m+1}}\)
in hand we can compute forecasts
\[\begin{align*}
\hat{Y}_{t+s} & = \mu + \theta_{s}\hat{\varepsilon}_{t} +
\theta_{s+1}\hat{\varepsilon}_{t-1} + \ldots +
\theta_{q}\hat{\varepsilon}_{t-q+s},
\end{align*}\]
for \(\smash{s \leq q}\); for \(\smash{s > q}\) the forecast is simply \(\smash{\mu}\).
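Putting the recursion and the forecast together, a sketch for a hypothetical
\(\smash{MA(2)}\); the parameter values and simulated sample are purely illustrative,
and the pre-sample shocks are set to zero as above:
```python
import numpy as np

def ma_q_forecast(y, mu, theta, s):
    """Approximate s-step MA(q) forecast, treating pre-sample shocks as zero.

    y     : observed history y_{t-m+1}, ..., y_t (oldest first)
    theta : (theta_1, ..., theta_q)
    """
    q = len(theta)
    eps_hat = np.zeros(len(y) + q)  # q leading zeros stand in for the pre-sample shocks
    for i, y_i in enumerate(y):
        # eps_hat_tau = (y_tau - mu) - theta_1*eps_hat_{tau-1} - ... - theta_q*eps_hat_{tau-q}
        eps_hat[q + i] = (y_i - mu) - sum(theta[k] * eps_hat[q + i - 1 - k] for k in range(q))
    if s > q:
        return mu                   # all relevant shocks lie in the future
    recent = eps_hat[::-1]          # eps_hat_t, eps_hat_{t-1}, ...
    # Y_hat_{t+s} = mu + theta_s*eps_hat_t + theta_{s+1}*eps_hat_{t-1} + ... + theta_q*eps_hat_{t+s-q}
    return mu + sum(theta[j - 1] * recent[j - s] for j in range(s, q + 1))

# Hypothetical MA(2) example on simulated data.
rng = np.random.default_rng(2)
theta, mu = (0.5, 0.3), 1.0
eps = rng.normal(size=102)
y = mu + eps[2:] + theta[0] * eps[1:-1] + theta[1] * eps[:-2]
print(ma_q_forecast(y, mu, theta, s=1))
```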
Exact Finite Sample Forecasts
Another approach with finite data is to project
\(\smash{Y_{t+1} - \mu}\) directly on \(\smash{\boldsymbol{X}_{t} =
(Y_{t} -\mu, Y_{t-1}-\mu, \ldots, Y_{t-m+1} - \mu)^T}\).
That is
\[\begin{split}\begin{align*}
\hat{Y}_{t+1|t}^{(m)} - \mu & =
\boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m)} \\
& = \beta_{1}^{(m)}(Y_{t}-\mu) + \beta_{2}^{(m)}(Y_{t-1}-\mu) +
\ldots + \beta_{m}^{(m)}(Y_{t-m+1}-\mu).
\end{align*}\end{split}\]
Exact Finite Sample Forecasts
\[\begin{split}\begin{align*}
\boldsymbol{\beta}^{(m)} & =
E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}E[\boldsymbol{X}_{t}(Y_{t+1}-\mu)]
= \left[ \begin{array}{ccccc} \gamma_{0} & \gamma_{1} & \gamma_{2}
& \ldots & \gamma_{m-1} \\ \gamma_{1} & \gamma_{0} & \gamma_{1} &
\ldots & \gamma_{m-2} \\ \gamma_{2} & \gamma_{1} & \gamma_{0} &
\ldots & \gamma_{m-3} \\ \vdots & \vdots & \vdots & \vdots & \vdots
\\ \gamma_{m-1} & \ldots & \ldots & \ldots & \gamma_{0} \\
\end{array} \right]^{-1} \left[ \begin{array}{c} \gamma_{1} \\
\gamma_{2} \\ \vdots \\ \gamma_{m} \\ \end{array} \right].
\end{align*}\end{split}\]
Exact Finite Sample Forecasts
Similarly,
\[\begin{split}\begin{align*}
\hat{Y}_{t+s|t}^{(m)} - \mu & =
\boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m,s)} \\
& = \beta_{1}^{(m,s)}(Y_{t}-\mu) +
\beta_{2}^{(m,s)}(Y_{t-1}-\mu) + \ldots +
\beta_{m}^{(m,s)}(Y_{t-m+1}-\mu).
\end{align*}\end{split}\]
Exact Finite Sample Forecasts
\[\begin{split}\begin{align*}
\boldsymbol{\beta}^{(m,s)} & =
E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}E[\boldsymbol{X}_{t}(Y_{t+s}-\mu)]
\\
& = \left[ \begin{array}{ccccc} \gamma_{0} & \gamma_{1} &
\gamma_{2} & \ldots & \gamma_{m-1} \\ \gamma_{1} & \gamma_{0} &
\gamma_{1} & \ldots & \gamma_{m-2} \\ \gamma_{2} & \gamma_{1} &
\gamma_{0} & \ldots & \gamma_{m-3} \\ \vdots & \vdots & \vdots &
\vdots & \vdots \\ \gamma_{m-1} & \ldots & \ldots & \ldots &
\gamma_{0} \\ \end{array} \right]^{-1} \left[ \begin{array}{c}
\gamma_{s} \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1} \\
\end{array} \right].
\end{align*}\end{split}\]
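Because \(\smash{E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]}\) is a Toeplitz matrix of
autocovariances, the weights can be computed with a standard Toeplitz solver. A sketch
using SciPy (an assumed dependency), checked against an \(\smash{AR(1)}\) whose
autocovariances are \(\smash{\gamma_{j} = \sigma^{2}\phi^{j}/(1-\phi^{2})}\), so the
weights should be \(\smash{(\phi^{s}, 0, \ldots, 0)}\):
```python
import numpy as np
from scipy.linalg import solve_toeplitz

def exact_projection_weights(gamma, m, s):
    """beta^(m,s): project Y_{t+s}-mu on (Y_t-mu, ..., Y_{t-m+1}-mu).

    gamma : autocovariances gamma_0, gamma_1, ..., gamma_{s+m-1}
    """
    # Toeplitz system:  Gamma_m * beta = (gamma_s, gamma_{s+1}, ..., gamma_{s+m-1})'
    return solve_toeplitz(gamma[:m], gamma[s:s + m])

# AR(1) check with illustrative values: gamma_j = sigma^2 * phi^j / (1 - phi^2).
phi, sigma, m, s = 0.8, 1.0, 4, 2
gamma = sigma**2 * phi ** np.arange(s + m) / (1 - phi**2)
print(np.round(exact_projection_weights(gamma, m, s), 6))  # ~ [0.64, 0, 0, 0]
```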
Example \(\smash{ARMA(1,1)}\)
Let \(\smash{\{Y_t\}}\) be an \(\smash{ARMA(1,1)}\) process
with \(\smash{|\phi| <1}\) and \(\smash{|\theta| < 1}\)
(causal and invertible). Then:
\[\begin{split}\begin{align*}
(1-\phi L)(Y_{t} - \mu) & = (1 + \theta L)\varepsilon_{t} \\
\implies Y_{t} - \mu & = \psi(L)\varepsilon_{t}
\end{align*}\end{split}\]
where \(\smash{\,\,\psi (L) = (1-\phi L)^{-1}(1 + \theta
L)}\).
\[\smash{\varepsilon_{t} = (1+\theta L)^{-1}(1-\phi L)(Y_{t} - \mu) =
\psi(L)^{-1}(Y_{t} - \mu)}.\]
Example \(\smash{ARMA(1,1)}\)
Expanding the \(\smash{MA}\) representation
\[\begin{split}\begin{align*}
\psi(L) & = (1+\phi L + \phi^{2}L^{2} + \ldots)(1 +\theta L) \\
& = 1 + (\phi + \theta)L + (\phi^{2} + \phi \theta)L^{2} + (\phi^{3} +
\phi^{2}\theta)L^{3} + \ldots \\
& = 1 + \sum_{j=1}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j} \\
\implies \psi_{m} & = \phi^{m} + \phi^{m-1}\theta, \,\,\,\, m \geq 1.
\end{align*}\end{split}\]
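A quick numerical check of this closed form against a direct expansion of
\(\smash{(1-\phi L)^{-1}(1+\theta L)}\), with illustrative values of \(\smash{\phi}\)
and \(\smash{\theta}\):
```python
import numpy as np

# Expand (1 + phi*L + phi^2*L^2 + ...)(1 + theta*L) and compare with psi_m = phi**m + phi**(m-1)*theta.
phi, theta, J = 0.7, 0.4, 10
ar_part = phi ** np.arange(J)                   # coefficients of (1 - phi*L)^{-1}
psi_expanded = np.convolve(ar_part, [1.0, theta])[:J]
psi_closed = np.r_[1.0, phi ** np.arange(1, J) + phi ** np.arange(0, J - 1) * theta]
print(np.allclose(psi_expanded, psi_closed))    # True
```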
Example \(\smash{ARMA(1,1)}\)
Let’s define \(\smash{\psi_{s}(L)}\) as the polynomial
\[\smash{\psi_{s}(L) = \psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2} +
\ldots}\]
This is different from \(\smash{\,\,\psi_{s}L^{s} +
\psi_{s+1}L^{s+1} + \ldots}\)
Example \(\smash{ARMA(1,1)}\)
For the \(\smash{ARMA(1,1)}\),
\[\begin{split}\begin{align*}
\psi_{s}(L) & = (\phi^{s} + \phi^{s-1}\theta) +
(\phi^{s+1} + \phi^{s}\theta)L + (\phi^{s+2} +
\phi^{s+1}\theta)L^{2} + \ldots \\
& = \sum_{j=s}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j-s} \\
& = (\phi^{s} + \phi^{s-1}\theta)\sum_{j=0}^{\infty} \phi^{j}L^{j}
\\
& = (\phi^{s} + \phi^{s-1}\theta)(1 - \phi L)^{-1}.
\end{align*}\end{split}\]
Example \(\smash{ARMA(1,1)}\)
Recall, for an \(\smash{MA(\infty)}\), the optimal forecast is
\[\begin{split}\begin{align*}
\hat{Y}_{t+s|t} - \mu & = E[Y_{t+s} | \varepsilon_{t},
\varepsilon_{t-1}, \ldots] \\
& = \psi_{s}\varepsilon_{t} +
\psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \ldots
= \psi_{s}(L)\varepsilon_{t}
\end{align*}\end{split}\]
So, for the \(\smash{ARMA(1,1)}\),
\[\begin{split}\begin{align*}
\hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1-\phi
L)^{-1}\varepsilon_{t} \\
& = (\phi^{s} + \phi^{s-1}\theta)(1-\phi L)^{-1} (1-\phi L)(1+
\theta L)^{-1}(Y_{t}-\mu) \\
& = (\phi^{s} + \phi^{s-1}\theta)(1+\theta L)^{-1}(Y_{t} - \mu).
\end{align*}\end{split}\]
Example \(\smash{ARMA(1,1)}\)
Notice
\[\begin{split}\begin{align*}
\hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1+\theta
L)^{-1}(Y_{t} - \mu) \\
& = \phi(\phi^{s-1} + \phi^{s-2}\theta)(1+\theta L)^{-1}(Y_{t} -
\mu) \\
& = \phi(\hat{Y}_{t+s-1|t} - \mu), \,\,\,\, \text{ if } s \geq 2,
\end{align*}\end{split}\]
which means the forecast decays toward \(\smash{\mu}\).
Example \(\smash{ARMA(1,1)}\)
For \(\smash{s = 1}\),
\[\begin{split}\begin{align*}
\hat{Y}_{t+s|t} - \mu & = (\phi +\theta)(1 + \theta L)^{-1}(Y_{t} -
\mu) \\
& = (\phi + \phi \theta L - \phi\theta L + \theta)(1 + \theta
L)^{-1}(Y_{t}- \mu) \\
& = [\phi(1+\theta L) + \theta(1 - \phi L)](1+\theta L)^{-1}(Y_{t} -
\mu) \\
& = \phi(Y_{t} - \mu) + \theta(1 - \phi L)(1 + \theta
L)^{-1}(Y_{t} - \mu) \\
& = \phi(Y_{t} - \mu) + \theta\varepsilon_{t}.
\end{align*}\end{split}\]
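A sketch combining the two results: the one-step forecast
\(\smash{\mu + \phi(y_{t}-\mu) + \theta\varepsilon_{t}}\), with \(\smash{\varepsilon_{t}}\)
backed out recursively from the observed history (pre-sample shock set to zero), and
geometric decay toward \(\smash{\mu}\) at longer horizons. The parameter values and the
simulated series are hypothetical:
```python
import numpy as np

def arma11_forecasts(y, mu, phi, theta, horizons):
    """s-step ARMA(1,1) forecasts, treating the pre-sample shock as zero."""
    eps_hat = 0.0
    for i in range(1, len(y)):
        # eps_hat_tau = (y_tau - mu) - phi*(y_{tau-1} - mu) - theta*eps_hat_{tau-1}
        eps_hat = (y[i] - mu) - phi * (y[i - 1] - mu) - theta * eps_hat
    # One-step forecast, then Y_hat_{t+s|t} - mu = phi*(Y_hat_{t+s-1|t} - mu) for s >= 2.
    y_hat = mu + phi * (y[-1] - mu) + theta * eps_hat
    out = {}
    for s in range(1, max(horizons) + 1):
        if s > 1:
            y_hat = mu + phi * (y_hat - mu)
        if s in horizons:
            out[s] = y_hat
    return out

# Hypothetical ARMA(1,1) example on simulated data.
rng = np.random.default_rng(3)
mu, phi, theta = 1.0, 0.7, 0.4
eps = rng.normal(size=200)
y = np.empty(200)
y[0] = mu + eps[0]
for t in range(1, 200):
    y[t] = mu + phi * (y[t - 1] - mu) + eps[t] + theta * eps[t - 1]
print(arma11_forecasts(y, mu, phi, theta, horizons={1, 2, 5, 20}))
```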