Forecasting ARMA Models

Forecasting with Infinite Data

Consider an \(\smash{ARMA}\) process with \(\smash{MA(\infty)}\) representation:

\[\begin{align*} Y_t - \mu & = \psi(L) \varepsilon_t, \,\,\,\, \varepsilon_t \stackrel{i.i.d.}{\sim} WN(0,\sigma^2) \end{align*}\]

where

\[\begin{split}\begin{gather*} \psi(L) = \sum_{j=0}^{\infty} \psi_{j}L^{j} \\ \sum_{j=0}^{\infty}|\psi_{j}| < \infty \\ \psi_{0} = 1. \end{gather*}\end{split}\]

Forecasting with Infinite Data

Suppose

  • we observe an infinite history of \(\{\smash{\varepsilon_{t}}\}\) up to date \(\smash{t}\): \(\smash{\{\varepsilon_{t},\varepsilon_{t-1},\varepsilon_{t-2},...\}}\).
  • we know the \(\smash{MA}\) parameters \(\smash{\mu, \sigma, \{\psi_{j}\}_{j=0}^{\infty}}\).

Then

\[\begin{align*} Y_{t+s} & = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots + \psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \end{align*}\]

Optimal Forecast

The optimal forecast of \(\smash{Y_{t+s}}\), in the sense of minimizing MSE, is:

\[\begin{align*} E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] = \mu + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \end{align*}\]

Note that this is different from the \(\smash{MA(\infty)}\) representation of \(\smash{Y_{t}}\) itself:

\[\begin{align*} Y_{t} & = \mu + \psi_{0}\varepsilon_{t} + \psi_{1}\varepsilon_{t-1} + \ldots \end{align*}\]
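As a concrete illustration (a sketch of ours, not part of the original notes), the forecast can be computed from a truncated \(\smash{\psi}\) sequence and a finite window of observed shocks; the function name and the weights \(\smash{\psi_j = 0.8^j}\) below are purely illustrative:

```python
import numpy as np

def ma_forecast(mu, psi, eps_history, s):
    """s-step-ahead forecast E[Y_{t+s} | eps_t, eps_{t-1}, ...]
    from a truncated MA(infinity) representation.

    psi         : [psi_0, psi_1, ...] with psi_0 = 1
    eps_history : [eps_t, eps_{t-1}, ...] (most recent shock first)
    """
    # Future shocks have conditional mean zero, so only
    # psi_s, psi_{s+1}, ... multiply the observed shocks.
    weights = psi[s:]
    n = min(len(weights), len(eps_history))
    return mu + weights[:n] @ eps_history[:n]

# Illustrative weights psi_j = 0.8**j, truncated at 50 terms
psi = 0.8 ** np.arange(50)
rng = np.random.default_rng(0)
eps_history = rng.normal(size=50)
print(ma_forecast(2.0, psi, eps_history, s=3))
```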

Forecast Error

The forecast error is:

\[\begin{split}\begin{align*} Y_{t+s} & - E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] \\ & = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots + \psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \\ & \hspace{2in} - \mu - \psi_{s}\varepsilon_{t} - \psi_{s+1}\varepsilon_{t-1} - \ldots \\ & = \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots + \psi_{s-1}\varepsilon_{t+1}. \end{align*}\end{split}\]

Since \(\smash{E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]}\) is linear in \(\smash{\{\varepsilon_{\tau}\}_{\tau=-\infty}^{t}}\) it is both the optimal forecast and optimal linear forecast.

Forecast as Linear Projection

Hamilton refers to optimal linear forecasts as \(\smash{\hat{E}[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]}\).

  • In this case
\[\begin{split}\begin{gather*} E[Y_{t+s}|\varepsilon_{t},\ldots] = \hat{E}[Y_{t+s}|\varepsilon_{t},\ldots] \\ \implies Y_{t+s|t}^{*} = \hat{Y}_{t+s|t} \end{gather*}\end{split}\]

which is also a linear projection \(\smash{\hat{p}(Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots)}\).

  • Clearly, the linear projection condition is satisfied for \(\smash{j = t, t-1, \ldots}\)
\[\begin{split}\begin{align*} E[(Y_{t+s} & - E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])\varepsilon_{j}] \\ & \hspace{1in} = E[(\varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots + \psi_{s-1}\varepsilon_{t+1})\varepsilon_{j}] = 0. \end{align*}\end{split}\]
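A quick Monte Carlo check (ours; \(\smash{\psi_1 = 0.5}\) and \(\smash{s = 2}\) are illustrative choices) confirms this orthogonality numerically:

```python
import numpy as np

# For an MA(1), psi_1 = theta, and the s = 2 forecast error is
# eps_{t+2} + psi_1 * eps_{t+1}.  It involves only future shocks,
# so its sample covariance with any observed shock should be ~ 0.
rng = np.random.default_rng(0)
eps_t, eps_t1, eps_t2 = rng.normal(size=(3, 100_000))
error = eps_t2 + 0.5 * eps_t1
print(np.mean(error * eps_t))   # close to zero
```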

Forecast MSE

The forecast MSE is:

\[\begin{split}\begin{align*} E[(Y_{t+s} & - E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])^{2}] \\ & \hspace{1in} = E[(\varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots + \psi_{s-1}\varepsilon_{t+1})^{2}] \\ & \hspace{1in} = \sigma^{2}\sum_{j=0}^{s-1}\psi_{j}^{2}. \end{align*}\end{split}\]
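In code this formula is immediate; the sketch below (ours) evaluates it for the illustrative weights \(\smash{\psi_j = 0.8^j}\), showing the MSE rising toward the unconditional variance as \(\smash{s}\) grows:

```python
import numpy as np

def forecast_mse(sigma2, psi, s):
    """MSE of the s-step forecast: sigma^2 * sum_{j=0}^{s-1} psi_j**2."""
    return sigma2 * np.sum(np.asarray(psi[:s]) ** 2)

psi = 0.8 ** np.arange(200)
for s in (1, 2, 5, 20):
    print(s, forecast_mse(1.0, psi, s))
print("Var(Y):", 1.0 / (1 - 0.8**2))   # the s -> infinity limit
```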

Forecasting Conditional on Lagged \(\smash{Y_t}\)

Suppose we don’t observe the full history of \(\smash{\varepsilon_{t}}\).

  • Instead, we observe the full history of \(\smash{y_{t}: y_{t},y_{t-1},y_{t-2},\ldots}\).
  • We have an \(\smash{ARMA}\) process with the same \(\smash{MA}\) representation as before.

If the \(\smash{MA(\infty)}\) representation is invertible, we can write it as an \(\smash{AR(\infty)}\):

\[\begin{align*} \eta(L)(Y_{t}-\mu) = \varepsilon_{t}, \end{align*}\]

where \(\smash{\eta(L) = \psi^{-1}(L)}\).

Computing Historical Values

The history of \(\smash{\varepsilon_{t}}\) can be constructed with the history of \(\smash{y_{t}}\):

\[\begin{split}\begin{align*} \varepsilon_{t} & = \eta(L)(y_{t}-\mu) \\ \varepsilon_{t-1} & = \eta(L)(y_{t-1}-\mu) \\ \varepsilon_{t-2} & = \eta(L)(y_{t-2}-\mu) \\ & \vdots \end{align*}\end{split}\]
\[\begin{split}\begin{align*} \implies E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] & = E[Y_{t+s}|y_{t},y_{t-1},\ldots] \\ & = \mu + (\psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2}+\ldots)\varepsilon_{t} \\ & = \mu + (\psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2}+\ldots)\eta(L)(y_{t}-\mu). \end{align*}\end{split}\]
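Numerically, \(\smash{\eta(L) = \psi(L)^{-1}}\) can be obtained by matching coefficients in \(\smash{\eta(L)\psi(L) = 1}\), which gives \(\smash{\eta_0 = 1}\) and \(\smash{\eta_j = -\sum_{k=1}^{j}\psi_k\eta_{j-k}}\). A sketch (ours), checked against the known \(\smash{MA(1)}\) inversion \(\smash{\eta_j = (-\theta)^j}\):

```python
import numpy as np

def invert_poly(psi, n_terms):
    """Coefficients of eta(L) = psi(L)^{-1}, from eta(L) psi(L) = 1."""
    eta = np.zeros(n_terms)
    eta[0] = 1.0                          # psi_0 = eta_0 = 1
    for j in range(1, n_terms):
        acc = 0.0
        for k in range(1, min(j, len(psi) - 1) + 1):
            acc += psi[k] * eta[j - k]
        eta[j] = -acc
    return eta

def shock_from_history(y, mu, eta):
    """eps_t = sum_j eta_j (y_{t-j} - mu), truncated at the data."""
    d = y[::-1] - mu                      # most recent observation first
    n = min(len(eta), len(d))
    return eta[:n] @ d[:n]

# MA(1) with theta = 0.5 (illustrative): psi(L) = 1 + 0.5 L
print(invert_poly(np.array([1.0, 0.5]), 5))    # [1, -0.5, 0.25, -0.125, 0.0625]

# Recover the most recent shock from simulated data
rng = np.random.default_rng(0)
e = rng.normal(size=300)
y = 1.0 + e[1:] + 0.5 * e[:-1]                 # MA(1) with mu = 1
eta = invert_poly(np.array([1.0, 0.5]), 200)
print(shock_from_history(y, 1.0, eta), e[-1])  # approximately equal
```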

Example: \(\smash{AR(1)}\)

For an \(\smash{AR(1)}\) with \(\smash{|\phi| < 1}\):

\[\begin{align*} Y_{t} - \mu & = \psi(L)\varepsilon_{t}, \end{align*}\]

where

\[\begin{align*} \psi(L) & = (1 + \phi L + \phi^{2}L^{2}+ \ldots) = (1 + \psi_{1} L + \psi_{2} L^2 + \ldots). \end{align*}\]

Example: \(\smash{AR(1)}\)

The optimal forecast \(\smash{s}\) periods ahead is

\[\begin{split}\begin{align*} E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] & = \mu + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \\ & = \mu + \phi^{s}\varepsilon_{t} + \phi^{s+1}\varepsilon_{t-1} + \phi^{s+2}\varepsilon_{t-2} + \ldots \\ & = \mu + \phi^{s}(\varepsilon_{t} + \phi\varepsilon_{t-1} + \phi^{2}\varepsilon_{t-2}+...) \\ & = \mu + \phi^{s}(y_{t} - \mu) \end{align*}\end{split}\]
  • The forecast decays toward \(\smash{\mu}\) as \(\smash{s}\) increases.
  • The MSE is \(\smash{\sigma^{2}\sum_{j=0}^{s-1}\phi^{2j}}\).
  • As \(\smash{s\rightarrow \infty, MSE \rightarrow \frac{\sigma^{2}}{1-\phi^{2}} = Var(Y_{t})}\).
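A short sketch (ours; the parameter values are illustrative) evaluates these formulas:

```python
import numpy as np

mu, phi, sigma2, y_t = 2.0, 0.8, 1.0, 5.0

for s in (1, 5, 25):
    forecast = mu + phi**s * (y_t - mu)               # decays toward mu
    mse = sigma2 * np.sum(phi ** (2 * np.arange(s)))  # sigma^2 * sum phi^{2j}
    print(s, forecast, mse)

print("limit MSE:", sigma2 / (1 - phi**2))            # Var(Y_t)
```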

Forecasting with Finite Data

In reality, we don’t observe an infinite history of \(\smash{y_{t},y_{t-1},y_{t-2},\ldots}\).

  • Suppose we have only a finite set of \(\smash{m}\) past observations of \(\smash{y_{t}: y_{t},y_{t-1},\ldots,y_{t-m+1}}\).
  • The optimal \(\smash{AR(p)}\) forecast only makes use of the past \(\smash{p}\) observations, provided they are available (i.e. \(\smash{p \leq m}\)).
  • If we want to forecast an \(\smash{MA}\) or \(\smash{ARMA}\) (of arbitrary order), we need an infinite history to construct an optimal forecast.

Approximate Optimal Forecasts

Start by setting all \(\smash{\varepsilon}\)'s prior to time \(\smash{t-m+1}\) equal to zero.

\[\begin{align*} E[Y_{t+s}|y_{t},y_{t-1},\ldots] \approx E[Y_{t+s}|y_{t},y_{t-1},\ldots,y_{t-m+1},\varepsilon_{t-m} = 0, \varepsilon_{t-m-1} = 0, \ldots]. \end{align*}\]

Example: \(\smash{MA(q)}\)

Start with

\[\begin{gather*} \hat{\varepsilon}_{t-m} = \hat{\varepsilon}_{t-m-1} = \ldots = \hat{\varepsilon}_{t-m-q+1} = 0. \end{gather*}\]

Calculate forward recursively

\[\begin{split}\begin{align*} \hat{\varepsilon}_{t-m+1} & = (y_{t-m+1} - \mu) \\ \hat{\varepsilon}_{t-m+2} & = (y_{t-m+2} - \mu) - \theta_{1} \hat{\varepsilon}_{t-m+1} \\ \hat{\varepsilon}_{t-m+3} & = (y_{t-m+3} - \mu) - \theta_{1} \hat{\varepsilon}_{t-m+2} - \theta_{2}\hat{\varepsilon}_{t-m+1} \\ & \vdots \end{align*}\end{split}\]

Example: \(\smash{MA(q)}\)

With \(\smash{\hat{\varepsilon}_{t},\hat{\varepsilon}_{t-1},\ldots,\hat{\varepsilon}_{t-m+1}}\) in hand we can compute forecasts for \(\smash{s \leq q}\) (for \(\smash{s > q}\), the forecast is simply \(\smash{\mu}\)):

\[\begin{align*} \hat{Y}_{t+s|t} & = \mu + \theta_{s}\hat{\varepsilon}_{t} + \theta_{s+1}\hat{\varepsilon}_{t-1} + \ldots + \theta_{q}\hat{\varepsilon}_{t-q+s}. \end{align*}\]
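Putting the recursion and the forecast formula together, the sketch below (ours; the \(\smash{MA(2)}\) parameters are illustrative) zeroes the pre-sample shocks, builds \(\smash{\hat{\varepsilon}}\) forward, and forecasts:

```python
import numpy as np

def ma_q_forecast(y, mu, theta, s):
    """Approximate s-step MA(q) forecast with pre-sample shocks set to zero.

    y     : y_{t-m+1}, ..., y_t (oldest first)
    theta : [theta_1, ..., theta_q]
    """
    q = len(theta)
    eps = np.zeros(len(y) + q)      # eps[:q] are the zeroed pre-sample shocks
    for i, obs in enumerate(y):     # forward recursion for eps-hat
        eps[i + q] = (obs - mu) - sum(theta[j] * eps[i + q - 1 - j]
                                      for j in range(q))
    if s > q:
        return mu                   # no observed shock enters the forecast
    # mu + theta_s eps-hat_t + ... + theta_q eps-hat_{t-q+s}
    recent = eps[-1 : -(q - s + 2) : -1]   # eps-hat_t, eps-hat_{t-1}, ...
    return mu + sum(theta[s - 1 + k] * recent[k] for k in range(q - s + 1))

# Simulated MA(2) data with mu = 1, theta = (0.4, 0.3) (illustrative)
rng = np.random.default_rng(0)
theta = [0.4, 0.3]
e = rng.normal(size=200)
y = 1.0 + e[2:] + theta[0] * e[1:-1] + theta[1] * e[:-2]
print(ma_q_forecast(y, mu=1.0, theta=theta, s=1))
print(ma_q_forecast(y, mu=1.0, theta=theta, s=3))    # s > q: returns mu
```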

Exact Finite Sample Forecasts

Alternatively, we can compute exact finite-sample forecasts by projecting \(\smash{Y_{t+1} - \mu}\) on \(\smash{\boldsymbol{X}_{t} = (Y_{t} -\mu, Y_{t-1}-\mu, \ldots, Y_{t-m+1} - \mu)^T}\).

That is

\[\begin{split}\begin{align*} \hat{Y}_{t+1|t}^{(m)} - \mu & = \boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m)} \\ & = \beta_{1}^{(m)}(Y_{t}-\mu) + \beta_{2}^{(m)}(Y_{t-1}-\mu) + \ldots + \beta_{m}^{(m)}(Y_{t-m+1}-\mu). \end{align*}\end{split}\]

Exact Finite Sample Forecasts

\[\begin{split}\begin{align*} \boldsymbol{\beta}^{(m)} & = E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}E[\boldsymbol{X}_{t}(Y_{t+1}-\mu)] = \left[ \begin{array}{ccccc} \gamma_{0} & \gamma_{1} & \gamma_{2} & \ldots & \gamma_{m-1} \\ \gamma_{1} & \gamma_{0} & \gamma_{1} & \ldots & \gamma_{m-2} \\ \gamma_{2} & \gamma_{1} & \gamma_{0} & \ldots & \gamma_{m-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \gamma_{m-3} & \ldots & \gamma_{0} \\ \end{array} \right]^{-1} \left[ \begin{array}{c} \gamma_{1} \\ \gamma_{2} \\ \vdots \\ \gamma_{m} \\ \end{array} \right]. \end{align*}\end{split}\]

Exact Finite Sample Forecasts

Similarly,

\[\begin{split}\begin{align*} \hat{Y}_{t+s|t}^{(m)} - \mu & = \boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m,s)} \\ & = \beta_{1}^{(m,s)}(Y_{t}-\mu) + \beta_{2}^{(m,s)}(Y_{t-1}-\mu) + \ldots + \beta_{m}^{(m,s)}(Y_{t-m+1}-\mu). \end{align*}\end{split}\]

Exact Finite Sample Forecasts

\[\begin{split}\begin{align*} \boldsymbol{\beta}^{(m,s)} & = E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}E[\boldsymbol{X}_{t}(Y_{t+s}-\mu)] \\ & = \left[ \begin{array}{ccccc} \gamma_{0} & \gamma_{1} & \gamma_{2} & \ldots & \gamma_{m-1} \\ \gamma_{1} & \gamma_{0} & \gamma_{1} & \ldots & \gamma_{m-2} \\ \gamma_{2} & \gamma_{1} & \gamma_{0} & \ldots & \gamma_{m-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \gamma_{m-3} & \ldots & \gamma_{0} \\ \end{array} \right]^{-1} \left[ \begin{array}{c} \gamma_{s} \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1} \\ \end{array} \right]. \end{align*}\end{split}\]
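Computationally this is a Toeplitz linear system. The sketch below (ours) solves it, using \(\smash{AR(1)}\) autocovariances \(\smash{\gamma_k = \sigma^2\phi^k/(1-\phi^2)}\) as a sanity check, since in that case the projection weights should reduce to \(\smash{(\phi^s, 0, \ldots, 0)}\):

```python
import numpy as np
from scipy.linalg import toeplitz

def exact_forecast_weights(gamma, m, s):
    """Solve Gamma_m beta = (gamma_s, ..., gamma_{s+m-1})' for beta^{(m,s)}.

    gamma : autocovariances [gamma_0, gamma_1, ..., gamma_{s+m-1}]
    """
    G = toeplitz(gamma[:m])             # m x m autocovariance matrix
    return np.linalg.solve(G, gamma[s:s + m])

phi, sigma2, m, s = 0.8, 1.0, 5, 2      # illustrative values
gamma = sigma2 * phi ** np.arange(s + m) / (1 - phi**2)
beta = exact_forecast_weights(gamma, m, s)
print(beta)                             # ~ [phi**s, 0, 0, 0, 0]
```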

Example: \(\smash{ARMA(1,1)}\)

Let \(\smash{\{Y_t\}}\) be an \(\smash{ARMA(1,1)}\) process with \(\smash{|\phi| <1}\) and \(\smash{|\theta| < 1}\) (causal and invertible). Then:

\[\begin{split}\begin{align*} (1-\phi L)(Y_{t} - \mu) & = (1 + \theta L)\varepsilon_{t} \\ \implies Y_{t} - \mu & = \psi(L)\varepsilon_{t} \end{align*}\end{split}\]

where \(\smash{\,\,\psi (L) = (1-\phi L)^{-1}(1 + \theta L)}\).

  • We can also write
\[\smash{\varepsilon_{t} = (1+\theta L)^{-1}(1-\phi L)(Y_{t} - \mu) = \psi(L)^{-1}(Y_{t} - \mu)}.\]

Example: \(\smash{ARMA(1,1)}\)

Expanding the \(\smash{MA}\) representation

\[\begin{split}\begin{align*} \psi(L) & = (1+\phi L + \phi^{2}L^{2} + \ldots)(1 +\theta L) \\ & = 1 + (\phi + \theta)L + (\phi^{2} + \phi \theta)L^{2} + (\phi^{3} + \phi^{2}\theta)L^{3} + \ldots \\ & = 1 + \sum_{j=1}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j} \\ \implies \psi_{m} & = \phi^{m} + \phi^{m-1}\theta. \end{align*}\end{split}\]
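A quick numerical check of this closed form (ours; \(\smash{\phi = 0.7}\), \(\smash{\theta = 0.4}\) are illustrative) expands the product of the two polynomials directly:

```python
import numpy as np

phi, theta, n = 0.7, 0.4, 8
geo = phi ** np.arange(n)                  # (1 - phi L)^{-1} = 1 + phi L + ...
psi = np.convolve(geo, [1.0, theta])[:n]   # multiply by (1 + theta L)
closed = np.r_[1.0, phi ** np.arange(1, n) + theta * phi ** np.arange(n - 1)]
print(np.allclose(psi, closed))            # True: psi_m = phi^m + phi^{m-1} theta
```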

Example: \(\smash{ARMA(1,1)}\)

Let’s define \(\smash{\psi_{s}(L)}\) as the polynomial

\[\smash{\psi_{s}(L) = \psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2} + \ldots}\]

This is different from \(\smash{\,\,\psi_{s}L^{s} + \psi_{s+1}L^{s+1} + \ldots = L^{s}\psi_{s}(L)}\), which keeps the original powers of \(\smash{L}\) rather than re-indexing them.

Example: \(\smash{ARMA(1,1)}\)

For the \(\smash{ARMA(1,1)}\),

\[\begin{split}\begin{align*} \psi_{s}(L) & = (\phi^{s} + \phi^{s-1}\theta) + (\phi^{s+1} + \phi^{s}\theta)L + (\phi^{s+2} + \phi^{s+1}\theta)L^{2} + \ldots \\ & = \sum_{j=s}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j-s} \\ & = (\phi^{s} + \phi^{s-1}\theta)\sum_{j=0}^{\infty} \phi^{j}L^{j} \\ & = (\phi^{s} + \phi^{s-1}\theta)(1 - \phi L)^{-1}. \end{align*}\end{split}\]

Example: \(\smash{ARMA(1,1)}\)

Recall, for an \(\smash{MA(\infty)}\), the optimal forecast is

\[\begin{split}\begin{align*} \hat{Y}_{t+s|t} - \mu & = E[Y_{t+s} | \varepsilon_{t}, \varepsilon_{t-1}, \ldots] \\ & = \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \ldots = \psi_{s}(L)\varepsilon_{t} \end{align*}\end{split}\]

So, for the \(\smash{ARMA(1,1)}\),

\[\begin{split}\begin{align*} \hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1-\phi L)^{-1}\varepsilon_{t} \\ & = (\phi^{s} + \phi^{s-1}\theta)(1-\phi L)^{-1} (1-\phi L)(1+ \theta L)^{-1}(Y_{t}-\mu) \\ & = (\phi^{s} + \phi^{s-1}\theta)(1+\theta L)^{-1}(Y_{t} - \mu). \end{align*}\end{split}\]

Example: \(\smash{ARMA(1,1)}\)

Notice

\[\begin{split}\begin{align*} \hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1+\theta L)^{-1}(Y_{t} - \mu) \\ & = \phi(\phi^{s-1} + \phi^{s-2}\theta)(1+\theta L)^{-1}(Y_{t} - \mu) \\ & = \phi(\hat{Y}_{t+s-1|t} - \mu), \,\,\,\, \text{ if } s \geq 2, \end{align*}\end{split}\]

which means the forecast decays toward \(\smash{\mu}\).

Example: \(\smash{ARMA(1,1)}\)

For \(\smash{s = 1}\),

\[\begin{split}\begin{align*} \hat{Y}_{t+1|t} - \mu & = (\phi +\theta)(1 + \theta L)^{-1}(Y_{t} - \mu) \\ & = (\phi + \phi \theta L - \phi\theta L + \theta)(1 + \theta L)^{-1}(Y_{t}- \mu) \\ & = [\phi(1+\theta L) + \theta(1 - \phi L)](1+\theta L)^{-1}(Y_{t} - \mu) \\ & = \phi(Y_{t} - \mu) + \theta(1 - \phi L)(1 + \theta L)^{-1}(Y_{t} - \mu) \\ & = \phi(Y_{t} - \mu) + \theta\varepsilon_{t}. \end{align*}\end{split}\]
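Together, the \(\smash{s=1}\) formula and the decay recursion deliver all forecasts. The sketch below (ours; parameter values illustrative) estimates \(\smash{\varepsilon_t}\) recursively from the data, starting with a zero pre-sample shock as in the approximation above, and then iterates:

```python
import numpy as np

def arma11_forecasts(y, mu, phi, theta, s_max):
    """Forecasts Y-hat_{t+s|t}, s = 1..s_max, for an ARMA(1,1).

    Uses eps_t = (y_t - mu) - phi (y_{t-1} - mu) - theta eps_{t-1},
    starting the recursion with a zero pre-sample shock.
    """
    eps = 0.0
    for i in range(1, len(y)):
        eps = (y[i] - mu) - phi * (y[i - 1] - mu) - theta * eps
    fc = [mu + phi * (y[-1] - mu) + theta * eps]   # s = 1
    for _ in range(1, s_max):
        fc.append(mu + phi * (fc[-1] - mu))        # s >= 2: decay toward mu
    return np.array(fc)

# Simulated ARMA(1,1) with mu = 1, phi = 0.8, theta = 0.4 (illustrative)
rng = np.random.default_rng(0)
mu, phi, theta = 1.0, 0.8, 0.4
e = rng.normal(size=300)
y = np.empty(300)
y[0] = mu
for t in range(1, 300):
    y[t] = mu + phi * (y[t - 1] - mu) + e[t] + theta * e[t - 1]
print(arma11_forecasts(y, mu, phi, theta, s_max=5))
```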