==============================================================================
Forecasting ARMA Models
==============================================================================

Forecasting with Infinite Data
==============================================================================

Consider an :math:`\smash{ARMA}` process with :math:`\smash{MA(\infty)}`
representation:

.. math::

   \begin{align*}
   Y_t - \mu & = \psi(L) \varepsilon_t, \,\,\,\,
   \varepsilon_t \stackrel{i.i.d.}{\sim} WN(0,\sigma^2),
   \end{align*}

where

.. math::

   \begin{gather*}
   \psi(L) = \sum_{j=0}^{\infty} \psi_{j}L^{j} \\
   \sum_{j=0}^{\infty}|\psi_{j}| < \infty \\
   \psi_{0} = 1.
   \end{gather*}

Forecasting with Infinite Data
==============================================================================

Suppose

- we observe an infinite history of :math:`\smash{\{\varepsilon_{t}\}}` up to
  date :math:`\smash{t}`:
  :math:`\smash{\{\varepsilon_{t},\varepsilon_{t-1},\varepsilon_{t-2},\ldots\}}`.

- we know the :math:`\smash{MA}` parameters
  :math:`\smash{\mu, \sigma, \{\psi_{j}\}_{j=0}^{\infty}}`.

Then

.. math::

   \begin{align*}
   Y_{t+s} & = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots
   + \psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t}
   + \psi_{s+1}\varepsilon_{t-1} + \ldots
   \end{align*}

Optimal Forecast
==============================================================================

The optimal forecast of :math:`\smash{Y_{t+s}}` in terms of MSE is

.. math::

   \begin{align*}
   E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]
   = \mu + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots
   \end{align*}

Note that this is different from

.. math::

   \begin{align*}
   Y_{t} & = \mu + \psi_{0}\varepsilon_{t} + \psi_{1}\varepsilon_{t-1} + \ldots
   \end{align*}

Forecast Error
==============================================================================

The forecast error is

.. math::

   \begin{align*}
   Y_{t+s} & - E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots] \\
   & = \mu + \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots
   + \psi_{s-1}\varepsilon_{t+1} + \psi_{s}\varepsilon_{t}
   + \psi_{s+1}\varepsilon_{t-1} + \ldots \\
   & \hspace{2in} - \mu - \psi_{s}\varepsilon_{t}
   - \psi_{s+1}\varepsilon_{t-1} - \ldots \\
   & = \varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1} + \ldots
   + \psi_{s-1}\varepsilon_{t+1}.
   \end{align*}

Since :math:`\smash{E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]}` is
linear in :math:`\smash{\{\varepsilon_{\tau}\}_{\tau=-\infty}^{t}}`, it is
both the optimal forecast and the optimal linear forecast.

Forecast as Linear Projection
==============================================================================

Hamilton refers to optimal linear forecasts as
:math:`\smash{\hat{E}[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]}`.

- In this case,

  .. math::

     \begin{gather*}
     E[Y_{t+s}|\varepsilon_{t},\ldots] = \hat{E}[Y_{t+s}|\varepsilon_{t},\ldots] \\
     \implies Y_{t+s|t}^{*} = \hat{Y}_{t+s|t},
     \end{gather*}

  which is also a linear projection
  :math:`\smash{\hat{p}(Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots)}`.

- Clearly, the linear projection condition is satisfied for
  :math:`\smash{j = t, t-1, \ldots}`:

  .. math::

     \begin{align*}
     E[(Y_{t+s} & - E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])\varepsilon_{j}] \\
     & \hspace{1in} = E[(\varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1}
     + \ldots + \psi_{s-1}\varepsilon_{t+1})\varepsilon_{j}] = 0.
     \end{align*}
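Numerically, the optimal forecast is just an inner product of the
:math:`\smash{s}`-shifted :math:`\smash{MA}` weights with the observed shock
history, truncated at some large :math:`\smash{n}`. Below is a minimal sketch
in Python/NumPy; the weight sequence :math:`\smash{\psi_j = 0.7^j}` and all
parameter values are illustrative, not from the text.

.. code-block:: python

   import numpy as np

   # A minimal sketch of the optimal MA(infinity) forecast,
   #   E[Y_{t+s} | eps_t, eps_{t-1}, ...] = mu + sum_{j>=0} psi_{s+j} eps_{t-j},
   # truncated at n terms. Hypothetical inputs: mu = 2, psi_j = 0.7**j,
   # and a simulated shock history.
   rng = np.random.default_rng(0)
   mu, sigma, s, n = 2.0, 1.0, 3, 500

   psi = 0.7 ** np.arange(n)                    # psi_0 = 1, absolutely summable
   eps_hist = rng.normal(0.0, sigma, size=n)    # eps_hist[j] plays eps_{t-j}

   forecast = mu + psi[s:] @ eps_hist[: n - s]  # shift weights by s, then dot
   print(forecast)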
Forecast MSE
==============================================================================

The forecast MSE is

.. math::

   \begin{align*}
   E[(Y_{t+s} & - E[Y_{t+s}|\varepsilon_{t},\varepsilon_{t-1},\ldots])^{2}] \\
   & \hspace{1in} = E[(\varepsilon_{t+s} + \psi_{1}\varepsilon_{t+s-1}
   + \ldots + \psi_{s-1}\varepsilon_{t+1})^{2}] \\
   & \hspace{1in} = \sigma^{2}\sum_{j=0}^{s-1}\psi_{j}^{2}.
   \end{align*}

Forecasting Conditional on Lagged :math:`\smash{Y_t}`
==============================================================================

Suppose we don't observe the full history of :math:`\smash{\varepsilon_{t}}`.

- Instead, we observe the full history of :math:`\smash{y_{t}}`:
  :math:`\smash{y_{t},y_{t-1},y_{t-2},\ldots}`.

- We have an :math:`\smash{ARMA}` process with the same :math:`\smash{MA}`
  representation as before.

If the :math:`\smash{MA(\infty)}` representation is invertible, we can write
it as an :math:`\smash{AR(\infty)}`:

.. math::

   \begin{align*}
   \eta(L)(Y_{t}-\mu) = \varepsilon_{t},
   \end{align*}

where :math:`\smash{\eta(L) = \psi^{-1}(L)}`.

Computing Historical Values
==============================================================================

The history of :math:`\smash{\varepsilon_{t}}` can be constructed from the
history of :math:`\smash{y_{t}}`:

.. math::

   \begin{align*}
   \varepsilon_{t} & = \eta(L)(y_{t}-\mu) \\
   \varepsilon_{t-1} & = \eta(L)(y_{t-1}-\mu) \\
   \varepsilon_{t-2} & = \eta(L)(y_{t-2}-\mu) \\
   & \vdots
   \end{align*}

.. math::

   \begin{align*}
   \implies E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]
   & = E[Y_{t+s}|y_{t},y_{t-1},\ldots] \\
   & = \mu + (\psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2}+\ldots)\varepsilon_{t} \\
   & = \mu + (\psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2}+\ldots)\eta(L)(y_{t}-\mu).
   \end{align*}

Example: :math:`\smash{AR(1)}`
==============================================================================

For an :math:`\smash{AR(1)}` with :math:`\smash{|\phi| < 1}`:

.. math::

   \begin{align*}
   Y_{t} - \mu & = \psi(L)\varepsilon_{t},
   \end{align*}

where

.. math::

   \begin{align*}
   \psi(L) & = (1 + \phi L + \phi^{2}L^{2} + \ldots)
   = (1 + \psi_{1} L + \psi_{2} L^2 + \ldots).
   \end{align*}

Example: :math:`\smash{AR(1)}`
==============================================================================

The optimal forecast :math:`\smash{s}` periods ahead is

.. math::

   \begin{align*}
   E[Y_{t+s} | \varepsilon_{t},\varepsilon_{t-1},\ldots]
   & = \mu + \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1} + \ldots \\
   & = \mu + \phi^{s}\varepsilon_{t} + \phi^{s+1}\varepsilon_{t-1}
   + \phi^{s+2}\varepsilon_{t-2} + \ldots \\
   & = \mu + \phi^{s}(\varepsilon_{t} + \phi\varepsilon_{t-1}
   + \phi^{2}\varepsilon_{t-2}+\ldots) \\
   & = \mu + \phi^{s}(y_{t} - \mu).
   \end{align*}

- The forecast decays toward :math:`\smash{\mu}` as :math:`\smash{s}`
  increases.

- The MSE is :math:`\smash{\sigma^{2}\sum_{j=0}^{s-1}\phi^{2j}}`.

- As :math:`\smash{s\rightarrow \infty}`,
  :math:`\smash{MSE \rightarrow \frac{\sigma^{2}}{1-\phi^{2}} = Var(Y_{t})}`.
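These :math:`\smash{AR(1)}` results are easy to verify by simulation: the
:math:`\smash{s}`-step forecast is :math:`\smash{\mu + \phi^{s}(y_{t}-\mu)}`,
and the empirical MSE should approach
:math:`\smash{\sigma^{2}\sum_{j=0}^{s-1}\phi^{2j}}`. A minimal sketch in
Python/NumPy, with illustrative parameter values not taken from the text:

.. code-block:: python

   import numpy as np

   # A minimal sketch: check the AR(1) s-step forecast and its MSE by
   # simulation. Hypothetical parameters: mu = 2, phi = 0.9, sigma = 1.
   rng = np.random.default_rng(0)
   mu, phi, sigma, s, T = 2.0, 0.9, 1.0, 5, 200_000

   y = np.empty(T)
   y[0] = mu
   for t in range(1, T):
       y[t] = mu + phi * (y[t - 1] - mu) + rng.normal(0.0, sigma)

   forecasts = mu + phi**s * (y[: T - s] - mu)    # optimal s-step forecasts
   errors = y[s:] - forecasts

   theoretical_mse = sigma**2 * np.sum(phi ** (2 * np.arange(s)))
   print(errors.var(), theoretical_mse)           # the two should be close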
Forecasting with Finite Data
==============================================================================

In reality, we don't observe an infinite history of
:math:`\smash{y_{t},y_{t-1},y_{t-2},\ldots}`.

- Suppose we have only a finite set of :math:`\smash{m}` past observations of
  :math:`\smash{y_{t}}`: :math:`\smash{y_{t},y_{t-1},\ldots,y_{t-m+1}}`.

- The optimal :math:`\smash{AR(p)}` forecast only makes use of the past
  :math:`\smash{p}` observations, if available (i.e.
  :math:`\smash{p \leq m}`).

- If we want to forecast an :math:`\smash{MA}` or :math:`\smash{ARMA}` (of
  arbitrary order), we need an infinite history to construct an optimal
  forecast.

Approximate Optimal Forecasts
==============================================================================

Start by setting all :math:`\smash{\varepsilon}`'s prior to time
:math:`\smash{t-m+1}` equal to zero:

.. math::

   \begin{align*}
   E[Y_{t+s}|y_{t},y_{t-1},\ldots] \approx
   E[Y_{t+s}|y_{t},y_{t-1},\ldots,y_{t-m+1},
   \varepsilon_{t-m} = 0, \varepsilon_{t-m-1} = 0, \ldots].
   \end{align*}

Example :math:`\smash{MA(q)}`
==============================================================================

Start with

.. math::

   \begin{gather*}
   \hat{\varepsilon}_{t-m} = \hat{\varepsilon}_{t-m-1} = \ldots
   = \hat{\varepsilon}_{t-m-q+1} = 0.
   \end{gather*}

Calculate forward recursively:

.. math::

   \begin{align*}
   \hat{\varepsilon}_{t-m+1} & = (y_{t-m+1} - \mu) \\
   \hat{\varepsilon}_{t-m+2} & = (y_{t-m+2} - \mu)
   - \theta_{1} \hat{\varepsilon}_{t-m+1} \\
   \hat{\varepsilon}_{t-m+3} & = (y_{t-m+3} - \mu)
   - \theta_{1} \hat{\varepsilon}_{t-m+2} - \theta_{2}\hat{\varepsilon}_{t-m+1} \\
   & \vdots
   \end{align*}

Example :math:`\smash{MA(q)}`
==============================================================================

With :math:`\smash{\hat{\varepsilon}_{t},\hat{\varepsilon}_{t-1},\ldots,\hat{\varepsilon}_{t-m+1}}`
in hand, we can compute forecasts

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t} & = \mu + \theta_{s}\hat{\varepsilon}_{t}
   + \theta_{s+1}\hat{\varepsilon}_{t-1} + \ldots
   + \theta_{q}\hat{\varepsilon}_{t-q+s},
   \end{align*}

for :math:`\smash{s \leq q}` (for :math:`\smash{s > q}`, the forecast is
simply :math:`\smash{\mu}`).
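The zero-initialized recursion above translates directly into code. A minimal
sketch for an illustrative :math:`\smash{MA(2)}`; the parameter values and
simulated data are hypothetical:

.. code-block:: python

   import numpy as np

   # A minimal sketch: approximate MA(q) forecasts from m observations,
   # setting all pre-sample shocks to zero. Illustrative MA(2) parameters.
   rng = np.random.default_rng(0)
   mu, sigma = 1.0, 1.0
   theta = np.array([0.6, 0.3])     # theta_1, theta_2
   q, m = len(theta), 50

   # Simulate y_t = mu + eps_t + theta_1 eps_{t-1} + theta_2 eps_{t-2}
   eps = rng.normal(0.0, sigma, size=m + q)
   y = mu + eps[2:] + theta[0] * eps[1:-1] + theta[1] * eps[:-2]

   # Recover shocks recursively, with eps_hat = 0 before the sample:
   #   eps_hat_t = (y_t - mu) - theta_1 eps_hat_{t-1} - ... - theta_q eps_hat_{t-q}
   eps_hat = np.zeros(m)
   for t in range(m):
       eps_hat[t] = (y[t] - mu) - sum(
           theta[j] * eps_hat[t - 1 - j] for j in range(q) if t - 1 - j >= 0
       )

   # Approximate s-step forecast:
   #   Y_hat_{t+s|t} = mu + theta_s eps_hat_t + ... + theta_q eps_hat_{t-q+s}
   s = 1
   forecast = mu + sum(
       theta[s + k - 1] * eps_hat[m - 1 - k] for k in range(q - s + 1)
   )
   print(forecast)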
Exact Finite Sample Forecasts
==============================================================================

Another forecast approximation method is to simply project
:math:`\smash{Y_{t+1} - \mu}` on
:math:`\smash{\boldsymbol{X}_{t} = (Y_{t} -\mu, Y_{t-1}-\mu, \ldots, Y_{t-m+1} - \mu)^T}`.

That is,

.. math::

   \begin{align*}
   \hat{Y}_{t+1|t}^{(m)} - \mu & = \boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m)} \\
   & = \beta_{1}^{(m)}(Y_{t}-\mu) + \beta_{2}^{(m)}(Y_{t-1}-\mu) + \ldots
   + \beta_{m}^{(m)}(Y_{t-m+1}-\mu).
   \end{align*}

Exact Finite Sample Forecasts
==============================================================================

.. math::

   \begin{align*}
   \boldsymbol{\beta}^{(m)} & =
   E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}
   E[\boldsymbol{X}_{t}(Y_{t+1}-\mu)]
   = \left[ \begin{array}{ccccc}
   \gamma_{0} & \gamma_{1} & \gamma_{2} & \ldots & \gamma_{m-1} \\
   \gamma_{1} & \gamma_{0} & \gamma_{1} & \ldots & \gamma_{m-2} \\
   \gamma_{2} & \gamma_{1} & \gamma_{0} & \ldots & \gamma_{m-3} \\
   \vdots & \vdots & \vdots & \vdots & \vdots \\
   \gamma_{m-1} & \ldots & \ldots & \ldots & \gamma_{0} \\
   \end{array} \right]^{-1}
   \left[ \begin{array}{c}
   \gamma_{1} \\ \gamma_{2} \\ \vdots \\ \gamma_{m} \\
   \end{array} \right].
   \end{align*}

Exact Finite Sample Forecasts
==============================================================================

Similarly,

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t}^{(m)} - \mu & = \boldsymbol{X}_{t}^{'}\boldsymbol{\beta}^{(m,s)} \\
   & = \beta_{1}^{(m,s)}(Y_{t}-\mu) + \beta_{2}^{(m,s)}(Y_{t-1}-\mu) + \ldots
   + \beta_{m}^{(m,s)}(Y_{t-m+1}-\mu).
   \end{align*}

Exact Finite Sample Forecasts
==============================================================================

.. math::

   \begin{align*}
   \boldsymbol{\beta}^{(m,s)} & =
   E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]^{-1}
   E[\boldsymbol{X}_{t}(Y_{t+s}-\mu)] \\
   & = \left[ \begin{array}{ccccc}
   \gamma_{0} & \gamma_{1} & \gamma_{2} & \ldots & \gamma_{m-1} \\
   \gamma_{1} & \gamma_{0} & \gamma_{1} & \ldots & \gamma_{m-2} \\
   \gamma_{2} & \gamma_{1} & \gamma_{0} & \ldots & \gamma_{m-3} \\
   \vdots & \vdots & \vdots & \vdots & \vdots \\
   \gamma_{m-1} & \ldots & \ldots & \ldots & \gamma_{0} \\
   \end{array} \right]^{-1}
   \left[ \begin{array}{c}
   \gamma_{s} \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1} \\
   \end{array} \right].
   \end{align*}
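Because :math:`\smash{E[\boldsymbol{X}_{t}\boldsymbol{X}_{t}^{'}]}` is a
Toeplitz matrix in the autocovariances, computing
:math:`\smash{\boldsymbol{\beta}^{(m,s)}}` is a single linear solve. A minimal
sketch, using the :math:`\smash{AR(1)}` autocovariances
:math:`\smash{\gamma_{j} = \sigma^{2}\phi^{j}/(1-\phi^{2})}` as an
illustrative input; for an :math:`\smash{AR(1)}`, the projection should
recover :math:`\smash{\boldsymbol{\beta} = (\phi^{s}, 0, \ldots, 0)^T}`:

.. code-block:: python

   import numpy as np

   # A minimal sketch: exact finite-sample projection coefficients beta^{(m,s)}
   # from autocovariances gamma_0, ..., gamma_{s+m-1}. Illustrative AR(1)
   # autocovariances: gamma_j = sigma^2 phi^j / (1 - phi^2).
   phi, sigma, m, s = 0.9, 1.0, 4, 2
   gamma = sigma**2 * phi ** np.arange(s + m) / (1 - phi**2)

   # Toeplitz matrix E[X_t X_t'] and cross-covariances E[X_t (Y_{t+s} - mu)]
   Gamma = np.array([[gamma[abs(i - j)] for j in range(m)] for i in range(m)])
   g = gamma[s : s + m]

   beta = np.linalg.solve(Gamma, g)
   print(beta)   # for an AR(1): approximately (phi**s, 0, ..., 0)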
Example :math:`\smash{ARMA(1,1)}`
==============================================================================

Let :math:`\smash{\{Y_t\}}` be an :math:`\smash{ARMA(1,1)}` process with
:math:`\smash{|\phi| <1}` and :math:`\smash{|\theta| < 1}` (causal and
invertible). Then:

.. math::

   \begin{align*}
   (1-\phi L)(Y_{t} - \mu) & = (1 + \theta L)\varepsilon_{t} \\
   \implies Y_{t} - \mu & = \psi(L)\varepsilon_{t},
   \end{align*}

where :math:`\smash{\,\,\psi (L) = (1-\phi L)^{-1}(1 + \theta L)}`.

- We can also write

  .. math::

     \smash{\varepsilon_{t} = (1+\theta L)^{-1}(1-\phi L)(Y_{t} - \mu)
     = \psi(L)^{-1}(Y_{t} - \mu)}.

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

Expanding the :math:`\smash{MA}` representation,

.. math::

   \begin{align*}
   \psi(L) & = (1+\phi L + \phi^{2}L^{2} + \ldots)(1 +\theta L) \\
   & = 1 + (\phi + \theta)L + (\phi^{2} + \phi \theta)L^{2}
   + (\phi^{3} + \phi^{2}\theta)L^{3} + \ldots \\
   & = 1 + \sum_{j=1}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j} \\
   \implies \psi_{m} & = \phi^{m} + \phi^{m-1}\theta.
   \end{align*}

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

Let's define :math:`\smash{\psi_{s}(L)}` as the polynomial

.. math::

   \smash{\psi_{s}(L) = \psi_{s} + \psi_{s+1}L + \psi_{s+2}L^{2} + \ldots}

Note that this is different from
:math:`\smash{\,\,\psi_{s}L^{s} + \psi_{s+1}L^{s+1} + \ldots}`

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

For the :math:`\smash{ARMA(1,1)}`,

.. math::

   \begin{align*}
   \psi_{s}(L) & = (\phi^{s} + \phi^{s-1}\theta)
   + (\phi^{s+1} + \phi^{s}\theta)L
   + (\phi^{s+2} + \phi^{s+1}\theta)L^{2} + \ldots \\
   & = \sum_{j=s}^{\infty} (\phi^{j} + \phi^{j-1}\theta)L^{j-s} \\
   & = (\phi^{s} + \phi^{s-1}\theta)\sum_{j=0}^{\infty} \phi^{j}L^{j} \\
   & = (\phi^{s} + \phi^{s-1}\theta)(1 - \phi L)^{-1}.
   \end{align*}

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

Recall that for an :math:`\smash{MA(\infty)}`, the optimal forecast is

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t} - \mu & = E[Y_{t+s} | \varepsilon_{t}, \varepsilon_{t-1},
   \ldots] - \mu \\
   & = \psi_{s}\varepsilon_{t} + \psi_{s+1}\varepsilon_{t-1}
   + \psi_{s+2}\varepsilon_{t-2} + \ldots = \psi_{s}(L)\varepsilon_{t}.
   \end{align*}

So, for the :math:`\smash{ARMA(1,1)}`,

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1-\phi L)^{-1}\varepsilon_{t} \\
   & = (\phi^{s} + \phi^{s-1}\theta)(1-\phi L)^{-1}
   (1-\phi L)(1+ \theta L)^{-1}(Y_{t}-\mu) \\
   & = (\phi^{s} + \phi^{s-1}\theta)(1+\theta L)^{-1}(Y_{t} - \mu).
   \end{align*}

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

Notice

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t} - \mu & = (\phi^{s} + \phi^{s-1}\theta)(1+\theta L)^{-1}(Y_{t} - \mu) \\
   & = \phi(\phi^{s-1} + \phi^{s-2}\theta)(1+\theta L)^{-1}(Y_{t} - \mu) \\
   & = \phi(\hat{Y}_{t+s-1|t} - \mu), \,\,\,\, \text{ if } s \geq 2,
   \end{align*}

which means the forecast decays toward :math:`\smash{\mu}`.

Example :math:`\smash{ARMA(1,1)}`
==============================================================================

For :math:`\smash{s = 1}`,

.. math::

   \begin{align*}
   \hat{Y}_{t+s|t} - \mu & = (\phi +\theta)(1 + \theta L)^{-1}(Y_{t} - \mu) \\
   & = (\phi + \phi \theta L - \phi\theta L + \theta)(1 + \theta L)^{-1}(Y_{t}- \mu) \\
   & = [\phi(1+\theta L) + \theta(1 - \phi L)](1+\theta L)^{-1}(Y_{t} - \mu) \\
   & = \phi(Y_{t} - \mu) + \theta(1 - \phi L)(1 + \theta L)^{-1}(Y_{t} - \mu) \\
   & = \phi(Y_{t} - \mu) + \theta\varepsilon_{t}.
   \end{align*}
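Taken together, the :math:`\smash{s = 1}` formula and the decay recursion give
a complete forecasting scheme for the :math:`\smash{ARMA(1,1)}`: recover
:math:`\smash{\hat{\varepsilon}_{t}}` from the data, form
:math:`\smash{\hat{Y}_{t+1|t} = \mu + \phi(y_{t}-\mu) + \theta\hat{\varepsilon}_{t}}`,
then decay longer-horizon forecasts toward :math:`\smash{\mu}` at rate
:math:`\smash{\phi}`. A minimal sketch, with illustrative parameters and
simulated data (the pre-sample shock is set to zero):

.. code-block:: python

   import numpy as np

   # A minimal sketch: s-step ARMA(1,1) forecasts, combining
   #   Y_hat_{t+1|t} = mu + phi (y_t - mu) + theta eps_hat_t
   #   Y_hat_{t+s|t} = mu + phi (Y_hat_{t+s-1|t} - mu),  s >= 2.
   # Hypothetical parameters, not from the text.
   rng = np.random.default_rng(0)
   mu, phi, theta, sigma, T = 0.0, 0.8, 0.4, 1.0, 500

   eps = rng.normal(0.0, sigma, size=T)
   y = np.empty(T)
   y[0] = mu + eps[0]
   for t in range(1, T):
       y[t] = mu + phi * (y[t - 1] - mu) + eps[t] + theta * eps[t - 1]

   # Recover shocks recursively (invert the MA part, pre-sample shock = 0):
   #   eps_hat_t = (y_t - mu) - phi (y_{t-1} - mu) - theta eps_hat_{t-1}
   eps_hat = np.zeros(T)
   eps_hat[0] = y[0] - mu
   for t in range(1, T):
       eps_hat[t] = (y[t] - mu) - phi * (y[t - 1] - mu) - theta * eps_hat[t - 1]

   # Forecast path for horizons s = 1, ..., 10 from the last date
   forecast = mu + phi * (y[-1] - mu) + theta * eps_hat[-1]   # s = 1
   path = [forecast]
   for s in range(2, 11):
       forecast = mu + phi * (forecast - mu)                  # decay toward mu
       path.append(forecast)
   print(np.round(path, 4))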