ARMA Maximum Likelihood Estimation

\(\smash{AR(p)}\) Likelihood

Recall a Gaussian \(\smash{AR(p)}\) process:

\[\begin{align} Y_t & = c + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} + \varepsilon_t, \,\,\,\, \varepsilon_t \stackrel{i.i.d.}{\sim} \mathcal{N}(0,\sigma^2). \end{align}\]
  • In this case \(\smash{\boldsymbol{\theta} = (c,\phi_1,\ldots,\phi_p,\sigma^2)}\).
  • We will suppose that \(\smash{\{Y_t\}}\) is stationary and causal.

\(\smash{AR(p)}\) Likelihood

Suppose we know that \(\smash{Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}, \ldots, Y_{t-p} = y_{t-p}}\) for \(\smash{t > p}\). Then

\[\begin{split}\begin{gather} Y_t = c + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + \varepsilon_t \\ \text{E}[Y_t|Y_{t-1},\ldots,Y_{t-p},\boldsymbol{\theta}] = c + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} \\ \text{Var}(Y_t|Y_{t-1},\ldots,Y_{t-p},\boldsymbol{\theta}) = \sigma^2. \end{gather}\end{split}\]

\(\smash{AR(p)}\) Likelihood

Thus,

\[\begin{align} Y_t|Y_{t-1},\ldots,Y_{t-p} \sim \mathcal{N}(c+\phi_1 y_{t-1} + \ldots + \phi_p y_{t-p}, \sigma^2), \end{align}\]

which means

\[\begin{split}\begin{align} f_{Y_t|Y_{t-1},\ldots,Y_{t-p}} & (y_t|y_{t-1},\ldots,y_{t-p}, \boldsymbol{\theta}) \\ & = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{-\frac{1}{2\sigma^2}(y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p})^2\right\}. \end{align}\end{split}\]

\(\smash{AR(p)}\) Likelihood

The likelihood of \(\smash{\boldsymbol{Y}_T = \{Y_t\}_{t=1}^T}\) is

\[\begin{split}\begin{align} \mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T) & = f_{\boldsymbol{Y}_T}(\boldsymbol{y}_T|\boldsymbol{\theta}) \\ & = f_{\boldsymbol{Y}_p}(\boldsymbol{y}_p|\boldsymbol{\theta}) \prod_{t=p+1}^T f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1},\ldots,y_{t-p},\boldsymbol{\theta}) \end{align}\end{split}\]

where \(\smash{f_{\boldsymbol{Y}_p}(\boldsymbol{y}_p|\boldsymbol{\theta})}\) is the joint density of \(\smash{\boldsymbol{Y}_p = \{Y_t\}_{t=1}^p}\).

  • Maximizing this likelihood yields a set of nonlinear first-order conditions in \(\smash{\boldsymbol{\theta}}\) and \(\smash{\boldsymbol{y}_T}\), which must be solved numerically.
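In practice, this numerical maximization is delegated to standard software; statsmodels, for example, evaluates the exact Gaussian ARMA likelihood through a state-space (Kalman filter) representation. A minimal sketch on simulated data (the AR(2) parameter values are purely illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# simulate a stationary Gaussian AR(2): c = 1, phi_1 = 0.5, phi_2 = 0.2
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.standard_normal()

res = ARIMA(y, order=(2, 0, 0)).fit()  # exact MLE of (const, phi_1, phi_2, sigma^2)
print(res.summary())  # 'const' is reported as the process mean, not the intercept c
```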

\(\smash{AR(p)}\) Conditional Likelihood

We can approximate the \(\smash{AR(p)}\) likelihood with only the product of conditional densities:

\[\begin{split}\begin{align} \mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T) & \approx \prod_{t=p+1}^T f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1},\ldots,y_{t-p},\boldsymbol{\theta}) \\ & = \prod_{t=p+1}^T \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{-\frac{1}{2\sigma^2}(y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p})^2\right\} \\ & = \left(2 \pi \sigma^2\right)^{-\frac{T-p}{2}} \exp \left\{-\frac{1}{2\sigma^2} \sum_{t=p+1}^T (y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p})^2\right\}. \end{align}\end{split}\]

\(\smash{AR(p)}\) Conditional Log Likelihood

The conditional log likelihood of the \(\smash{AR(p)}\) is

\[\begin{align} \ell(\boldsymbol{\theta}|\boldsymbol{y}_T) & = -\frac{T-p}{2} \log(2\pi) -\frac{T-p}{2} \log(\sigma^2) -\frac{1}{2\sigma^2} \sum_{t=p+1}^T (y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p})^2. \end{align}\]
  • Maximizing the conditional log likelihood with respect to \(\smash{c,\phi_1,\ldots,\phi_p}\), for any fixed value of \(\smash{\sigma^2}\), is the same as minimizing
\[\smash{\sum_{t=p+1}^T (y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p})^2.}\]
  • Hence, the MLEs are the same as the least squares estimates.

\(\smash{AR(p)}\) Conditional MLEs

Since the MLEs and LS estimates are the same, we can solve for the MLEs by simply running a regression

\[\smash{\boldsymbol{y} = X \boldsymbol{\beta} + \boldsymbol{e},}\]

where

\[\begin{split}\boldsymbol{\beta} = \left[\begin{array}{c} c \\ \phi_{1} \\ \vdots \\ \phi_{p} \end{array} \right] \hspace{5pt} X = \left[\begin{array}{ccccc} 1 & y_{T-1} & y_{T-2} & \ldots & y_{T-p} \\ \vdots &\vdots &\vdots &\vdots &\vdots \\ 1 & y_{p} & y_{p-1} & \ldots & y_{1} \end{array}\right] \hspace{5pt} \boldsymbol{y} =\left[\begin{array}{c} y_{T} \\ \vdots \\ y_{p+1} \\ \end{array} \right] \hspace{5pt} \boldsymbol{e} = \left[\begin{array}{c} e_{T} \\ \vdots \\ e_{p+1} \\ \end{array}\right].\end{split}\]
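A minimal numpy sketch of this regression on simulated data (parameter values and names are purely illustrative); the rows here run from \(\smash{t = p+1}\) up to \(\smash{T}\), which yields the same estimates as the reversed ordering above:

```python
import numpy as np

# simulate a stationary Gaussian AR(2): c = 1, phi_1 = 0.5, phi_2 = 0.2, sigma = 1
rng = np.random.default_rng(0)
p, T = 2, 1000
y = np.zeros(T)
for t in range(p, T):
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.standard_normal()

# design matrix: the row for time t is (1, y_{t-1}, ..., y_{t-p})
X = np.column_stack([np.ones(T - p)] +
                    [y[p - j:T - j] for j in range(1, p + 1)])
Y = y[p:]

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (c-hat, phi_1-hat, ..., phi_p-hat)
resid = Y - X @ beta_hat
```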

\(\smash{AR(p)}\) Conditional MLEs

Differentiating the log likelihood with respect to \(\smash{\sigma^{2}}\) and setting the derivative to zero at the optimum,

\[\begin{split}\begin{align} \frac{\partial \ell}{\partial \sigma^{2}}\Big|_{\hat{\sigma}^{2}} & = - \frac{T-p}{2\hat{\sigma}^{2}} + \frac{1}{2\hat{\sigma}^{4}} \sum_{t=p+1}^{T}(y_{t}-c-\phi_{1} y_{t-1}-\ldots-\phi_{p}y_{t-p})^{2} = 0 \\ \implies \hat{\sigma}^{2} & = \frac{1}{T-p} \sum_{t=p+1}^{T}(y_{t}-c-\phi_{1} y_{t-1}-\ldots-\phi_{p}y_{t-p})^{2} \\ & = \frac{1}{T-p} \sum_{t=p+1}^{T}(y_{t}-\hat{c}-\hat{\phi}_{1} y_{t-1}-\ldots-\hat{\phi}_{p} y_{t-p})^{2}, \end{align}\end{split}\]

where the last line substitutes the MLEs \(\smash{\hat{c},\hat{\phi}_1,\ldots,\hat{\phi}_p}\) for \(\smash{c,\phi_1,\ldots,\phi_p}\).
  • This is the usual regression estimator of \(\smash{\sigma^2}\).
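In the regression sketch above, this estimator is one additional line:

```python
# conditional MLE of sigma^2: mean squared residual over the T - p observations
sigma2_hat = resid @ resid / (T - p)
```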

\(\smash{AR(p)}\) Conditional MLEs

  • Assuming Gaussianity doesn’t impact the consistency of our estimates.
  • If \(\smash{\boldsymbol{\varepsilon}}\) is not Gaussian, then \(\smash{\hat{\boldsymbol{\beta}}}\) is a quasi-maximum likelihood estimate (QMLE), since the Gaussian likelihood is misspecified.

\(\smash{MA(q)}\) Conditional Likelihood

Recall a Gaussian \(\smash{MA(q)}\) process:

\[Y_t = \mu + \varepsilon_{t} + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \ldots + \psi_q \varepsilon_{t-q}, \hspace{5pt} \varepsilon_{t} \overset{i.i.d.}{\sim} \mathcal{N}(0,\sigma^{2})\]
  • Now, \(\smash{\boldsymbol{\theta} = (\mu,\psi_1,\ldots,\psi_q,\sigma^{2})^{'}}\).

\(\smash{MA(q)}\) Conditional Likelihood

If \(\smash{\varepsilon_{t-q}, \ldots, \varepsilon_{t-1}}\) are known with certainty then

\[\begin{split}\begin{align} Y_t & \sim \mathcal{N}(\mu + \psi_1 \varepsilon_{t-1} + \ldots + \psi_q \varepsilon_{t-q}, \sigma^2) \\ \implies f_{Y_t|\varepsilon_{t-q}, \ldots, \varepsilon_{t-1}} & (y_t|\varepsilon_{t-q}, \ldots, \varepsilon_{t-1}) \\ & = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{1}{2\sigma^2} (y_t - \mu - \psi_1 \varepsilon_{t-1} - \ldots - \psi_q \varepsilon_{t-q})^2 \right\} \\ & = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{1}{2\sigma^2} \varepsilon_t^2 \right\}. \end{align}\end{split}\]

\(\smash{MA(q)}\) Conditional Likelihood

Assume \(\smash{\varepsilon_{0} = \varepsilon_{-1} = \varepsilon_{-2} = \ldots = \varepsilon_{-q+1} = 0}\) and iteratively compute

\[\smash{\varepsilon_t = y_t - \mu - \psi_1 \varepsilon_{t-1} - \ldots - \psi_q \varepsilon_{t-q}, \,\,\,\, \text{for} \,\,\,\, t = 1,\ldots,T}.\]
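A minimal sketch of this recursion (the function name is hypothetical), storing the \(\smash{q}\) zero pre-sample values at the front of the array:

```python
import numpy as np

def ma_residuals(y, mu, psi):
    """eps_t = y_t - mu - psi_1 eps_{t-1} - ... - psi_q eps_{t-q},
    with pre-sample eps_0 = ... = eps_{-q+1} = 0."""
    q = len(psi)
    eps = np.zeros(len(y) + q)           # slots 0..q-1 hold the zero pre-sample
    for t, y_t in enumerate(y):
        lags = eps[t:t + q][::-1]        # (eps_{t-1}, ..., eps_{t-q})
        eps[t + q] = y_t - mu - np.dot(psi, lags)
    return eps[q:]                       # eps_1, ..., eps_T
```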

\(\smash{MA(q)}\) Conditional Likelihood

Then the likelihood is

\[\begin{split}\begin{align} \mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T, \boldsymbol{\varepsilon}_0 = \boldsymbol{0}) & = f_{Y_1,\ldots,Y_T|\boldsymbol{\varepsilon}_0}(y_1,\ldots,y_T| \boldsymbol{\varepsilon}_0, \boldsymbol{\theta}) \\ & = f_{Y_1|\boldsymbol{\varepsilon}_0}(y_1|\boldsymbol{\varepsilon}_0, \boldsymbol{\theta}) \prod_{t=2}^T f_{Y_t|Y_{t-1},\ldots,Y_1, \boldsymbol{\varepsilon}_0}(y_t|y_{t-1},\ldots,y_1,\boldsymbol{\varepsilon}_0, \boldsymbol{\theta}) \\ & = \prod_{t=1}^T \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{1}{2\sigma^2} \varepsilon_t^2 \right\} \\ & = \frac{1}{(2\pi \sigma^2)^{\frac{T}{2}}} \exp\left\{-\frac{1}{2\sigma^2} \sum_{t=1}^T \varepsilon_t^2 \right\} \end{align}\end{split}\]

where \(\smash{\boldsymbol{\varepsilon}_0 = \{\varepsilon_t\}_{t=-q+1}^{0}}\).

\(\smash{MA(q)}\) Conditional Log Likelihood

The log likelihood is

\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T, \boldsymbol{\varepsilon}_0 = \boldsymbol{0}) = -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^T \varepsilon_t^2.}\]
  • The MLEs cannot be found analytically.

The rough numerical algorithm (sketched in code after this list) is

  1. Guess values for \(\smash{\boldsymbol{\theta} = (\mu, \psi_1,\ldots,\psi_q,\sigma^2)^{'}}\).
  2. Assume \(\smash{\varepsilon_{0} = \varepsilon_{-1} = \varepsilon_{-2} = \ldots = \varepsilon_{-q+1} = 0}\).
  3. Iteratively compute \(\smash{\{\varepsilon_t\}_{t=1}^T}\).
  4. Evaluate the log likelihood at \(\smash{\{\varepsilon_t\}_{t=1}^T}\).
  5. Update \(\smash{\boldsymbol{\theta}}\) and return to step 2 until convergence.
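A sketch of these steps, reusing the hypothetical ma_residuals recursion above and delegating the guess-and-update loop (steps 1 and 5) to a generic optimizer such as scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y, q):
    # theta = (mu, psi_1, ..., psi_q, log sigma^2); the log parameterization
    # keeps sigma^2 > 0 without explicit bounds
    mu, psi, sigma2 = theta[0], theta[1:1 + q], np.exp(theta[1 + q])
    eps = ma_residuals(y, mu, psi)       # steps 2-3: zero pre-sample, then recursion
    T = len(y)                           # step 4: minus the conditional log likelihood
    return 0.5 * (T * np.log(2 * np.pi * sigma2) + np.dot(eps, eps) / sigma2)

# e.g. for an MA(1), starting from a rough guess:
# res = minimize(neg_loglik, x0=np.array([0.0, 0.1, 0.0]), args=(y, 1))
```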

\(\smash{MA(q)}\) Conditional Log Likelihood

The conditional log likelihood function can only be used with the invertible representation of the \(\smash{MA(q)}\).

  • If a non-invertible representation is used, backward recursion on \(\smash{\varepsilon_t}\) shows that the error introduced by the assumption \(\smash{\boldsymbol{\varepsilon}_0 = \boldsymbol{0}}\) grows explosively rather than dying out.

\(\smash{ARMA(p,q)}\) Cond. Likelihood

Recall a Gaussian \(\smash{ARMA(p,q)}\) process:

\[\begin{split}\begin{align} Y_t & = c + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} \\ & \hspace{0.7in} + \varepsilon_{t} + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \ldots + \psi_q \varepsilon_{t-q}, \hspace{5pt} \varepsilon_{t} \overset{i.i.d.}{\sim} \mathcal{N}(0,\sigma^{2}). \end{align}\end{split}\]

\(\smash{ARMA(p,q)}\) Cond. Likelihood

To form the conditional likelihood, we combine the methods of the \(\smash{AR(p)}\) and \(\smash{MA(q)}\):

  1. Condition on \(\smash{y_0 = y_{-1} = \ldots = y_{-p+1} = \mu = \frac{c}{1-\phi_1 - \ldots - \phi_p}}\).
  2. Condition on \(\smash{\varepsilon_0 = \varepsilon_{-1} = \ldots = \varepsilon_{-q+1} = 0}\).
  3. Recursively compute \(\smash{\{\varepsilon_t\}_{t=1}^T}\) using \(\smash{\{y_t\}_{t=1}^T}\), \(\smash{\{\varepsilon_t\}_{t=-q+1}^0}\) and \(\smash{\{y_t\}_{t=-p+1}^0}\).
  4. Compute the log likelihood as
\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T, \boldsymbol{\varepsilon}_0 = \boldsymbol{0}) = -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^T \varepsilon_t^2.}\]

The \(\smash{MA}\) polynomial must be invertible in order to use the conditional log likelihood for estimation.
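A minimal sketch of steps 1-3 (the function name is hypothetical); the returned residuals feed the same log-likelihood expression as in the \(\smash{MA(q)}\) case:

```python
import numpy as np

def arma_residuals(y, c, phi, psi):
    """Conditional ARMA(p,q) recursion: pre-sample y's fixed at the mean
    mu = c / (1 - phi_1 - ... - phi_p), pre-sample eps's fixed at zero."""
    phi, psi = np.asarray(phi), np.asarray(psi)
    p, q, T = len(phi), len(psi), len(y)
    mu = c / (1.0 - phi.sum())
    y_pad = np.concatenate([np.full(p, mu), np.asarray(y)])  # y_{-p+1}, ..., y_0 = mu
    eps = np.zeros(T + q)                                    # eps_{-q+1}, ..., eps_0 = 0
    for t in range(T):
        ar = np.dot(phi, y_pad[t:t + p][::-1])  # phi_1 y_{t-1} + ... + phi_p y_{t-p}
        ma = np.dot(psi, eps[t:t + q][::-1])    # psi_1 eps_{t-1} + ... + psi_q eps_{t-q}
        eps[t + q] = y_pad[t + p] - c - ar - ma
    return eps[q:]                              # eps_1, ..., eps_T
```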

\(\smash{ARMA(p,q)}\) Cond. Likelihood

Alternatively, we could start the recursions at \(\smash{t=p+1}\) without an initial condition on \(\smash{\{y_t\}_{t=-p+1}^0}\).

  1. Condition on \(\smash{\varepsilon_p = \varepsilon_{p-1} = \ldots = \varepsilon_{p-q+1} = 0}\).
  2. Recursively compute \(\smash{\{\varepsilon_t\}_{t=p+1}^T}\) using \(\smash{\{y_t\}_{t=1}^T}\) and \(\smash{\{\varepsilon_t\}_{t=p-q+1}^p}\).
  3. Compute the log likelihood as
\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T, \varepsilon_p = \ldots = \varepsilon_{p-q+1} = 0) = -\frac{T-p}{2} \log(2\pi) - \frac{T-p}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=p+1}^T \varepsilon_t^2.}\]
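A sketch of this variant (hypothetical name; for simplicity it assumes \(\smash{p \geq q}\), so every pre-sample \(\smash{\varepsilon}\) the recursion touches falls inside the zeroed range):

```python
import numpy as np

def arma_residuals_from_p(y, c, phi, psi):
    """Start the recursion at t = p+1 with eps_p = ... = eps_{p-q+1} = 0;
    no assumption on pre-sample y's. Assumes p >= q."""
    phi, psi = np.asarray(phi), np.asarray(psi)
    p, q = len(phi), len(psi)
    eps = np.zeros(len(y))                   # eps_{p-q+1}, ..., eps_p = 0 by assumption
    for t in range(p, len(y)):               # 0-based index t corresponds to time t+1
        ar = np.dot(phi, y[t - p:t][::-1])   # AR part uses observed lags only
        ma = np.dot(psi, eps[t - q:t][::-1])
        eps[t] = y[t] - c - ar - ma
    return eps[p:]                           # eps_{p+1}, ..., eps_T
```

The log likelihood then sums the squares of these \(\smash{T-p}\) residuals.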