ARMA Maximum Likelihood Estimation
\(\smash{AR(p)}\) Likelihood
Recall a Gaussian \(AR(p)\) process:
\[\begin{align}
Y_t & = c + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} +
\varepsilon_t, \,\,\,\, \varepsilon_t \stackrel{i.i.d.}{\sim}
\mathcal{N}(0,\sigma^2).
\end{align}\]
- In this case \(\smash{\boldsymbol{\theta} =
(c,\phi_1,\ldots,\phi_p,\sigma^2)}\).
- We will suppose that \(\smash{\{Y_t\}}\) is stationary and
causal.
\(\smash{AR(p)}\) Likelihood
Suppose we know that \(\smash{Y_{t-1} = y_{t-1}, Y_{t-2} =
y_{t-2}, \ldots, Y_{t-p} = y_{t-p}}\) for \(\smash{t > p}\). Then
\[\begin{split}\begin{gather}
Y_t = c + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} +
\varepsilon_t \\
\text{E}[Y_t|Y_{t-1},\ldots,Y_{t-p},\boldsymbol{\theta}] = c +
\phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} \\
\text{Var}(Y_t|Y_{t-1},\ldots,Y_{t-p},\boldsymbol{\theta}) =
\sigma^2.
\end{gather}\end{split}\]
\(\smash{AR(p)}\) Likelihood
Thus,
\[\begin{align}
Y_t|Y_{t-1},\ldots,Y_{t-p} \sim \mathcal{N}(c+\phi_1 y_{t-1} +
\ldots + \phi_p y_{t-p}, \sigma^2),
\end{align}\]
which means
\[\begin{split}\begin{align}
f_{Y_t|Y_{t-1},\ldots,Y_{t-p}} & (y_t|y_{t-1},\ldots,y_{t-p},
\boldsymbol{\theta}) \\
& = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp
\left\{-\frac{1}{2\sigma^2}(y_t - c - \phi_1 y_{t-1} - \ldots -
\phi_p y_{t-p})^2\right\}.
\end{align}\end{split}\]
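As a numerical illustration, this conditional density is just a Gaussian pdf evaluated at the deviation of \(\smash{y_t}\) from its conditional mean. A minimal sketch in Python (the parameter values and lagged observations are hypothetical):
```python
import numpy as np
from scipy.stats import norm

# Hypothetical AR(2) parameters and lagged observations, for illustration only.
c, phi, sigma2 = 1.0, np.array([0.5, -0.3]), 2.0
y_lags = np.array([0.8, 1.1])            # (y_{t-1}, y_{t-2})
y_t = 1.4

cond_mean = c + phi @ y_lags             # E[Y_t | y_{t-1}, y_{t-2}]
density = norm.pdf(y_t, loc=cond_mean, scale=np.sqrt(sigma2))
print(density)
```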
\(\smash{AR(p)}\) Likelihood
The likelihood of \(\smash{\boldsymbol{Y}_T = \{Y_t\}_{t=1}^T}\) is
\[\begin{split}\begin{align}
\mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T) & =
f_{\boldsymbol{Y}_T}(\boldsymbol{y}_T|\boldsymbol{\theta}) \\
& = f_{\boldsymbol{Y}_p}(\boldsymbol{y}_p|\boldsymbol{\theta})
\prod_{t=p+1}^T
f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1},\ldots,y_{t-p},\boldsymbol{\theta})
\end{align}\end{split}\]
where
\(\smash{f_{\boldsymbol{Y}_p}(\boldsymbol{y}_p|\boldsymbol{\theta})}\)
is the joint density of \(\smash{\boldsymbol{Y}_p =
\{Y_t\}_{t=1}^p}\).
- Maximizing this likelihood results in first-order conditions that
are nonlinear in \(\smash{\boldsymbol{\theta}}\) given
\(\smash{\boldsymbol{y}_T}\), which must be solved numerically.
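In practice the exact Gaussian likelihood, including the joint density of the first \(\smash{p}\) observations, is maximized numerically by standard software. A minimal sketch using statsmodels on simulated data (the series and parameter values are illustrative; note that statsmodels parameterizes the constant as the process mean):
```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate a stationary, causal AR(2): y_t = 1 + 0.5 y_{t-1} - 0.3 y_{t-2} + eps_t.
rng = np.random.default_rng(1)
y = np.zeros(500)
eps = rng.normal(size=500)
for t in range(2, 500):
    y[t] = 1.0 + 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

# ARIMA(2, 0, 0) with a constant estimates the AR(2) by exact Gaussian MLE.
res = ARIMA(y, order=(2, 0, 0), trend="c").fit()
print(res.params)   # mean term, phi_1, phi_2, and sigma^2
```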
\(\smash{AR(p)}\) Conditional Likelihood
We can approximate the \(\smash{AR(p)}\) likelihood with only the
product of conditional densities:
\[\begin{split}\begin{align}
\mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T) & \approx
\prod_{t=p+1}^T
f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1},\ldots,y_{t-p},\boldsymbol{\theta})
\\
& = \prod_{t=p+1}^T \frac{1}{\sqrt{2 \pi \sigma^2}} \exp
\left\{-\frac{1}{2\sigma^2}(y_t - c - \phi_1 y_{t-1} - \ldots -
\phi_p y_{t-p})^2\right\} \\
& = \left(2 \pi \sigma^2\right)^{-\frac{T-p}{2}} \exp
\left\{-\frac{1}{2\sigma^2} \sum_{t=p+1}^T (y_t - c - \phi_1
y_{t-1} - \ldots - \phi_p y_{t-p})^2\right\}.
\end{align}\end{split}\]
\(\smash{AR(p)}\) Conditional Log Likelihood
The conditional log likelihood of the \(\smash{AR(p)}\) is
\[\begin{align}
\ell(\boldsymbol{\theta}|\boldsymbol{y}_T) & = -\frac{T-p}{2}
\log(2\pi) -\frac{T-p}{2} \log(\sigma^2) -\frac{1}{2\sigma^2}
\sum_{t=p+1}^T (y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p
y_{t-p})^2.
\end{align}\]
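This expression is straightforward to evaluate directly. A minimal sketch, assuming the parameters are packed as \(\smash{\boldsymbol{\theta} = (c, \phi_1, \ldots, \phi_p, \sigma^2)}\):
```python
import numpy as np

def ar_cond_loglik(theta, y, p):
    """Conditional log likelihood of a Gaussian AR(p).

    theta = (c, phi_1, ..., phi_p, sigma2); y is a 1-D array of observations.
    """
    c, phi, sigma2 = theta[0], np.asarray(theta[1:p + 1]), theta[p + 1]
    T = len(y)
    # Lag matrix: column j holds y_{t-j} for t = p+1, ..., T.
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])
    resid = y[p:] - c - X @ phi
    n = T - p
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - resid @ resid / (2 * sigma2))
```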
- Maximizing the conditional log likelihood with respect to
\(\smash{c,\phi_1,\ldots,\phi_p}\), conditional on
\(\smash{\sigma^2}\), is the same as minimizing
\[\smash{\sum_{t=p+1}^T (y_t - c - \phi_1 y_{t-1} - \ldots - \phi_p
y_{t-p})^2.}\]
- Hence, the MLEs are the same as the least squares estimates.
\(\smash{AR(p)}\) Conditional MLEs
Since the MLEs and LS estimates are the same, we can solve for the
MLEs by simply running a regression
\[\smash{\boldsymbol{y} = X \boldsymbol{\beta} +
\boldsymbol{e},}\]
where
\[\begin{split}\boldsymbol{\beta} = \left[\begin{array}{c} c \\ \phi_{1} \\ \vdots
\\ \phi_{p} \end{array} \right] \hspace{5pt} X =
\left[\begin{array}{ccccc} 1 & y_{T-1} & y_{T-2} & \ldots & y_{T-p}
\\ \vdots &\vdots &\vdots &\vdots &\vdots \\ 1 & y_{p} & y_{p-1} &
\ldots & y_{1} \end{array}\right] \hspace{5pt}
\boldsymbol{y} =\left[\begin{array}{c} y_{T} \\ \vdots \\ y_{p+1}
\\ \end{array} \right] \hspace{5pt} \boldsymbol{e} =
\left[\begin{array}{c} e_{T} \\ \vdots \\ e_{p+1} \\
\end{array}\right].\end{split}\]
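A minimal sketch of this regression with numpy (the row order of \(\smash{X}\) is irrelevant for least squares, so the sketch stacks observations in increasing \(\smash{t}\); the function name is illustrative):
```python
import numpy as np

def ar_ols(y, p):
    """Conditional MLE of an AR(p) via least squares.

    Returns beta = (c, phi_1, ..., phi_p) and the regression residuals.
    """
    T = len(y)
    # One row per t = p+1, ..., T: intercept followed by y_{t-1}, ..., y_{t-p}.
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta, y[p:] - X @ beta
```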
\(\smash{AR(p)}\) Conditional MLEs
Differentiating the log likelihood with respect to \(\smash{\sigma^{2}}\),
\[\begin{split}\begin{align}
\frac{\partial \ell}{\partial \sigma^{2}}\Big|_{\hat{\sigma}^{2}} &
= - \frac{T-p}{2\hat{\sigma}^{2}} +
\frac{1}{2\hat{\sigma}^{4}} \sum_{t=p+1}^{T}(y_{t}-c-\phi_1
y_{t-1}-\ldots-\phi_{p}y_{t-p})^{2} = 0 \\
\implies \hat{\sigma}^{2} & = \frac{1}{T-p}
\sum_{t=p+1}^{T}(y_{t}-c-\phi_1 y_{t-1}-\ldots-\phi_{p}y_{t-p})^{2}
\\
& \approx \frac{1}{T-p} \sum_{t=p+1}^{T}(y_{t}-\hat{c}-\hat{\phi}_1
y_{t-1}-\ldots-\hat{\phi}_p y_{t-p})^{2}.
\end{align}\end{split}\]
- This is the usual regression estimator of \(\smash{\sigma^2}\), up
to the degrees-of-freedom correction.
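Continuing the hypothetical ar_ols sketch above on some series y, the variance MLE is just the mean squared residual:
```python
beta_hat, resid = ar_ols(y, p)            # from the sketch above (y, p as before)
sigma2_hat = resid @ resid / len(resid)   # divides by T - p, as in the formula
```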
\(\smash{AR(p)}\) Conditional MLEs
- Assuming Gaussianity doesn’t affect the consistency of our
estimates.
- If \(\smash{\boldsymbol{\varepsilon}}\) is not Gaussian,
\(\smash{\hat{\boldsymbol{\beta}}}\) is a quasi-maximum likelihood
estimate (QMLE), since the likelihood is misspecified.
\(\smash{MA(q)}\) Conditional Likelihood
Recall a Gaussian \(\smash{MA(q)}\) process:
\[Y_t = \mu + \varepsilon_{t} + \psi_1 \varepsilon_{t-1} + \psi_2
\varepsilon_{t-2} + \ldots + \psi_q \varepsilon_{t-q},
\hspace{5pt} \varepsilon_{t} \overset{i.i.d.}{\sim} \mathcal{N}(0,\sigma^{2})\]
- Now, \(\smash{\boldsymbol{\theta} = (\mu,\psi_1,\ldots,\psi_q,\sigma^{2})^{'}}\).
\(\smash{MA(q)}\) Conditional Likelihood
If \(\smash{\varepsilon_{t-q}, \ldots, \varepsilon_{t-1}}\)
are known with certainty then
\[\begin{split}\begin{align}
Y_t & \sim \mathcal{N}(\mu + \psi_1 \varepsilon_{t-1} + \ldots +
\psi_q \varepsilon_{t-q}, \sigma^2) \\
\implies f_{Y_t|\varepsilon_{t-q}, \ldots,
\varepsilon_{t-1}} & (y_t|\varepsilon_{t-q}, \ldots,
\varepsilon_{t-1}) \\
& = \frac{1}{\sqrt{2\pi \sigma^2}}
\exp\left\{-\frac{1}{2\sigma^2} (y_t - \mu - \psi_1
\varepsilon_{t-1} - \ldots - \psi_q \varepsilon_{t-q})^2 \right\}
\\
& = \frac{1}{\sqrt{2\pi \sigma^2}}
\exp\left\{-\frac{1}{2\sigma^2} \varepsilon_t^2 \right\}.
\end{align}\end{split}\]
\(\smash{MA(q)}\) Conditional Likelihood
Assume \(\smash{\varepsilon_{0} = \varepsilon_{-1} =
\varepsilon_{-2} = \ldots = \varepsilon_{-q+1} = 0}\) and iteratively
compute
\[\smash{\varepsilon_t = y_t - \mu - \psi_1 \varepsilon_{t-1} -
\ldots - \psi_q \varepsilon_{t-q}, \,\,\,\, \text{for} \,\,\,\, t =
1,\ldots,T}.\]
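A minimal sketch of this recursion (ma_residuals is a hypothetical helper name; mu and psi are assumed to be a float and a 1-D array):
```python
import numpy as np

def ma_residuals(y, mu, psi):
    """Back out eps_1, ..., eps_T from an MA(q), setting the q pre-sample
    shocks eps_0, ..., eps_{-q+1} to zero."""
    q = len(psi)
    eps = np.zeros(len(y) + q)          # first q slots are the zero pre-sample shocks
    for t in range(len(y)):
        lagged = eps[t:t + q][::-1]     # (eps_{t-1}, ..., eps_{t-q})
        eps[t + q] = y[t] - mu - psi @ lagged
    return eps[q:]
```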
\(\smash{MA(q)}\) Conditional Likelihood
Then the likelihood is
\[\begin{split}\begin{align}
\mathcal{L}(\boldsymbol{\theta}|\boldsymbol{y}_T,
\boldsymbol{\varepsilon}_0 = \boldsymbol{0}) & =
f_{Y_1,\ldots,Y_T|\boldsymbol{\varepsilon}_0}(y_1,\ldots,y_T|
\boldsymbol{\varepsilon}_0, \boldsymbol{\theta}) \\
& = f_{Y_1|\boldsymbol{\varepsilon}_0}(y_1|\boldsymbol{\varepsilon}_0,
\boldsymbol{\theta}) \prod_{t=2}^T f_{Y_t|Y_1,\ldots,Y_{t-1},
\boldsymbol{\varepsilon}_0}(y_t|y_1,\ldots,y_{t-1},\boldsymbol{\varepsilon}_0,
\boldsymbol{\theta}) \\
& = \prod_{t=1}^T \frac{1}{\sqrt{2\pi \sigma^2}}
\exp\left\{-\frac{1}{2\sigma^2} \varepsilon_t^2 \right\} \\
& = \frac{1}{(2\pi \sigma^2)^{\frac{T}{2}}}
\exp\left\{-\frac{1}{2\sigma^2} \sum_{t=1}^T \varepsilon_t^2
\right\}
\end{align}\end{split}\]
where \(\smash{\boldsymbol{\varepsilon}_0 =
\{\varepsilon_t\}_{t=-q+1}^{0}}\).
\(\smash{MA(q)}\) Conditional Log Likelihood
The log likelihood is
\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T,
\boldsymbol{\varepsilon}_0 = \boldsymbol{0}) = -\frac{T}{2}
\log(2\pi) - \frac{T}{2} \log(\sigma^2) - \frac{1}{2\sigma^2}
\sum_{t=1}^T \varepsilon_t^2.}\]
- The MLEs cannot be found analytically.
The rough numerical algorithm, sketched in code below, is:
1. Guess values for \(\smash{\boldsymbol{\theta} = (\mu,
\psi_1,\ldots,\psi_q,\sigma^2)^{'}}\).
2. Assume \(\smash{\varepsilon_{0} = \varepsilon_{-1} =
\varepsilon_{-2} = \ldots = \varepsilon_{-q+1} = 0}\).
3. Iteratively compute \(\smash{\{\varepsilon_t\}_{t=1}^T}\).
4. Evaluate the log likelihood for
\(\smash{\{\varepsilon_t\}_{t=1}^T}\).
5. Update \(\smash{\boldsymbol{\theta}}\) and return to step 2,
repeating until convergence.
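A minimal sketch of this loop, handing the conditional log likelihood to a generic optimizer (it reuses the hypothetical ma_residuals helper from above; log sigma2 is optimized so the variance stays positive):
```python
import numpy as np
from scipy.optimize import minimize

def ma_neg_cond_loglik(theta, y, q):
    """Negative conditional log likelihood; theta = (mu, psi_1..psi_q, log sigma2)."""
    mu, psi, sigma2 = theta[0], theta[1:q + 1], np.exp(theta[q + 1])
    eps = ma_residuals(y, mu, psi)      # recursion from the sketch above
    T = len(y)
    return 0.5 * T * np.log(2 * np.pi * sigma2) + eps @ eps / (2 * sigma2)

# Hypothetical usage for an observed series y and MA order q:
# theta0 = np.r_[y.mean(), np.zeros(q), np.log(y.var())]   # step 1: initial guess
# res = minimize(ma_neg_cond_loglik, theta0, args=(y, q), method="BFGS")
```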
\(\smash{MA(q)}\) Conditional Log Likelihood
The conditional log likelihood function can only be used with the
invertible representation of the \(\smash{MA(q)}\).
- If a non-invertible representation is used, the recursion for
\(\smash{\varepsilon_t}\) is explosive, so the error introduced by
assuming \(\smash{\boldsymbol{\varepsilon}_0 = \boldsymbol{0}}\)
grows with \(\smash{t}\) instead of dying out.
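Invertibility can be checked directly from the roots of the MA polynomial \(\smash{1 + \psi_1 z + \ldots + \psi_q z^q}\), all of which must lie outside the unit circle. A minimal sketch (the coefficients are illustrative):
```python
import numpy as np

psi = np.array([0.6, 0.3])                # hypothetical MA(2) coefficients
# np.roots expects coefficients from highest degree down: psi_q, ..., psi_1, 1.
roots = np.roots(np.r_[psi[::-1], 1.0])
invertible = np.all(np.abs(roots) > 1.0)  # True here: both roots lie outside the circle
```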
\(\smash{ARMA(p,q)}\) Cond. Likelihood
Recall a Gaussian \(\smash{ARMA(p,q)}\) process:
\[\begin{split}\begin{align}
Y_t & = c + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} \\
& \hspace{0.7in} +
\varepsilon_{t} + \psi_1 \varepsilon_{t-1} + \psi_2
\varepsilon_{t-2} + \ldots + \psi_q \varepsilon_{t-q},
\hspace{5pt} \varepsilon_{t} \overset{i.i.d.}{\sim}
\mathcal{N}(0,\sigma^{2}).
\end{align}\end{split}\]
\(\smash{ARMA(p,q)}\) Cond. Likelihood
To form the conditional likelihood, we combine the methods of the
\(\smash{AR(p)}\) and \(\smash{MA(q)}\) (see the sketch following
these steps):
- Condition on \(\smash{y_0 = y_{-1} = \ldots = y_{-p+1} = \mu =
\frac{c}{1-\phi_1 - \ldots - \phi_p}}\).
- Condition on \(\smash{\varepsilon_0 = \varepsilon_{-1} =
\ldots = \varepsilon_{-q+1} = 0}\).
- Recursively compute \(\smash{\{\varepsilon_t\}_{t=1}^T}\) using
\(\smash{\{y_t\}_{t=1}^T}\),
\(\smash{\{\varepsilon_t\}_{t=-q+1}^0}\) and
\(\smash{\{y_t\}_{t=-p+1}^0}\).
- Compute the log likelihood as
\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T,
\boldsymbol{\varepsilon}_0 = \boldsymbol{0}) = -\frac{T}{2}
\log(2\pi) - \frac{T}{2} \log(\sigma^2) - \frac{1}{2\sigma^2}
\sum_{t=1}^T \varepsilon_t^2.}\]
The \(\smash{MA}\) polynomial must be invertible in order to use
the conditional log likelihood for estimation.
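A minimal sketch combining the two recursions under these initial conditions (the function name and argument packing are illustrative; phi and psi are assumed to be 1-D arrays):
```python
import numpy as np

def arma_cond_loglik(y, c, phi, psi, sigma2):
    """Conditional log likelihood of a Gaussian ARMA(p,q), initializing
    pre-sample y's at the mean mu and pre-sample eps's at zero."""
    p, q = len(phi), len(psi)
    mu = c / (1.0 - phi.sum())                # unconditional mean
    y_ext = np.r_[np.full(p, mu), y]          # y_0 = ... = y_{-p+1} = mu
    eps = np.zeros(len(y) + q)                # eps_0 = ... = eps_{-q+1} = 0
    for t in range(len(y)):
        ar_part = phi @ y_ext[t:t + p][::-1]  # phi_1 y_{t-1} + ... + phi_p y_{t-p}
        ma_part = psi @ eps[t:t + q][::-1]    # psi_1 eps_{t-1} + ... + psi_q eps_{t-q}
        eps[t + q] = y[t] - c - ar_part - ma_part
    e = eps[q:]
    T = len(y)
    return -0.5 * T * np.log(2 * np.pi * sigma2) - e @ e / (2 * sigma2)
```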
\(\smash{ARMA(p,q)}\) Cond. Likelihood
Alternatively, we could start the recursions at \(\smash{t=p+1}\)
without an initial condition on \(\smash{\{y_t\}_{t=-p+1}^0}\).
- Condition on \(\smash{\varepsilon_p = \varepsilon_{p-1} =
\ldots = \varepsilon_{p-q+1} = 0}\).
- Recursively compute \(\smash{\{\varepsilon_t\}_{t=p+1}^T}\)
using \(\smash{\{y_t\}_{t=1}^T}\) and
\(\smash{\{\varepsilon_t\}_{t=p-q+1}^{p}}\).
- Compute the log likelihood as
\[\smash{\ell(\boldsymbol{\theta}|\boldsymbol{y}_T,
\boldsymbol{\varepsilon}_p = \boldsymbol{0}) = -\frac{T-p}{2}
\log(2\pi) - \frac{T-p}{2} \log(\sigma^2) - \frac{1}{2\sigma^2}
\sum_{t=p+1}^T \varepsilon_t^2.}\]
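A minimal sketch of this alternative, under the same hypothetical conventions as above (the sum now runs over \(\smash{T-p}\) terms):
```python
import numpy as np

def arma_cond_loglik_alt(y, c, phi, psi, sigma2):
    """Conditional log likelihood of an ARMA(p,q) treating y_1, ..., y_p as
    given and setting eps_p = ... = eps_{p-q+1} = 0."""
    p, q = len(phi), len(psi)
    eps = np.zeros(q + len(y) - p)            # first q slots: zero pre-sample shocks
    for t in range(p, len(y)):                # t = p+1, ..., T in the slide's indexing
        ar_part = phi @ y[t - p:t][::-1]      # phi_1 y_{t-1} + ... + phi_p y_{t-p}
        ma_part = psi @ eps[t - p:t - p + q][::-1]
        eps[t - p + q] = y[t] - c - ar_part - ma_part
    e = eps[q:]
    n = len(y) - p
    return -0.5 * n * np.log(2 * np.pi * sigma2) - e @ e / (2 * sigma2)
```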