Moving Average Processes

\(MA(1)\)

Given white noise \(\{\varepsilon_t\}\) with mean zero and variance \(\sigma^2\), consider the process

\[Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1},\]

where \(\mu\) and \(\theta\) are constants.

  • This is a first-order moving average or \(MA(1)\) process.
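As a concrete illustration, the following \(\mathtt{R}\) sketch simulates an \(MA(1)\) directly from its definition; the values \(\mu = 2\), \(\theta = 0.7\), and \(\sigma = 1\) are arbitrary choices, not values from the text.

    set.seed(123)
    n     <- 500
    mu    <- 2                                 # arbitrary mean
    theta <- 0.7                               # arbitrary MA(1) coefficient
    eps   <- rnorm(n + 1, mean = 0, sd = 1)    # white noise with sigma = 1

    # Y_t = mu + eps_t + theta * eps_{t-1}
    y <- mu + eps[2:(n + 1)] + theta * eps[1:n]
    plot(y, type = "l")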

\(MA(1)\) Mean and Variance

The mean of the first-order moving average process is

\[\begin{split}E[Y_t] & = E[\mu + \varepsilon_t + \theta \varepsilon_{t-1}] \\ & = \mu + E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] \\ & = \mu.\end{split}\]

\(MA(1)\) Autocovariances

\[\begin{split}\gamma_j & = E\left[(Y_t - \mu)(Y_{t-j} - \mu)\right] \\ & = E\left[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-j} + \theta \varepsilon_{t-j-1})\right] \\ & = E[\varepsilon_t \varepsilon_{t-j} + \theta \varepsilon_t \varepsilon_{t-j-1} + \theta \varepsilon_{t-1} \varepsilon_{t-j} + \theta^2 \varepsilon_{t-1}\varepsilon_{t-j-1}] \\ & = E[\varepsilon_t \varepsilon_{t-j}] + \theta E[\varepsilon_t \varepsilon_{t-j-1}] + \theta E[\varepsilon_{t-1} \varepsilon_{t-j}] + \theta^2 E[\varepsilon_{t-1}\varepsilon_{t-j-1}].\end{split}\]

\(MA(1)\) Autocovariances

  • If \(j = 0\)
\[\gamma_0 = E[\varepsilon^2_t] + \theta E[\varepsilon_t \varepsilon_{t-1}] + \theta E[\varepsilon_{t-1} \varepsilon_t] + \theta^2 E[\varepsilon^2_{t-1}] = (1+\theta^2)\sigma^2.\]
  • If \(j = 1\)
\[\gamma_1 = E[\varepsilon_t \varepsilon_{t-1}] + \theta E[\varepsilon_t \varepsilon_{t-2}] + \theta E[\varepsilon^2_{t-1}] + \theta^2 E[\varepsilon_{t-1} \varepsilon_{t-2}] = \theta \sigma^2.\]
  • If \(j > 1\), all of the expectations are zero:
\[\gamma_j = 0.\]
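As a sanity check, the sample autocovariances of a long simulated \(MA(1)\) should be close to \((1+\theta^2)\sigma^2\), \(\theta\sigma^2\), and zero. A minimal \(\mathtt{R}\) sketch, with \(\mu = 0\) and hypothetical values of \(\theta\) and \(\sigma\):

    set.seed(456)
    theta <- 0.7; sigma <- 1; n <- 1e5          # hypothetical parameter values
    eps <- rnorm(n + 1, sd = sigma)
    y   <- eps[2:(n + 1)] + theta * eps[1:n]    # MA(1) with mu = 0

    # sample autocovariances at lags 0, 1, 2
    acf(y, lag.max = 2, type = "covariance", plot = FALSE)$acf
    c((1 + theta^2) * sigma^2, theta * sigma^2, 0)   # theoretical gamma_0, gamma_1, gamma_2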

\(MA(1)\) Stationarity

Since the mean and autocovariances are independent of time, an \(MA(1)\) is weakly stationary.

  • This is true for all values of \(\theta\).

\(MA(1)\) Autocorrelations

The autocorrelations of an \(MA(1)\) are

  • \(j = 0\): \(\rho_0 = 1\) (always).
  • \(j = 1\):
\[\rho_1 = \frac{\theta \sigma^2}{(1+\theta^2) \sigma^2} = \frac{\theta}{1+\theta^2}.\]
  • \(j > 1\): \(\rho_j = 0\).
  • If \(\theta > 0\), \(Y_t\) and \(Y_{t-1}\) are positively autocorrelated.
  • If \(\theta < 0\), \(Y_t\) and \(Y_{t-1}\) are negatively autocorrelated.
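For example, taking the hypothetical value \(\theta = 0.5\) gives

\[\rho_1 = \frac{0.5}{1 + 0.5^2} = \frac{0.5}{1.25} = 0.4,\]

while \(\theta = -0.5\) gives \(\rho_1 = -0.4\). Note that \(|\rho_1| \leq 1/2\) for every value of \(\theta\), with the maximum magnitude attained at \(\theta = \pm 1\).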

\(MA(1)\) Autocorrelations

[Figure: autocorrelation function of an \(MA(1)\) process]

\(MA(q)\)

A \(q\)th-order moving average or \(MA(q)\) process is

\[Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q},\]

where \(\mu, \theta_1, \ldots, \theta_q\) are any real numbers.

\(MA(q)\) Mean

As with the \(MA(1)\):

\[\begin{split}E[Y_t] & = E[\mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}] \\ & = \mu + E[\varepsilon_t] + \theta_1 E[\varepsilon_{t-1}] + \ldots + \theta_q E[\varepsilon_{t-q}] \\ & = \mu.\end{split}\]

\(MA(q)\) Autocovariances

\[\begin{split}\gamma_j & = E\left[(Y_t-\mu)(Y_{t-j}-\mu)\right] \\ & = E\big[(\varepsilon_t + \theta_1\varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}) \\ & \qquad \times (\varepsilon_{t-j} + \theta_1\varepsilon_{t-j-1} + \ldots + \theta_q \varepsilon_{t-j-q})\big].\end{split}\]
  • For \(j > q\), all of the products have zero expectation, so \(\gamma_j = 0\).

\(MA(q)\) Autocovariances

  • For \(j = 0\), the squared terms result in nonzero expectations, while the cross products lead to zero expectations:
\[\gamma_0 = E[\varepsilon^2_t ] + \theta^2_1 E[\varepsilon^2_{t-1}] + \ldots + \theta^2_q E[\varepsilon^2_{t-q}] = \left(1 + \sum_{j=1}^q \theta^2_j\right) \sigma^2.\]

\(MA(q)\) Autocovariances

  • For \(j \in \{1,2,\ldots,q\}\), the nonzero expectation terms are
\[\begin{split}\gamma_j & = \theta_j E[\varepsilon^2_{t-j}] + \theta_{j+1}\theta_1 E[\varepsilon^2_{t-j-1}] + \theta_{j+2}\theta_2 E[\varepsilon^2_{t-j-2}] + \ldots + \theta_{q}\theta_{q-j} E[\varepsilon^2_{t-q}] \\ & = (\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \ldots + \theta_q\theta_{q-j}) \sigma^2.\end{split}\]

The autocovariances can be stated concisely as

\[\begin{split}\gamma_j = \begin{cases} (\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \ldots + \theta_q\theta_{q-j}) \sigma^2 & \text{for } j = 0, 1, \ldots, q \\ 0 & \text{for } j > q, \end{cases}\end{split}\]

where \(\theta_0 = 1\).

\(MA(q)\) Autocorrelations

The autocorrelations can be stated concisely as

\[\begin{split}\rho_j = \begin{cases} \frac{\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \ldots + \theta_q\theta_{q-j}}{\theta^2_0 + \theta^2_1 + \theta^2_2 + \ldots + \theta^2_q} & \text{ for } j = 0, 1, \ldots, q \\ 0 & \text{ for } j > q, \end{cases}\end{split}\]

where \(\theta_0 = 1\).
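These theoretical autocorrelations can be computed in \(\mathtt{R}\) with the built-in \(\mathtt{ARMAacf}\) function. A short sketch with hypothetical \(MA(3)\) coefficients, together with \(\rho_1\) computed directly from the formula above:

    # theoretical ACF of an MA(3); the coefficients are hypothetical
    theta <- c(0.5, 0.4, -0.3)
    ARMAacf(ma = theta, lag.max = 6)      # autocorrelations cut off to zero after lag 3

    # rho_1 from the formula above, with theta_0 = 1
    th <- c(1, theta)
    sum(th[1:3] * th[2:4]) / sum(th^2)    # matches the lag-1 value from ARMAacf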

\(MA(2)\) Example

For an \(MA(2)\) process

\[\begin{split}\gamma_0 & = (1 + \theta^2_1 + \theta^2_2) \sigma^2 \\ \gamma_1 & = (\theta_1 + \theta_2 \theta_1) \sigma^2 \\ \gamma_2 & = \theta_2 \sigma^2 \\ \gamma_3 & = \gamma_4 = \ldots = 0.\end{split}\]
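A simulation sketch that checks these \(MA(2)\) expressions against sample autocovariances; the coefficients and \(\sigma\) below are hypothetical choices.

    set.seed(789)
    theta1 <- 0.6; theta2 <- -0.3; sigma <- 2    # hypothetical parameter values

    # arima.sim uses the same sign convention: Y_t = eps_t + theta1*eps_{t-1} + theta2*eps_{t-2}
    y <- arima.sim(model = list(ma = c(theta1, theta2)), n = 1e5, sd = sigma)

    acf(y, lag.max = 4, type = "covariance", plot = FALSE)$acf   # sample gamma_0, ..., gamma_4
    c((1 + theta1^2 + theta2^2) * sigma^2,       # gamma_0
      (theta1 + theta2 * theta1) * sigma^2,      # gamma_1
      theta2 * sigma^2,                          # gamma_2
      0, 0)                                      # gamma_3, gamma_4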

Estimating \(MA\) Models

Estimation of an \(MA\) model is done via maximum likelihood.

  • For an \(MA(q)\) model, one would first specify a joint likelihood for the parameters \(\{\theta_1, \ldots, \theta_q, \mu, \sigma^2\}\).
  • Taking derivatives of the log likelihood with respect to each of the parameters would result in a system of equations that could be solved for the MLEs: \(\{\hat{\theta}_1, \ldots, \hat{\theta}_q, \hat{\mu}, \hat{\sigma}^2\}\).
  • The exact likelihood is a bit cumbersome and maximization requires specialized numerical methods.
  • The MLEs can be obtained with the \(\mathtt{arima}\) function in \(\mathtt{R}\).
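As a sketch of what this looks like in practice, the following fits an \(MA(2)\) by maximum likelihood with \(\mathtt{arima}\); the simulated series and the chosen order are illustrative assumptions.

    set.seed(42)
    y <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)   # stand-in for observed data

    # order = c(p, d, q); a pure MA(2) is order = c(0, 0, 2)
    fit <- arima(y, order = c(0, 0, 2), method = "ML")
    fit   # reports theta_1-hat, theta_2-hat, the intercept (mu-hat), and sigma^2-hat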

Which \(MA\)?

How do we know if an \(MA\) model is appropriate and which \(MA\) model to fit?

  • For an \(MA(q)\), we know that \(\gamma_j = 0\) for \(j > q\).
  • We should only fit an \(MA\) model if the empirical autocorrelations drop to (approximately) zero for all \(j > q\), for some value of \(q\).
  • The \(\mathtt{acf}\) function in \(\mathtt{R}\) can be used to compute empirical autocorrelations of the data.
  • The appropriate \(q\) can then be obtained from the empirical ACF.
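A minimal sketch of this workflow in \(\mathtt{R}\), using a simulated series as a stand-in for the data:

    set.seed(42)
    y <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)   # stand-in for observed data

    # empirical autocorrelations; the dashed lines are approximate 95% bounds around zero
    acf(y, lag.max = 20)
    # if the spikes are negligible beyond lag 2, an MA(2) is a natural candidate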

Which \(MA\)?

  • After fitting an \(MA\) model, we can examine the residuals.
  • The \(\mathtt{acf}\) function can be used to compute empirical autocorrelations of the residuals.
  • If the residuals are autocorrelated, the model is not a good fit. Consider changing the order of the \(MA\) or using another model.
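A residual-diagnostic sketch along these lines, again using simulated data and a hypothetical \(MA(2)\) fit:

    set.seed(42)
    y   <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)   # stand-in for observed data
    fit <- arima(y, order = c(0, 0, 2))

    acf(residuals(fit), lag.max = 20)   # should show no remaining autocorrelation
    Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)   # portmanteau check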

Which \(MA\)?

The \(\mathtt{auto.arima}\) function in the \(\mathtt{forecast}\) package for \(\mathtt{R}\) estimates a range of \(MA(q)\) models and selects the one with the best fit.

  • \(\mathtt{auto.arima}\) uses the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to select the model.
  • Minimizing AIC or BIC amounts to maximizing the likelihood (which, for Gaussian errors, is closely tied to minimizing the sum of squared residuals) subject to a penalty term that grows with the number of model parameters.
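A sketch using \(\mathtt{auto.arima}\), restricted to pure \(MA(q)\) models to match this setting; the data are simulated and the search limits are arbitrary choices.

    library(forecast)

    set.seed(42)
    y <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)   # stand-in for observed data

    # search over MA(q) only: no AR terms (max.p = 0), no differencing (d = 0)
    fit <- auto.arima(y, max.p = 0, d = 0, max.q = 5, ic = "bic",
                      seasonal = FALSE, stepwise = FALSE)
    fit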