Moving Average Processes¶
\(MA(1)\)¶
Given white noise \(\{\varepsilon_t\}\), with \(E[\varepsilon_t] = 0\) and \(Var(\varepsilon_t) = \sigma^2\), consider the process
\[
Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1},
\]
where \(\mu\) and \(\theta\) are constants.
- This is a first-order moving average or \(MA(1)\) process.
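As a concrete illustration, here is a minimal R sketch that simulates an \(MA(1)\) directly from its definition; the values \(\mu = 2\) and \(\theta = 0.8\) are hypothetical choices for this example:

```r
set.seed(42)

# Simulate 250 observations of Y_t = mu + e_t + theta * e_{t-1}
mu    <- 2
theta <- 0.8
eps   <- rnorm(251)                        # white noise, one extra draw for e_0
y     <- mu + eps[-1] + theta * eps[-251]  # pair each e_t with its lag e_{t-1}
plot(y, type = "l", main = "Simulated MA(1)")
```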
\(MA(1)\) Mean and Variance¶
The mean of the first-order moving average process is
\[
E[Y_t] = \mu + E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] = \mu,
\]
and the variance is
\[
Var(Y_t) = E[(\varepsilon_t + \theta \varepsilon_{t-1})^2] = (1 + \theta^2)\sigma^2.
\]
\(MA(1)\) Autocovariances¶
The autocovariance at lag \(j\) is \(\gamma_j = E[(Y_t - \mu)(Y_{t-j} - \mu)]\).
- If \(j = 0\),
\[
\gamma_0 = E[(\varepsilon_t + \theta \varepsilon_{t-1})^2] = E[\varepsilon_t^2] + 2\theta E[\varepsilon_t \varepsilon_{t-1}] + \theta^2 E[\varepsilon_{t-1}^2] = (1 + \theta^2)\sigma^2.
\]
- If \(j = 1\),
\[
\gamma_1 = E[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2})] = \theta E[\varepsilon_{t-1}^2] = \theta \sigma^2.
\]
- If \(j > 1\), all of the expectations are zero:
\[
\gamma_j = E[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-j} + \theta \varepsilon_{t-j-1})] = 0.
\]
\(MA(1)\) Stationarity¶
Since the mean and autocovariances are independent of time, an \(MA(1)\) is weakly stationary.
- This is true for all values of \(\theta\).
\(MA(1)\) Autocorrelations¶
The autocorrelations of an \(MA(1)\) are \(\rho_j = \gamma_j / \gamma_0\):
- \(j = 0\): \(\rho_0 = 1\) (always).
- \(j = 1\):
\[
\rho_1 = \frac{\theta \sigma^2}{(1 + \theta^2)\sigma^2} = \frac{\theta}{1 + \theta^2}.
\]
- \(j > 1\): \(\rho_j = 0\).
- If \(\theta > 0\), consecutive values of \(Y_t\) are positively autocorrelated.
- If \(\theta < 0\), consecutive values of \(Y_t\) are negatively autocorrelated.
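As a quick numerical check, here is a sketch in R (with a hypothetical \(\theta = 0.5\)) comparing the theoretical \(\rho_1\) to the sample autocorrelations of a long simulated series:

```r
theta <- 0.5
theta / (1 + theta^2)  # theoretical rho_1 = 0.4

set.seed(1)
# arima.sim() uses the same sign convention: X_t = e_t + theta * e_{t-1}
y <- arima.sim(model = list(ma = theta), n = 100000)
acf(y, lag.max = 3, plot = FALSE)  # sample rho_1 near 0.4; rho_2, rho_3 near 0
```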
\(MA(q)\)¶
A \(q\)th-order moving average or \(MA(q)\) process is
\[
Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},
\]
where \(\mu, \theta_1, \ldots, \theta_q\) are any real numbers.
\(MA(q)\) Mean¶
As with the \(MA(1)\):
\[
E[Y_t] = \mu + E[\varepsilon_t] + \theta_1 E[\varepsilon_{t-1}] + \cdots + \theta_q E[\varepsilon_{t-q}] = \mu.
\]
\(MA(q)\) Autocovariances¶
- For \(j > q\), all of the products result in zero expectations: \(\gamma_j = 0\) for \(j > q\).
- For \(j = 0\), the squared terms result in nonzero expectations, while the cross products lead to zero expectations:
\[
\gamma_0 = E[(\varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q})^2] = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2.
\]
- For \(j \in \{1, 2, \ldots, q\}\), the nonzero expectation terms are
\[
\gamma_j = (\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \cdots + \theta_q \theta_{q-j})\sigma^2.
\]
The autocovariances can be stated concisely as
\[
\gamma_j = \begin{cases} \sigma^2 \sum_{i=0}^{q-j} \theta_{j+i}\theta_i & \text{for } j \in \{0, 1, \ldots, q\} \\ 0 & \text{for } j > q, \end{cases}
\]
where \(\theta_0 = 1\).
\(MA(q)\) Autocorrelations¶
The autocorrelations can be stated concisely as
\[
\rho_j = \frac{\gamma_j}{\gamma_0} = \begin{cases} \dfrac{\sum_{i=0}^{q-j} \theta_{j+i}\theta_i}{\sum_{i=0}^{q} \theta_i^2} & \text{for } j \in \{0, 1, \ldots, q\} \\ 0 & \text{for } j > q, \end{cases}
\]
where \(\theta_0 = 1\).
\(MA(2)\) Example¶
For an \(MA(2)\) process
\[
Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2},
\]
the autocovariances are
\[
\gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma^2, \qquad \gamma_1 = (\theta_1 + \theta_1\theta_2)\sigma^2, \qquad \gamma_2 = \theta_2 \sigma^2,
\]
and \(\gamma_j = 0\) for \(j > 2\).
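The built-in \(\mathtt{ARMAacf}\) function in \(\mathtt{R}\) computes theoretical ARMA autocorrelations, so it can be used to check these formulas; the coefficient values below are hypothetical:

```r
# Theoretical ACF of an MA(2) with theta1 = 0.4, theta2 = 0.2
ARMAacf(ma = c(0.4, 0.2), lag.max = 4)

# By hand, from the formulas above (here gamma_0 = 1.2 * sigma^2):
(0.4 + 0.4 * 0.2) / (1 + 0.4^2 + 0.2^2)  # rho_1 = 0.4
0.2 / (1 + 0.4^2 + 0.2^2)                # rho_2 = 0.1667
# rho_3 = rho_4 = 0
```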
Estimating \(MA\) Models¶
Estimation of an \(MA\) model is done via maximum likelihood.
- For an \(MA(q)\) model, one would first specify a joint likelihood for the parameters \(\{\theta_1, \ldots, \theta_q, \mu, \sigma^2\}\).
- Taking derivatives of the log likelihood with respect to each of the parameters would result in a system of equations that could be solved for the MLEs: \(\{\hat{\theta}_1, \ldots, \hat{\theta}_q, \hat{\mu}, \hat{\sigma}^2\}\).
- The exact likelihood is a bit cumbersome and maximization requires specialized numerical methods.
- The MLEs can be obtained with the \(\mathtt{arima}\) function in \(\mathtt{R}\).
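For example, here is a minimal sketch that simulates an \(MA(2)\) (with hypothetical parameter values) and computes the MLEs with \(\mathtt{arima}\):

```r
set.seed(123)
y <- 5 + arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)

# order = c(p, d, q); p = d = 0 gives a pure MA(2), fit by maximum likelihood
fit <- arima(y, order = c(0, 0, 2), method = "ML")
coef(fit)   # MLEs of theta_1, theta_2, and mu (labeled "intercept")
fit$sigma2  # MLE of sigma^2
```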
Which \(MA\)?¶
How do we know if an \(MA\) model is appropriate and which \(MA\) model to fit?
- For an \(MA(q)\), we know that \(\gamma_j = 0\) for \(j > q\).
- We should only fit an \(MA\) model if the empirical autocorrelations drop to (approximately) zero for all lags \(j > q\), for some value \(q\).
- The \(\mathtt{acf}\) function in \(\mathtt{R}\) can be used to compute empirical autocorrelations of the data.
- The appropriate \(q\) can then be obtained from the empirical ACF.
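A sketch of this step, assuming a series \(\mathtt{y}\):

```r
# Sample ACF with approximate 95% confidence bands; for an MA(q),
# spikes at lags 1,...,q should stand out, and lags beyond q should
# fall inside the bands
acf(y, lag.max = 20)
```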
- After fitting an \(MA\) model, we can examine the residuals.
- The \(\mathtt{acf}\) function can be used to compute empirical autocorrelations of the residuals.
- If the residuals are autocorrelated, the model is not a good fit. Consider changing the order of the \(MA\) or using another model.
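Continuing the sketch, the residual diagnostics might look like:

```r
fit <- arima(y, order = c(0, 0, 2))

# Residual ACF: if the MA(2) fits well, the residuals should be
# approximately white noise, with no significant autocorrelations
acf(residuals(fit), lag.max = 20)
```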
The \(\mathtt{auto.arima}\) function in \(\mathtt{R}\) estimates a range of \(MA(q)\) models and selects the one with the best fit.
- \(\mathtt{auto.arima}\) uses the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to select the model.
- Minimizing AIC or BIC amounts to maximizing the model's likelihood (for Gaussian models, essentially minimizing the sum of squared residuals), subject to a penalty term that grows with the number of model parameters.
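A sketch of this approach, restricting the search to pure \(MA(q)\) models (the \(\mathtt{d = 0}\) and \(\mathtt{max.p = 0}\) restrictions are assumptions for this example; by default \(\mathtt{auto.arima}\) searches over general \(ARIMA\) specifications):

```r
library(forecast)

# Consider MA(q) for q = 0,...,5 and pick the order minimizing AIC
fit <- auto.arima(y, d = 0, max.p = 0, max.q = 5,
                  seasonal = FALSE, ic = "aic")
fit
```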