Autoregressive Processes¶
\(AR(1)\) Process¶
Given white noise \(\{\varepsilon_t\}\), consider the process
\[Y_t = c + \phi Y_{t-1} + \varepsilon_t,\]
where \(c\) and \(\phi\) are constants.
- This is a first-order autoregressive or \(AR(1)\) process.
- \(\phi\) can be thought of as a memory or feedback parameter and introduces serial correlation in \(Y_t\).
- When \(\phi = 0\), \(Y_t\) is white noise with drift - it has no memory or serial correlation.
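As a quick illustration (a minimal sketch, not part of the original notes; the parameter values are assumptions), the memory induced by \(\phi\) is easy to see in simulation:

```r
# Sketch: simulate an AR(1) with phi = 0.8 and contrast it with the phi = 0
# case, which is just white noise with drift. Parameter values are assumptions.
set.seed(42)
n   <- 250
c0  <- 1      # intercept c
phi <- 0.8

y_ar1 <- arima.sim(model = list(ar = phi), n = n) + c0 / (1 - phi)  # shifted to mean c/(1 - phi)
y_wn  <- ts(c0 + rnorm(n))                                          # phi = 0: no serial correlation

plot(cbind(ar1 = y_ar1, white_noise_drift = y_wn),
     main = "AR(1) with phi = 0.8 vs. white noise with drift")
```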
Recursive Substitution of \(AR(1)\)¶
Applying recursive substitution:
\[\begin{aligned} Y_t &= c + \phi Y_{t-1} + \varepsilon_t \\ &= c + \phi (c + \phi Y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &\;\;\vdots \\ &= \frac{c}{1-\phi} + \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}. \end{aligned}\]
- The infinite recursive substitution can only be performed if \(|\phi| < 1\), so that \(\phi^j Y_{t-j} \to 0\) and the geometric sums converge.
Expectation of \(AR(1)\)¶
Assume \(Y_t\) is weakly stationary: \(|\phi| < 1\). Taking expectations of both sides and using \(E[Y_t] = E[Y_{t-1}] = \mu\),
\[\mu = c + \phi \mu \quad \Longrightarrow \quad \mu = \frac{c}{1-\phi}.\]
A Useful Property¶
If \(Y_t\) is weakly stationary,
\[Y_{t-j} = \frac{c}{1-\phi} + \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-j-i}, \qquad j \geq 1.\]
- That is, for \(j \geq 1\), \(Y_{t-j}\) is a function of lagged values of \(\varepsilon_t\) and not \(\varepsilon_t\) itself.
- As a result, for \(j \geq 1\),
\[E[\varepsilon_t (Y_{t-j} - \mu)] = \sum_{i=0}^{\infty} \phi^i E[\varepsilon_t \varepsilon_{t-j-i}] = 0.\]
Variance of \(AR(1)\)¶
Given that \(\mu = c/(1-\phi)\) for weakly stationary \(Y_t\):
\[Y_t - \mu = \phi (Y_{t-1} - \mu) + \varepsilon_t.\]
Squaring both sides and taking expectations:
\[\begin{aligned} E[(Y_t - \mu)^2] &= \phi^2 E[(Y_{t-1} - \mu)^2] + 2\phi E[\varepsilon_t (Y_{t-1} - \mu)] + E[\varepsilon_t^2] \\ \gamma_0 &= \phi^2 \gamma_0 + \sigma^2 \\ \gamma_0 &= \frac{\sigma^2}{1-\phi^2}. \end{aligned}\]
Autocovariances of \(AR(1)\)¶
For \(j \geq 1\),
\[\gamma_j = E[(Y_t - \mu)(Y_{t-j} - \mu)] = \phi E[(Y_{t-1} - \mu)(Y_{t-j} - \mu)] + E[\varepsilon_t (Y_{t-j} - \mu)] = \phi \gamma_{j-1},\]
so that
\[\gamma_j = \phi^j \gamma_0 = \phi^j \frac{\sigma^2}{1-\phi^2}.\]
Autocorrelations of \(AR(1)\)¶
The autocorrelations of an \(AR(1)\) are
\[\rho_j = \frac{\gamma_j}{\gamma_0} = \phi^j, \qquad j = 0, 1, 2, \ldots\]
- Since we assumed \(|\phi| < 1\), the autocorrelations decay exponentially as \(j\) increases.
- Note that if \(\phi \in (-1,0)\), the autocorrelations decay in an oscillatory fashion.
Examples of \(AR(1)\) Processes¶
\(AR(1)\) Autocorrelations¶
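A minimal sketch of such examples (the values of \(\phi\) below are illustrative assumptions): simulate a few \(AR(1)\) processes and plot each path alongside its sample autocorrelation function.

```r
# Sketch: simulated AR(1) paths and their sample ACFs for several values of phi.
set.seed(1)
phis <- c(0.3, 0.9, -0.8)   # assumed illustrative values
n    <- 500

par(mfrow = c(length(phis), 2))
for (phi in phis) {
  y <- arima.sim(model = list(ar = phi), n = n)
  plot(y, main = paste("AR(1), phi =", phi), ylab = "Y_t")
  acf(y,  main = paste("Sample ACF, phi =", phi))
}
par(mfrow = c(1, 1))
```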
Random Walk¶
Suppose \(\phi = 1\):
\[Y_t = c + Y_{t-1} + \varepsilon_t = tc + Y_0 + \sum_{i=1}^{t} \varepsilon_i.\]
- Clearly \(E[Y_t] = tc + Y_0\), which is not independent of time.
- \(Var(Y_t) = t\sigma^2\), which increases linearly with time.
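A minimal sketch (the drift value is an assumption) of a simulated random walk, whose path wanders and whose dispersion grows with \(t\):

```r
# Sketch: random walk with drift, Y_t = c + Y_{t-1} + eps_t, starting at Y_0 = 0.
set.seed(7)
n  <- 500
c0 <- 0.1                      # assumed drift c
y  <- cumsum(c0 + rnorm(n))    # cumulative sum gives tc + sum of eps_i
plot.ts(y, main = "Random walk with drift (phi = 1)")
```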
Explosive \(AR(1)\)¶
When \(|\phi| > 1\), the autoregressive process is explosive.
- Recall that \(Y_t = \frac{c}{1-\phi} + \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}\).
- Now \(|\phi^i|\) increases with \(i\) rather than decaying.
- Past values of \(\varepsilon_{t-i}\) contribute greater amounts to \(Y_t\) as \(i\) increases.
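A minimal sketch (the value \(\phi = 1.05\) is an assumption) of an explosive path; \(\mathtt{arima.sim}\) rejects non-stationary coefficients, so the recursion is written out directly:

```r
# Sketch: an explosive AR(1) with phi = 1.05; the path diverges quickly.
set.seed(7)
n   <- 100
phi <- 1.05
y   <- numeric(n)              # Y_1 = 0
for (t in 2:n) y[t] <- phi * y[t - 1] + rnorm(1)
plot.ts(y, main = "Explosive AR(1), phi = 1.05")
```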
\(AR(p)\) Process¶
Given white noise \(\{\varepsilon_t\}\), consider the process
\[Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t,\]
where \(c\) and \(\{\phi_i\}_{i=1}^p\) are constants.
- This is a \(p\)th-order autoregressive or \(AR(p)\) process.
Expectation of \(AR(p)\)¶
Assume \(Y_t\) is weakly stationary. Taking expectations of both sides,
\[\mu = c + \phi_1 \mu + \cdots + \phi_p \mu \quad \Longrightarrow \quad \mu = \frac{c}{1 - \phi_1 - \cdots - \phi_p}.\]
Autocovariances of \(AR(p)\)¶
Given that \(\mu = c/(1-\phi_1 - \ldots - \phi_p)\) for weakly stationary \(Y_t\):
\[Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \cdots + \phi_p (Y_{t-p} - \mu) + \varepsilon_t.\]
Thus, multiplying both sides by \((Y_{t-j} - \mu)\) and taking expectations (using \(\gamma_{-j} = \gamma_j\)),
\[\gamma_j = \begin{cases} \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2} + \cdots + \phi_p \gamma_{j-p}, & j \geq 1 \\ \phi_1 \gamma_1 + \phi_2 \gamma_2 + \cdots + \phi_p \gamma_p + \sigma^2, & j = 0. \end{cases} \tag{1}\]
For \(j = 0, 1, \ldots, p\), System (1) is a system of \(p+1\) equations with \(p+1\) unknowns: \(\{\gamma_j\}_{j=0}^p\).
- \(\{\gamma_j\}_{j=0}^p\) can be solved for as functions of \(\{\phi_j\}_{j=1}^p\) and \(\sigma^2\).
- It can be shown that \(\{\gamma_j\}_{j=0}^{p-1}\) are the first \(p\) elements of the first column of \(\sigma^2 [I_{p^2} - \Phi \otimes \Phi]^{-1}\), where \(\otimes\) denotes the Kronecker product and \(\Phi\) is the companion matrix of the process (see the sketch after this list).
- \(\{\gamma_j\}_{j=p+1}^{\infty}\) can then be determined using prior values of \(\gamma_j\) and \(\{\phi_j\}_{j=1}^p\).
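A minimal sketch of that calculation for an \(AR(2)\) with assumed coefficients, using the companion matrix \(\Phi\) and the identity \(\mathrm{vec}(\Sigma) = \sigma^2 [I_{p^2} - \Phi \otimes \Phi]^{-1} e_1\):

```r
# Sketch: autocovariances of an AR(2) via the Kronecker-product formula.
# The coefficients and sigma^2 below are assumptions for illustration.
phi    <- c(0.5, 0.3)    # assumed phi_1, phi_2 (stationary)
sigma2 <- 1              # innovation variance
p      <- length(phi)

Phi   <- rbind(phi, cbind(diag(p - 1), 0))              # companion matrix
vecQ  <- c(sigma2, rep(0, p^2 - 1))                     # vec of sigma^2 * e1 e1'
vecS  <- solve(diag(p^2) - kronecker(Phi, Phi), vecQ)   # vec of the state covariance
gamma <- vecS[1:p]                                      # gamma_0, ..., gamma_{p-1}

# Cross-check against the built-in ARMA autocorrelation routine
gamma_check <- ARMAacf(ar = phi, lag.max = p - 1) * gamma[1]
```

Higher-order autocovariances then follow from the recursion in (1).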
Autocorrelations of \(AR(p)\)¶
Dividing the autocovariances by \(\gamma_0\),
\[\rho_j = \phi_1 \rho_{j-1} + \phi_2 \rho_{j-2} + \cdots + \phi_p \rho_{j-p}, \qquad j \geq 1.\]
Estimating \(AR\) Models¶
Ideally, estimation of an \(AR\) model is done via maximum likelihood.
- For an \(AR(p)\) model, one would first specify a joint likelihood for the parameters \(\{\phi_1, \ldots, \phi_p, c, \sigma^2\}\).
- Taking derivatives of the log likelihood with respect to each of the parameters would result in a system of equations that could be solved for the MLEs: \(\{\hat{\phi}_1, \ldots, \hat{\phi}_p, \hat{c}, \hat{\sigma}^2\}\).
- The exact likelihood is a bit cumbersome and maximization requires specialized numerical methods.
- It turns out that the least squares estimates obtained by fitting a regression of \(Y_t\) on \(Y_{t-1}, \ldots, Y_{t-p}\) are almost identical to the MLEs (they are called the conditional MLEs).
- The exact MLEs can be obtained with the \(\mathtt{arima}\) function in \(\mathtt{R}\).
- The conditional (least squares) MLEs can be obtained with the \(\mathtt{lm}\) function in \(\mathtt{R}\).
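A minimal sketch (on simulated data with assumed coefficients) of the two estimation routes just described:

```r
# Sketch: exact MLE via arima() vs. conditional (least-squares) MLE via lm(),
# on simulated AR(2) data with assumed coefficients.
set.seed(11)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

# Exact maximum likelihood
fit_mle <- arima(y, order = c(2, 0, 0))

# Least squares: regress Y_t on its first two lags
df     <- data.frame(y = y[3:500], lag1 = y[2:499], lag2 = y[1:498])
fit_ls <- lm(y ~ lag1 + lag2, data = df)

coef(fit_mle)   # note: arima() labels the estimated mean as "intercept"
coef(fit_ls)    # the lm() intercept estimates c, not the mean
```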
Which \(AR\)?¶
How do we know if an \(AR\) model is appropriate and which \(AR\) model to fit?
- After fitting an \(AR\) model, we can examine the residuals.
- The \(\mathtt{acf}\) function in \(\mathtt{R}\) can be used to compute empirical autocorrelations of the residuals.
- If the residuals are autocorrelated, the model is not a good fit. Consider increasing the order of the \(AR\) or using another model.
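For example (a sketch on simulated data; the coefficients are assumptions):

```r
# Sketch: check whether the residuals of a fitted AR model still show
# serial correlation.
set.seed(11)
y   <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
fit <- arima(y, order = c(2, 0, 0))

acf(residuals(fit), main = "Sample ACF of AR(2) residuals")
# Bars well inside the confidence band suggest the residuals are close to white noise.
```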
Suppose \(Y_t\) is an \(AR(2)\) process:
\[Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t.\]
- If we estimate an \(AR(1)\) model using the data, then for large sample sizes \(\hat{\mu} \approx \mu\) and \(\hat{\phi} \approx E[\hat{\phi}] = \phi^* \neq \phi_1\).
The resulting residuals would be
\[\hat{\varepsilon}_t = (Y_t - \hat{\mu}) - \hat{\phi}(Y_{t-1} - \hat{\mu}) \approx (\phi_1 - \phi^*)(Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \varepsilon_t.\]
- Even if \(\phi^* = \phi_1\), the residuals will exhibit autocorrelation, due to the presence of \(Y_{t-2}\).
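A sketch of this effect (the \(AR(2)\) coefficients are assumptions): fitting an \(AR(1)\) to \(AR(2)\) data leaves the omitted \(Y_{t-2}\) term in the residuals, and their sample ACF shows it.

```r
# Sketch: underfitting. Fit an AR(1) to data generated by an AR(2); the
# residual ACF shows leftover autocorrelation. Coefficients are assumptions.
set.seed(3)
y <- arima.sim(model = list(ar = c(0.4, 0.4)), n = 1000)

fit_ar1 <- arima(y, order = c(1, 0, 0))
acf(residuals(fit_ar1), main = "Residual ACF: AR(1) fit to AR(2) data")
```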
The \(\mathtt{auto.arima}\) function in \(\mathtt{R}\) estimates a range of \(AR(p)\) models and selects the one with the best fit.
- \(\mathtt{auto.arima}\) uses the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to select the model.
- Minimizing AIC and BIC amounts to minimizing the sum of squared residuals, with a penalty term that is related to the number of model parameters.
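A minimal sketch of automatic order selection (\(\mathtt{auto.arima}\) is in the \(\mathtt{forecast}\) package; the restriction to pure \(AR\) models and the simulated coefficients are assumptions for illustration):

```r
# Sketch: let auto.arima() choose the AR order on simulated AR(2) data,
# restricting the search to pure AR models (no differencing, no MA terms).
library(forecast)

set.seed(5)
y   <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
fit <- auto.arima(y, d = 0, max.q = 0, seasonal = FALSE, ic = "bic")
fit   # typically selects ARIMA(2,0,0), i.e. an AR(2)
```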