Stationarity

Introduction

Time series analysis is concerned with dynamics.

  • We may have complete knowledge of the unconditional distribution of a group of random variables but no understanding of their sequential dynamics.
  • Time series is focused on understanding the sequential relationship of a group of random variables.
  • Hence, the focus is on conditional distributions and autocovariances.

Time Series

A time series is a stochastic process indexed by time:

\[\begin{align*} Y_1, Y_2, Y_3, \ldots, Y_{T-1}, Y_T. \end{align*}\]
  • Stochastic is a synonym for random.
  • So a time series is a sequence of (potentially different) random variables ordered by time.
  • We will let lower-case letters denote a realization of a time series.
\[\begin{align*} y_1, y_2, y_3, \ldots, y_{T-1}, y_T. \end{align*}\]

Distributions

We will think of \({\bf Y}_T = \{Y_t\}_{t=1}^T\) as a random variable in its own right.

  • \({\bf y}_T = \{y_t\}_{t=1}^T\) is a single realization of \({\bf Y}_T = \{Y_t\}_{t=1}^T\).
  • The CDF is \(F_{{\bf Y}_T}({\bf y}_T)\) and the PDF is \(f_{{\bf Y}_T}({\bf y}_T)\).
  • For example, consider \(\smash{T = 100}\):
\[\begin{align*} F\left({\bf y}_{100}\right) & = P(Y_1 \leq y_1, \ldots, Y_{100} \leq y_{100}). \end{align*}\]
  • Notice that \({\bf Y}_T\) is just a collection of random variables and \(f_{{\bf Y}_T}({\bf y}_T)\) is the joint density.

Time Series Observations

As statisticians and econometricians, we want many observations of \({\bf Y}_T\) to learn about its distribution:

\[\begin{align*} {\bf y}_T^{(1)}, \,\,\,\,\,\, {\bf y}_T^{(2)},& \,\,\,\,\,\, {\bf y}_T^{(3)}, \,\,\,\,\,\, \ldots \end{align*}\]

Likewise, if we are only interested in the marginal distribution of \(\smash{Y_{17}}\)

\[\begin{align*} F_{Y_{17}}(a) = P(Y_{17} \leq a) \end{align*}\]

we want many observations: \(\smash{\left\{y_{17}^{(i)}\right\}_{i=1}^N}\).
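
As a hedged illustration of estimating this marginal distribution from an ensemble, the sketch below simulates many independent realizations of a simple process (Gaussian white noise with drift, introduced later in this section) and computes the empirical frequency of \(Y_{17} \leq a\); the values of \(\mu\), \(\sigma\), \(a\), \(T\), and \(N\) are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N = 100, 5000          # length of each series, number of realizations
mu, sigma = 1.0, 2.0      # assumed drift and standard deviation
a = 1.5                   # threshold at which to evaluate the CDF of Y_17

# N independent realizations of Y_T, each of length T (rows are realizations)
Y = mu + sigma * rng.standard_normal((N, T))

# Empirical estimate of F_{Y_17}(a) = P(Y_17 <= a) from the ensemble
y17 = Y[:, 16]            # t = 17 (0-based index 16)
F_hat = np.mean(y17 <= a)
print(F_hat)              # close to the normal CDF evaluated at (a - mu) / sigma
```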

Time Series Observations

Unfortunately, we usually only have one observation of \({\bf Y}_T\).

  • Think of the daily closing price of Harley-Davidson stock since January 2nd.
  • Think of your cardiogram for the past 100 seconds.

In neither case can you repeat history to observe a new sequence of prices or electric heart signals.

  • In time series econometrics we typically base inference on a single observation.
  • Additional assumptions about the process will allow us to exploit information in the full sequence \({\bf y}_T\) to make inferences about the joint distribution \(F_{{\bf Y}_T}({\bf y}_T)\).

Moments

Since the stochastic process is composed of individual random variables, we can consider the moments of each:

\[\begin{split}\begin{align*} E[Y_t] & = \int_{-\infty}^{\infty} y_t f_{Y_t}(y_t) dy_t = \mu_t \\ Var(Y_t) & = \int_{-\infty}^{\infty} (y_t-\mu_t)^2 f_{Y_t}(y_t) dy_t = \gamma_{0t} \end{align*}\end{split}\]

Moments

\[\begin{split}\begin{align*} Cov(Y_t, Y_{t-j}) & = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y_t-\mu_t)(y_{t-j}-\mu_{t-j}) \\ & \hspace{2in} \times \, f_{Y_t,Y_{t-j}}(y_t,y_{t-j}) dy_tdy_{t-j} = \gamma_{jt}, \end{align*}\end{split}\]

where \(\smash{f_{Y_t}}\) and \(\smash{f_{Y_t,Y_{t-j}}}\) are the marginal densities of \(f_{{\bf Y}_T}\), obtained by integrating \(f_{{\bf Y}_T}\) over the remaining elements of \({\bf Y}_T\).

Autocovariance and Autocorrelation

  • \(\smash{\gamma_{jt}}\) is known as the \(\smash{j}\)th autocovariance of \(\smash{Y_t}\) since it is the covariance of \(\smash{Y_t}\) with its own lagged value.
  • The \(\smash{j}\)th autocorrelation of \(\smash{Y_t}\) is defined as
\[\begin{split}\begin{align*} \rho_{jt} & = Corr(Y_t, Y_{t-j}) \\ & = \frac{Cov(Y_t, Y_{t-j})}{\sqrt{Var(Y_t)} \sqrt{Var(Y_{t-j})}} \\ & = \frac{\gamma_{jt}}{\sqrt{\gamma_{0t}} \sqrt{\gamma_{0t-j}}}. \end{align*}\end{split}\]

Sample Moments

If we had \(N\) observations \({\bf y}_T^{(1)},\ldots,{\bf y}_T^{(N)}\), we could estimate moments of each (univariate) \(\smash{Y_t}\) in the usual way:

\[\begin{split}\begin{align*} \hat{\mu}_t & = \frac{1}{N} \sum_{i=1}^N y_t^{(i)}. \\ \hat{\gamma}_{0t} & = \frac{1}{N} \sum_{i=1}^N (y_t^{(i)} - \hat{\mu}_t)^2. \\ \hat{\gamma}_{jt} & = \frac{1}{N} \sum_{i=1}^N (y_t^{(i)} - \hat{\mu}_t) (y_{t-j}^{(i)} - \hat{\mu}_{t-j}). \\ \end{align*}\end{split}\]
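
A minimal sketch of these ensemble estimators, assuming we could simulate \(N\) realizations of an illustrative process with a time-varying mean (the process and all parameter values are assumptions for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, j = 100, 5000, 1    # series length, number of realizations, lag

# Illustrative process with a time-varying mean so the moments depend on t
mu_t = np.linspace(0.0, 3.0, T)
Y = mu_t + rng.standard_normal((N, T))    # row i is the realization y_T^(i)

# Ensemble (across-realization) estimates for each t
mu_hat = Y.mean(axis=0)                                  # mu_hat_t
gamma0_hat = ((Y - mu_hat) ** 2).mean(axis=0)            # gamma_hat_{0t}
gamma_j_hat = ((Y[:, j:] - mu_hat[j:]) *
               (Y[:, :-j] - mu_hat[:-j])).mean(axis=0)   # gamma_hat_{jt}, t > j

print(mu_hat[:5])
print(gamma0_hat[:5])
print(gamma_j_hat[:5])
```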

Example

Suppose each element of \({\bf Y}_T\) is described by

\[\begin{align*} Y_t & = \mu_t + \varepsilon_t, \,\,\,\, \varepsilon_t \stackrel{indep.}{\sim} \mathcal{N}(0,\sigma^2_t), \,\,\, \forall t. \end{align*}\]

Example

In this case,

\[\begin{split}\begin{align*} E[Y_t] & = \mu_t + E[\varepsilon_t] = \mu_t, \,\,\, \forall t, \\ \gamma_{0t} & = Var(Y_t) = Var(\varepsilon_t) = \sigma^2_t, \,\,\, \forall t, \\ \gamma_{jt} & = Cov(Y_t, Y_{t-j}) = Cov(\varepsilon_t, \varepsilon_{t-j}) = 0, \,\,\, \forall t, \, j \neq 0. \end{align*}\end{split}\]
  • If \(\smash{\sigma^2_t = \sigma^2}\) \(\smash{\forall t}\), \({\bf \varepsilon}_T\) is known as a Gaussian white noise process.
  • In this case, \({\bf Y}_T\) is a Gaussian white noise process with drift.
  • \({\bf \mu}_T\) is the drift vector.
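
A minimal simulation of this example, assuming an arbitrary deterministic drift vector and unit innovation variance (both are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

T, sigma = 200, 1.0
mu_t = 0.05 * np.arange(1, T + 1)      # an assumed drift vector mu_T

eps = sigma * rng.standard_normal(T)   # Gaussian white noise (constant sigma)
y = mu_t + eps                         # one realization of Gaussian white noise with drift

print(y[:5])
```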

White Noise

Generally speaking, \({\bf \varepsilon}_T\) is a white noise process if

\[\begin{split}\begin{gather*} E[\varepsilon_t] = 0, \,\,\, \forall t \\ E[\varepsilon^2_t] = \sigma^2, \,\,\, \forall t \\ E[\varepsilon_t \varepsilon_{\tau}] = 0, \,\,\, \text{ for } t \neq \tau. \end{gather*}\end{split}\]

White Noise

Notice there is no distributional assumption for \(\varepsilon_t\).

  • If \(\smash{\varepsilon_t}\) and \(\smash{\varepsilon_{\tau}}\) are independent for \(\smash{t \neq \tau}\), \({\bf \varepsilon}_T\) is independent white noise.
  • Notice that independence \(\smash{\Rightarrow E[\varepsilon_t \varepsilon_{\tau}] = 0}\), but \(E[\varepsilon_t \varepsilon_{\tau}] = 0 \not \Rightarrow\) independence.
  • If \(\smash{\varepsilon_t \sim \mathcal{N}(0, \sigma^2)}\) \(\forall t\), as in the example above, \({\bf \varepsilon}_T\) is Gaussian white noise.
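
To illustrate that zero correlation does not imply independence, the sketch below uses a standard constructed example (not from these notes): \(\varepsilon_t = z_t z_{t-1}\) with \(z_t\) i.i.d. standard normal, which satisfies the white noise conditions even though \(\varepsilon_t\) and \(\varepsilon_{t-1}\) both depend on \(z_{t-1}\).

```python
import numpy as np

rng = np.random.default_rng(3)

T = 100_000
z = rng.standard_normal(T + 1)

# eps_t = z_t * z_{t-1}: mean zero, variance one, uncorrelated across t,
# so it satisfies the white noise conditions -- but eps_t and eps_{t-1}
# are NOT independent, since both depend on z_{t-1}.
eps = z[1:] * z[:-1]

print(eps.mean())                                      # approx 0
print(eps.var())                                       # approx 1
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])            # approx 0: uncorrelated
print(np.corrcoef(eps[1:] ** 2, eps[:-1] ** 2)[0, 1])  # clearly positive: dependent
```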

Weak Stationarity

Suppose the first and second moments of a stochastic process \({\bf Y}_{T}\) do not depend on \(\smash{t}\):

\[\begin{split}\begin{align*} E[Y_t] & = \mu \,\,\,\, \forall t \\ Cov(Y_t, Y_{t-j}) & = \gamma_j \,\,\,\, \forall t \text{ and any } j. \end{align*}\end{split}\]
  • In this case \({\bf Y}_{T}\) is weakly stationary or covariance stationary.
  • In the previous example, if \(\smash{\mu_t = \mu}\) and \(\smash{\sigma^2_t = \sigma^2}\) \(\smash{\forall t}\) (so that \(\smash{Y_t = \mu + \varepsilon_t}\) with Gaussian white noise errors), \({\bf Y}_{T}\) is weakly stationary.
  • However, if \(\smash{\mu_t}\) varies with \(\smash{t}\), \({\bf Y}_{T}\) is not weakly stationary.

Autocorrelation under Weak Stationarity

If \({\bf Y}_{T}\) is weakly stationary

\[\begin{split}\begin{align*} \rho_{jt} & = \frac{\gamma_{jt}}{\sqrt{\gamma_{0t}} \sqrt{\gamma_{0t-j}}} \\ & = \frac{\gamma_j}{\sqrt{\gamma_0} \sqrt{\gamma_0}} \\ & = \frac{\gamma_j}{\gamma_0} \\ & = \rho_j. \end{align*}\end{split}\]
  • Note that \(\smash{\rho_0 = 1}\).
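
As an illustration, the sketch below estimates \(\rho_j\) from a single long realization of a stationary AR(1) process (an assumed example process, not defined in these notes, with theoretical autocorrelations \(\rho_j = \phi^j\)); using time averages from one realization anticipates the ergodicity discussion below.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed example: stationary AR(1), Y_t = phi * Y_{t-1} + eps_t with |phi| < 1,
# for which rho_j = phi**j.
T, phi = 100_000, 0.8
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]

y_bar = y.mean()
gamma0 = np.mean((y - y_bar) ** 2)
for j in range(4):
    gamma_j = np.mean((y[j:] - y_bar) * (y[:T - j] - y_bar))
    print(j, gamma_j / gamma0, phi ** j)   # sample rho_j vs. theoretical phi**j
```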

Weak Stationarity

Under weak stationarity, autocovariances \(\smash{\gamma_j}\) only depend on the distance between random variables within a stochastic process:

\[\begin{align*} Cov(Y_{\tau}, Y_{\tau-j}) = Cov(Y_t, Y_{t-j}) = \gamma_j. \end{align*}\]

This implies

\[\begin{align*} \gamma_{-j} = Cov(Y_{t+j}, Y_t) = Cov(Y_t, Y_{t-j}) = \gamma_j. \end{align*}\]

Weak Stationarity

More generally,

\[\begin{split}\begin{align*} \Sigma_{{\bf Y}_T} & = \left[\begin{array}{ccccc} \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2} & \gamma_{T-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{T-3} & \gamma_{T-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0 & \gamma_1 \\ \gamma_{T-1} & \gamma_{T-2} & \cdots & \gamma_1 & \gamma_0 \end{array}\right]. \end{align*}\end{split}\]
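
Under weak stationarity \(\Sigma_{{\bf Y}_T}\) has this Toeplitz structure, so it can be built directly from the vector \((\gamma_0, \ldots, \gamma_{T-1})\); the sketch below does this for an assumed geometric decay \(\gamma_j = 0.8^j\).

```python
import numpy as np

# Build the T x T autocovariance matrix Sigma_{Y_T} from gamma_0, ..., gamma_{T-1}.
# Under weak stationarity the (s, t) entry depends only on |s - t|.
T = 5
gamma = 0.8 ** np.arange(T)   # assumed example values gamma_j = 0.8**j
Sigma = gamma[np.abs(np.subtract.outer(np.arange(T), np.arange(T)))]
print(Sigma)
```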

Strict Stationarity

\({\bf Y}_{T}\) is strictly stationary if for any set of time indices \(\smash{\{j_1, j_2, \ldots, j_n\} \subset \{1, \ldots, T\}}\)

\[\begin{align*} f_{Y_{j_1},\ldots,Y_{j_n}}(a_1, \ldots, a_n) = f_{Y_{j_1 + \tau},\ldots,Y_{j_n + \tau}}(a_1, \ldots, a_n), \,\,\, \forall \tau. \end{align*}\]
  • Strict stationarity means that the joint distribution of any subset of random variables in \({\bf Y}_{T}\) is invariant to shifts in time, \(\smash{\tau}\).
  • Strict stationarity \(\smash{\Rightarrow}\) weak stationarity if the first and second moments of a stochastic process exist.
  • Weak stationarity \(\smash{\not \Rightarrow}\) strict stationarity: invariance of first and second moments to time shifts (weak stationarity) does not mean that all higher moments are invariant to time shifts (strict stationarity).
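
A constructed example (not from the notes) of a process that is weakly but not strictly stationary: independent draws with mean zero and variance one at every \(t\), but with a marginal distribution that alternates between a normal and a centered exponential, so higher moments such as skewness change with \(t\).

```python
import numpy as np

rng = np.random.default_rng(5)

N = 200_000   # draws used to check the moments of each marginal distribution
normal = rng.standard_normal(N)        # marginal for even t: skewness 0
expo = rng.exponential(1.0, N) - 1.0   # marginal for odd t: mean 0, variance 1, skewness 2

# First two moments match across t (weak stationarity holds), but the third
# moment differs, so the marginal law is not invariant to time shifts.
for name, x in [("even t (normal)", normal), ("odd t (exponential)", expo)]:
    skew = np.mean((x - x.mean()) ** 3) / x.std() ** 3
    print(name, x.mean(), x.var(), skew)
```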

Strict Stationarity

If \({\bf Y}_{T}\) is Gaussian then weak stationarity \(\smash{\Rightarrow}\) strict stationarity.

  • If \({\bf Y}_{T}\) is Gaussian, the joint distribution of any subset \(\smash{(Y_{j_1}, \ldots, Y_{j_n})}\) is also Gaussian.
  • Gaussian distributions are fully characterized by their first and second moments.

Ergodicity

Given \(\smash{N}\) identically distributed weakly stationary stochastic processes \(\left\{{\bf Y}_{T}^{(i)}\right\}_{i=1}^N\), the ensemble average converges to the mean:

\[\begin{align*} \frac{1}{N} \sum_{i=1}^N Y_t^{(i)} \stackrel{p}{\to} \mu, \,\,\,\, \forall t. \end{align*}\]

For a single stochastic process, we desire conditions under which the time average converges to the same limit:

\[\begin{align*} \frac{1}{T} & \sum_{t=1}^T Y_t \stackrel{p}{\to} \mu. \end{align*}\]

Ergodicity

If \({\bf Y}_{T}\) is weakly stationary and

\[\begin{align*} \sum_{j=0}^{\infty} & |\gamma_j| < \infty, \end{align*}\]

then \({\bf Y}_{T}\) is ergodic for the mean and the time average converges to \(\smash{\mu}\).

  • The equation above requires that the autocovariances fall to zero sufficiently quickly.
  • That is, a long realization of \(\smash{\{y_t\}}\) will contain many segments that are nearly uncorrelated with one another and can be used to approximate an ensemble average.
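
The contrast below is a hedged sketch of this idea: Gaussian white noise with drift is ergodic for the mean, while the constructed process \(Y_t = \mu + Z + \varepsilon_t\) (with \(Z\) drawn once per realization, a textbook-style counterexample not from these notes) is weakly stationary but not ergodic, since \(\gamma_j = Var(Z)\) for all \(j\) and the time average converges to \(\mu + Z\).

```python
import numpy as np

rng = np.random.default_rng(6)

T, mu = 100_000, 1.0

# Ergodic case: Gaussian white noise with drift mu; gamma_j = 0 for j >= 1,
# so the autocovariances are absolutely summable and the time average -> mu.
y_ergodic = mu + rng.standard_normal(T)

# Non-ergodic case: Y_t = mu + Z + eps_t, with Z drawn once per realization.
# The process is weakly stationary with gamma_j = Var(Z) for all j (not summable),
# and the time average converges to mu + Z rather than mu.
Z = rng.standard_normal()
y_nonergodic = mu + Z + rng.standard_normal(T)

print(y_ergodic.mean())      # close to mu
print(y_nonergodic.mean())   # close to mu + Z, not mu
print(mu + Z)
```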

Ergodicity

A weakly stationary process is ergodic for the second moments if

\[\begin{align*} \frac{1}{T-j} \sum_{t=j+1}^T & (Y_t - \mu)(Y_{t-j} - \mu) \stackrel{p}{\to} \gamma_j. \end{align*}\]
  • Separate conditions exist under which the equation above holds.
  • If \({\bf Y}_{T}\) is Gaussian and stationary, then \(\smash{\sum_{j=0}^{\infty} |\gamma_j| < \infty}\) ensures that \(\smash{{\bf Y}_{T}}\) is ergodic for all moments.
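
As a final hedged sketch, the time-average estimator above can be checked against the known autocovariances of an assumed stationary Gaussian AR(1) process, for which \(\gamma_j = \phi^j / (1 - \phi^2)\) with unit innovation variance (the AR(1) model and its parameters are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(7)

# Time-average estimates of gamma_j from one realization of a stationary
# Gaussian AR(1) with mean mu = 0 and unit innovation variance.
T, phi, mu = 200_000, 0.5, 0.0
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]

for j in range(3):
    gamma_hat = np.mean((y[j:] - mu) * (y[:T - j] - mu))   # (1/(T-j)) * sum over t
    print(j, gamma_hat, phi ** j / (1 - phi ** 2))         # estimate vs. theoretical
```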