Generalized Method of Moments

Setup

Let \(\smash{\boldsymbol{Y}_t}\) be an \(\smash{(n \times 1)}\) vector of random variables and \(\smash{\boldsymbol{\theta}}\) a \(\smash{(k \times 1)}\) vector of parameters governing the process \(\smash{\{\boldsymbol{Y}_{t}\}}\).

  • Denote the true parameter vector as \(\smash{\boldsymbol{\theta}_{0}}\).

Moment Conditions

Suppose we can specify an \(\smash{(r \times 1)}\) vector-valued function \(\smash{\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{Y}_{t}): (\mathbb{R}^{k} \times \mathbb{R}^{n}) \rightarrow \mathbb{R}^{r}}\) such that:

\[\smash{E[\boldsymbol{h}(\boldsymbol{\theta}_{0},\boldsymbol{Y}_{t})] = 0, \,\,\, \text{where} \,\,\, r \ge k}.\]

Define \(\smash{\boldsymbol{\mathcal{Y}}_t = (\boldsymbol{y}_1, \ldots, \boldsymbol{y}_t)}\) and

\[\smash{\boldsymbol{g}_{T}(\boldsymbol{\theta}| \boldsymbol{\mathcal{Y}}_T) = \frac{1}{T} \sum_{t=1}^{T} \boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})}.\]

Note that \(\smash{\boldsymbol{g}_{T}(\boldsymbol{\theta}| \boldsymbol{\mathcal{Y}}_T): \mathbb{R}^{k} \rightarrow \mathbb{R}^{r}}\).

GMM Estimator

We want to choose \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) such that the sample moments \(\smash{\boldsymbol{g}_{T}(\hat{\boldsymbol{\theta}}_{gmm}| \boldsymbol{\mathcal{Y}}_T)}\) are close to zero.

  • If \(\smash{r = k}\), we can choose \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) such that \(\smash{\boldsymbol{g}_{T}(\hat{\boldsymbol{\theta}}_{gmm}| \boldsymbol{\mathcal{Y}}_T) = 0}\) because we have \(\smash{k}\) equations and \(\smash{k}\) unknowns.
  • If \(\smash{r > k}\) we have more equations than unknowns; in general there is no \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) such that \(\smash{\boldsymbol{g}_{T}(\hat{\boldsymbol{\theta}}_{gmm}| \boldsymbol{\mathcal{Y}}_T) = 0}\).

GMM Estimator

If \(\smash{r > k}\), we minimize a quadratic form:

\[\smash{\underset{1 \times 1}{\underbrace{Q_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)}} = \underset{(1 \times r)}{\underbrace{\boldsymbol{g}_{T}(\boldsymbol{\theta}| \boldsymbol{\mathcal{Y}}_T)'}}\underset{(r \times r)}{\underbrace{W_{T}}}\underset{(r \times 1)}{\underbrace{\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)}}}.\]
  • The weighting matrix \(\smash{W_{T}}\) places more weight on some moment conditions and less on others.
  • We might have to use numerical optimization to minimize \(\smash{Q_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)}\).
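As a concrete sketch of the objective (the function and argument names here are my own, not from the text), \(\smash{Q_{T}}\) can be computed by averaging the moment function over the sample and evaluating the quadratic form:

```python
import numpy as np

def gmm_objective(theta, y, h, W):
    """Q_T(theta) = g_T(theta)' W g_T(theta), where
    g_T(theta) = (1/T) * sum_t h(theta, y_t)."""
    g = np.mean([h(theta, yt) for yt in y], axis=0)  # sample moments, shape (r,)
    return float(g @ W @ g)                          # scalar quadratic form
```

When \(\smash{r > k}\), this scalar would then be minimized over \(\smash{\boldsymbol{\theta}}\) with a numerical optimizer.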

Example: t-distribution

The method of moments estimator of the degrees-of-freedom parameter of the t-distribution is a special case of the GMM estimator.

  • \(\smash{\boldsymbol{Y}_{t} = Y_t}\).
  • \(\smash{\boldsymbol{\theta} = \nu}\)
  • \(\smash{W_{T} = 1}\)
  • \(\smash{\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{Y}_{t}) = Y_{t}^{2} - \frac{\nu}{\nu - 2}}\).

Note that, for \(\smash{\nu > 2}\),

\[\smash{E[Y_{t}^{2}] = \frac{\nu}{\nu-2}}.\]

Example: t-distribution

\[\begin{split}\begin{align} E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{Y}_{t})] & = E\left[Y_{t}^{2} - \frac{\nu}{\nu-2}\right] = 0 \\ \boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T) & = \frac{1}{T} \sum_{t=1}^{T} \left(y_{t}^{2} - \frac{\nu}{\nu - 2}\right). \end{align}\end{split}\]

In this case, \(\smash{r = k = 1}\), and

\[\smash{Q_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T) = \left[ \frac{1}{T} \sum_{t=1}^{T} \left(y_{t}^{2} - \frac{\nu}{\nu - 2}\right)\right]^{2}}.\]

Since \(\smash{r = k = 1}\), \(\smash{\hat{\nu}_{gmm}}\) can be chosen such that \(\smash{Q_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T) = 0}\).
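Concretely, setting \(\smash{\boldsymbol{g}_{T} = 0}\) and solving for \(\smash{\nu}\) gives a closed form (a sketch; the function name is my own, and it assumes the sample second moment exceeds 1, as it must when \(\smash{\nu > 2}\)):

```python
import numpy as np

def nu_hat_mm(y):
    """Solve (1/T) * sum_t y_t^2 = nu/(nu - 2) for nu."""
    m2 = np.mean(np.asarray(y) ** 2)  # sample second moment
    return 2.0 * m2 / (m2 - 1.0)      # requires m2 > 1, i.e. nu > 2
```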

Example: t-distribution with \(\smash{r = 2}\)

Suppose we add a moment condition for the t-distribution.

  • If \(\smash{\nu > 4}\), then
\[\smash{\mu_{4} = E[Y_{t}^{4}] = \frac{3\nu^{2}}{(\nu-2)(\nu-4)}}.\]
  • In this case, \(\smash{ r = 2 > 1 = k}\).
  • We now have more moment conditions than parameters.

Example: t-distribution with \(\smash{r = 2}\)

We map this problem into GMM form in the following way:

  • \(\smash{\boldsymbol{Y}_{t} = Y_t}\)
  • \(\smash{\boldsymbol{\theta} = \nu}\)
\[\begin{split}\begin{align} W_{T} & = \left[\begin{array}{cc} 1 & 0 \\ 0 & 1 \\ \end{array} \right] \\ \boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{Y}_{t}) & = \left[\begin{array}{c} Y_{t}^{2} - \frac{\nu}{\nu - 2} \\ Y_{t}^{4} - \frac{3\nu^{2}}{(\nu-2)(\nu-4)} \\ \end{array} \right] \\ \boldsymbol{g}_{T}(\boldsymbol{\theta} | \boldsymbol{\mathcal{Y}}_T) & = \frac{1}{T} \sum_{t=1}^{T} \boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t}). \end{align}\end{split}\]

Example: t-distribution with \(\smash{r = 2}\)

The weighting matrix \(\smash{W_{T} = I_2}\) places equal weight on the two moment conditions.

  • We could alter this matrix to emphasize one condition more than another.
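As an illustrative sketch of how this two-moment objective could be minimized (the grid search and function names are my own, not from the text), with \(\smash{W_{T} = I_2}\) the quadratic form reduces to a sum of squared sample moments, and a crude search over \(\smash{\nu > 4}\) locates the minimizer:

```python
import numpy as np

def Q_T(nu, y):
    """Q_T(nu) = g_T(nu)' I_2 g_T(nu) for the two t-distribution moments."""
    g = np.array([np.mean(y**2) - nu / (nu - 2.0),
                  np.mean(y**4) - 3.0 * nu**2 / ((nu - 2.0) * (nu - 4.0))])
    return float(g @ g)  # with W_T = I_2 the quadratic form is just g'g

def nu_hat_grid(y, grid=None):
    """Crude grid-search minimizer of Q_T over nu > 4 (a real application
    would use a proper numerical optimizer)."""
    if grid is None:
        grid = np.linspace(4.1, 30.0, 2000)
    return grid[np.argmin([Q_T(nu, y) for nu in grid])]
```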

GMM Consistency

If \(\smash{\boldsymbol{Y}_{t}}\) is strictly stationary and \(\smash{\boldsymbol{h}}\) is continuous, a law of large numbers holds:

\[\smash{\boldsymbol{g}_{T}(\boldsymbol{\theta}) \overset{p}{\rightarrow} E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{Y}_{t})]}.\]

Under certain regularity conditions, it can be shown that

\[\smash{\hat{\boldsymbol{\theta}}_{gmm} \overset{p}{\rightarrow} \boldsymbol{\theta}_{0}}.\]
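As an informal simulation check of consistency (a sketch under my own setup, not from the text): for data drawn from a t-distribution with \(\smash{\nu_0}\) degrees of freedom, the \(\smash{r = k = 1}\) estimator that matches the second moment should settle near \(\smash{\nu_0}\) once \(\smash{T}\) is large.

```python
import numpy as np

def nu_hat_mm(y):
    """r = k = 1 GMM estimate: solve (1/T) * sum_t y_t^2 = nu/(nu - 2)."""
    m2 = np.mean(np.asarray(y) ** 2)
    return 2.0 * m2 / (m2 - 1.0)

rng = np.random.default_rng(0)   # fixed seed for reproducibility
nu0 = 8.0                        # true degrees of freedom
y = rng.standard_t(df=nu0, size=100_000)
# For large T, nu_hat_mm(y) should lie close to nu0.
```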