GMM Optimal Weighting Matrix¶

Moment Conditions’ Covariance¶

Suppose \(\smash{\{\boldsymbol{h}(\boldsymbol{\theta}_{0}, \boldsymbol{y}_{t})\}_{t=1}^{T}}\) is strictly stationary and define the \(\smash{\nu}\)th autocovariance matrix

\[\smash{\Gamma_{\nu} = E[\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t}) \boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t-\nu})^{'}]}\]

and

\[\begin{align} S = \sum_{\nu=-\infty}^{\infty} \Gamma_{\nu} = \Gamma_{0} + \sum_{\nu=1}^{\infty} (\Gamma_{\nu} + \Gamma_{\nu}^{'}). \end{align}\]
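
As a worked illustration (an assumed example, not part of the original derivation), suppose the moment is scalar and follows an MA(1) process \(\smash{h_{t} = \varepsilon_{t} + \psi \varepsilon_{t-1}}\), where \(\smash{\varepsilon_{t}}\) is white noise with variance \(\smash{\sigma^{2}}\). The symbols \(\smash{\psi}\) and \(\smash{\sigma^{2}}\) are introduced only for this example. Only the first autocovariance is nonzero, so the infinite sum collapses:

\[\begin{align} \Gamma_{0} = (1+\psi^{2})\sigma^{2}, \qquad \Gamma_{1} = \psi\sigma^{2}, \qquad \Gamma_{\nu} = 0 \text{ for } \nu \geq 2, \qquad S = \Gamma_{0} + 2\Gamma_{1} = (1+\psi)^{2}\sigma^{2}. \end{align}\]

With positive serial correlation (\(\smash{\psi > 0}\)), \(\smash{S}\) exceeds \(\smash{\Gamma_{0}}\), so ignoring the autocovariances would understate the variability of the sample moments.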

Convergence in Distribution¶

Under suitable regularity conditions, a central limit theorem gives

\[\smash{\sqrt{T}(\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}_T}) - E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})]) \overset{d}{\rightarrow} N(0,S)}\]

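where \(\smash{S}\) is the limiting variance of the scaled sample moments. At \(\smash{\boldsymbol{\theta} = \boldsymbol{\theta}_{0}}\), where the moments have mean zero,

\[\smash{ \lim_{T \rightarrow \infty} T \cdot E\left[\boldsymbol{g}_{T}(\boldsymbol{\theta}_{0}|\boldsymbol{\mathcal{Y}_T}) \boldsymbol{g}_{T}(\boldsymbol{\theta}_{0}|\boldsymbol{\mathcal{Y}_T})^{'}\right] = S}.\]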

Optimal Weighting Matrix¶

Intuitively, for large \(\smash{T}\) the sample moments are approximately normal:

\[\smash{\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T) \overset{approx}{\sim} N\left(E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})], \frac{S}{T}\right)}\]
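
The approximation can be checked by simulation. The sketch below is an illustration under assumed settings, not part of the original material: it takes a scalar moment that follows a mean-zero AR(1) with coefficient `phi` and unit innovation variance, for which the long-run variance is \(\smash{S = 1/(1-\phi)^{2}}\), and compares \(\smash{T \cdot Var(g_{T})}\) across many simulated samples with that value. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_scaled_mean_variance(phi=0.5, T=400, reps=2000, seed=0):
    """Monte Carlo estimate of T * Var(g_T) for an AR(1) moment series."""
    rng = np.random.default_rng(seed)
    # Start every path from the stationary distribution N(0, 1 / (1 - phi^2)).
    h = rng.normal(scale=np.sqrt(1.0 / (1.0 - phi**2)), size=reps)
    total = np.zeros(reps)
    for _ in range(T):
        h = phi * h + rng.normal(size=reps)  # h_t = phi * h_{t-1} + eps_t
        total += h
    g = total / T          # one sample mean g_T per replication
    return T * g.var()     # variance across replications, scaled by T

S_true = 1.0 / (1.0 - 0.5) ** 2          # long-run variance of the AR(1)
S_sim = simulate_scaled_mean_variance()  # should be close to S_true
print(S_true, S_sim)
```

The simulated value differs from \(\smash{S}\) by Monte Carlo noise and a small finite-\(\smash{T}\) bias, both of which shrink as `reps` and `T` grow.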

The optimal GMM weighting matrix is \(\smash{S^{-1}}\), so the objective function becomes

\[\smash{Q_{T}(\boldsymbol{\theta}) = \boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)^{'}S^{-1} \boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)}.\]
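
A minimal sketch of how this objective might be evaluated numerically. The names `gmm_objective` and `h_mean_var` are hypothetical, and the moment function (mean \(\smash{\theta}\), variance one) is an assumed toy example chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def gmm_objective(theta, h_fn, data, W):
    """Q_T(theta) = g_T' W g_T, where g_T is the sample mean of the moments."""
    g = np.mean(h_fn(theta, data), axis=0)  # g_T(theta | Y_T), shape (r,)
    return float(g @ W @ g)

def h_mean_var(theta, y):
    """Toy moments (assumed): E[y - theta] = 0 and E[y^2 - theta^2 - 1] = 0."""
    return np.column_stack([y - theta, y**2 - theta**2 - 1.0])

y = np.array([1.0, 3.0])
W = np.eye(2)  # pass an estimate of S^{-1} here for the optimal weighting
q_at_2 = gmm_objective(2.0, h_mean_var, y, W)
q_at_1 = gmm_objective(1.0, h_mean_var, y, W)
```

Here both sample moments are exactly zero at \(\smash{\theta = 2}\), so the objective reaches its minimum there. Any other \(\smash{\theta}\) gives a strictly positive value.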

Optimal Matrix Estimation¶

If \(\smash{\{\boldsymbol{h}(\boldsymbol{\theta_{0}}, \boldsymbol{y}_{t})\}_{t=-\infty}^{\infty}}\) is serially uncorrelated, \(\smash{S}\) is consistently estimated by

\[\smash{S_{T}^{*} = \frac{1}{T} \sum_{t=1}^{T} \boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t}) \boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t})^{'}}.\]

If it is serially correlated, the Newey-West estimator with lag truncation \(\smash{q}\) can be used:

\[\smash{S_{T}^{*} = \Gamma_{0,T}^{*} + \sum_{\nu =1}^{q}\left(1 - \frac{\nu}{q+1}\right)\left(\Gamma_{\nu,T}^{*} + \Gamma_{\nu,T}^{* \hspace{3pt}'}\right)},\]

where

\[\smash{\Gamma_{\nu,T}^{*} = \frac{1}{T} \sum_{t = \nu+1}^{T} \boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t}) \boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t-\nu})^{'}}.\]
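
The two estimators above can be sketched in a few lines of NumPy. This is an illustrative implementation under the assumption that the moments have already been evaluated and stacked into a `(T, r)` array. The function name `newey_west` is hypothetical, and setting `q = 0` recovers the serially uncorrelated formula.

```python
import numpy as np

def newey_west(h, q):
    """Estimate S from a (T, r) array whose rows are h(theta, y_t).

    Uses the weights 1 - nu / (q + 1) on the sample autocovariances
    Gamma_nu for nu = 1, ..., q; q = 0 returns Gamma_0 alone.
    """
    h = np.asarray(h, dtype=float)
    T = h.shape[0]
    S = h.T @ h / T                        # Gamma_0
    for nu in range(1, q + 1):
        # Gamma_nu = (1/T) * sum_{t=nu+1}^{T} h_t h_{t-nu}'
        gamma = h[nu:].T @ h[:-nu] / T
        S += (1.0 - nu / (q + 1)) * (gamma + gamma.T)
    return S

# Small deterministic check with an alternating scalar series.
h_demo = np.array([[1.0], [-1.0], [1.0], [-1.0]])
S_q0 = newey_west(h_demo, 0)
S_q1 = newey_west(h_demo, 1)
```

The declining weights guarantee that the estimate is positive semidefinite, which a plain truncated sum of autocovariances does not.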

Optimal Matrix Estimation¶

Notice that \(\smash{S_{T}^{*}}\) depends on \(\smash{\boldsymbol{\theta}_{0}}\), which is unknown.

We substitute an estimate \(\smash{\hat{\boldsymbol{\theta}}}\) for \(\smash{\boldsymbol{\theta}_{0}}\) in \(\smash{S_{T}^{*}}\) and denote the result by \(\smash{\hat{S}}\).

If there is serial correlation, \(\smash{\hat{S}}\) is built from the sample autocovariances \(\smash{\hat{\Gamma}_{\nu,T}}\), that is, \(\smash{\Gamma_{\nu,T}^{*}}\) evaluated at \(\smash{\hat{\boldsymbol{\theta}}}\).

Under certain regularity conditions

\[\smash{\hat{S} \overset{p}{\rightarrow} S}.\]

Two Stage Estimation¶

We want to use \(\smash{\hat{S}^{-1}}\) as the weighting matrix when computing \(\smash{\hat{\boldsymbol{\theta}}}\), but \(\smash{\hat{S}^{-1}}\) itself depends on \(\smash{\hat{\boldsymbol{\theta}}}\).

To break this circularity, first compute an initial \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) with \(\smash{W_T = I_{r}}\). This estimate is consistent but generally not efficient.

Use the initial \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) to compute \(\smash{\hat{S}_{T}(\hat{\boldsymbol{\theta}}_{gmm})}\) and set \(\smash{W_T = \hat{S}_{T}(\hat{\boldsymbol{\theta}}_{gmm})^{-1}}\).

Recompute \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) by minimizing the objective with this updated \(\smash{W_T}\).
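
The steps above can be sketched for a linear instrumental-variables model, where each GMM step has a closed form and no numerical optimizer is needed. Everything specific here is an assumption made for illustration: the moment function \(\smash{\boldsymbol{h}_{t} = \boldsymbol{z}_{t}(y_{t} - \boldsymbol{x}_{t}^{'}\boldsymbol{\theta})}\), the absence of serial correlation, the simulated data, and the function names.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimiser of g_T' W g_T for h_t = z_t * (y_t - x_t' theta)."""
    XZ = X.T @ Z  # (a, r); the 1/T scale factors cancel in the solution
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def two_step_gmm(y, X, Z):
    T, r = Z.shape
    theta_1 = linear_gmm(y, X, Z, np.eye(r))    # step 1: W_T = I_r
    h = Z * (y - X @ theta_1)[:, None]          # rows are h(theta_1, y_t)
    S_hat = h.T @ h / T                         # assumes no serial correlation
    theta_2 = linear_gmm(y, X, Z, np.linalg.inv(S_hat))  # step 2: W_T = S_hat^{-1}
    return theta_1, theta_2, S_hat

# Simulated data (hypothetical): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
T = 5000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5, 0.5]) + v
u = 0.5 * v + rng.normal(size=T)   # correlated with x, uncorrelated with Z
y = 2.0 * x + u                    # true coefficient is 2.0
theta_1, theta_2, S_hat = two_step_gmm(y, x[:, None], Z)
```

With serially correlated moments, the line computing `S_hat` would be replaced by an estimator that includes the weighted autocovariance terms.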

Two Stage Estimation¶

How is the two-stage procedure better?

That is, why is \(\smash{S^{-1}}\) optimal?

Using \(\smash{S^{-1}}\), or a consistent estimate \(\smash{\hat{S}^{-1}}\), yields the \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) with the smallest asymptotic variance among GMM estimators based on the same moment conditions.

Asymptotic Distribution of GMM Estimator¶

Under regularity conditions, the GMM estimator computed with the optimal weighting matrix is asymptotically normal:

\[\smash{\sqrt{T} (\hat{\boldsymbol{\theta}}_{gmm} - \boldsymbol{\theta}_{0}) \overset{d}{\rightarrow} N(0,V)},\]

where

\[\begin{split}\begin{gather} V = (DS^{-1}D^{'})^{-1} \\ \frac{\partial \boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}_T})} {\partial\boldsymbol{\theta}} \bigg|_{\boldsymbol{\theta} = \boldsymbol{\theta}_{0}} \overset{p}{\longrightarrow} D. \end{gather}\end{split}\]
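
The sense in which \(\smash{S^{-1}}\) is optimal can be made concrete by comparing \(\smash{V}\) with the limiting variance under an arbitrary weighting matrix. The following is a standard result stated without proof, assuming \(\smash{W_{T} \overset{p}{\rightarrow} W}\) for some positive definite \(\smash{W}\) (a symbol introduced here for illustration):

\[\begin{align} V_{W} = (DWD^{'})^{-1} DWSWD^{'} (DWD^{'})^{-1}. \end{align}\]

Setting \(\smash{W = S^{-1}}\) gives \(\smash{DWSWD^{'} = DS^{-1}D^{'}}\), so \(\smash{V_{W}}\) collapses to \(\smash{(DS^{-1}D^{'})^{-1}}\). For any other choice of \(\smash{W}\), the difference \(\smash{V_{W} - (DS^{-1}D^{'})^{-1}}\) is positive semidefinite.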

Asymptotic Distribution of GMM Estimator¶

That is, for large \(\smash{T}\),

\[\smash{\hat{\boldsymbol{\theta}}_{gmm} \overset{approx}{\sim} N\left(\boldsymbol{\theta}_{0}, \frac{\hat{V}_{T}}{T}\right)},\]

where

\[\begin{split}\begin{align} \hat{V}_{T} & = (\hat{D}_{T}\hat{S}_{T}^{-1}\hat{D}_{T}^{'})^{-1} \\ \hat{D}_{T} & = \frac{\partial \boldsymbol{g}_{T}(\boldsymbol{\theta}| \boldsymbol{\mathcal{Y}_T})}{\partial \boldsymbol{\theta}}\bigg|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}. \end{align}\end{split}\]
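
In practice these formulas are mainly used to obtain standard errors. A minimal sketch, assuming \(\smash{\hat{D}_{T}}\) is supplied as an array with one row per parameter and one column per moment condition (the layout that makes \(\smash{\hat{D}_{T}\hat{S}_{T}^{-1}\hat{D}_{T}^{'}}\) square in the number of parameters). The function name and the numbers in the example are hypothetical.

```python
import numpy as np

def gmm_standard_errors(D_hat, S_hat, T):
    """Return V_hat = (D S^{-1} D')^{-1} and the standard errors sqrt(diag(V_hat) / T).

    D_hat has shape (a, r): a parameters, r moment conditions.
    S_hat has shape (r, r).
    """
    D_hat = np.asarray(D_hat, dtype=float)
    S_hat = np.asarray(S_hat, dtype=float)
    # Solve S x = D' rather than forming S^{-1} explicitly.
    V_hat = np.linalg.inv(D_hat @ np.linalg.solve(S_hat, D_hat.T))
    se = np.sqrt(np.diag(V_hat) / T)
    return V_hat, se

# One parameter, two moment conditions, T = 200 observations.
V_hat, se = gmm_standard_errors([[1.0, 0.0]], np.diag([2.0, 4.0]), 200)
```

An approximate 95% confidence interval for each parameter is then the estimate plus or minus 1.96 times its standard error.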