GMM Optimal Weighting Matrix
Moment Conditions’ Covariance
Suppose
\(\smash{\{\boldsymbol{h}(\boldsymbol{\theta},
\boldsymbol{Y}_{t})\}_{t=1}^{T}}\) is strictly stationary and
define
\[\smash{\Gamma_{\nu} =
E[\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{Y}_{t})
\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{Y}_{t-\nu})^{'}]}\]
and
\[\begin{align}
S = \sum_{\nu=-\infty}^{\infty} \Gamma_{\nu} = \Gamma_{0} +
\sum_{\nu=1}^{\infty} (\Gamma_{\nu} + \Gamma_{\nu}^{'}).
\end{align}\]
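To make the autocovariances concrete, here is a minimal sketch that computes the sample analogue of \(\smash{\Gamma_{\nu}}\) for a simulated moment series. The AR(1) process standing in for \(\smash{\boldsymbol{h}(\boldsymbol{\theta}_{0},\boldsymbol{y}_{t})}\) and the names `h` and `sample_autocov` are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical example: an AR(1) series standing in for the (T x r)
# array of moment conditions h(theta_0, y_t); here r = 1.
rng = np.random.default_rng(0)
T = 500
h = np.empty((T, 1))
h[0] = rng.normal()
for t in range(1, T):
    h[t] = 0.5 * h[t - 1] + rng.normal()

def sample_autocov(h, nu):
    """Sample analogue of Gamma_nu = E[h_t h_{t-nu}'], an (r x r) matrix."""
    T = h.shape[0]
    return h[nu:].T @ h[:T - nu] / T

# Gamma_0 and the first few lags: the building blocks of S.
gammas = [sample_autocov(h, nu) for nu in range(4)]
```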
Convergence in Distribution
Asymptotic theory dictates
\[\smash{\sqrt{T}(\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}_T}) -
E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})])
\overset{d}{\rightarrow} N(0,S)}\]
where
\[\smash{S = \lim_{T \rightarrow \infty} T \,
E[\boldsymbol{g}_{T}(\boldsymbol{\theta}_{0}|\boldsymbol{\mathcal{Y}_T})
\boldsymbol{g}_{T}(\boldsymbol{\theta}_{0}|\boldsymbol{\mathcal{Y}_T})^{'}]}.\]
Optimal Weighting Matrix
Another way to say this (intuitively):
\[\smash{\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)
\overset{approx}{\sim}
N\left(E[\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})],
\frac{S}{T}\right)}\]
The optimal GMM weighting matrix is \(\smash{S^{-1}}\):
\[\smash{Q_{T}(\boldsymbol{\theta}) =
\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)^{'}S^{-1}
\boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}}_T)}.\]
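In code, the criterion is just a quadratic form in the averaged moment conditions. A minimal sketch, assuming a user-supplied function `moments(theta, Y)` that returns the \(\smash{T \times r}\) array of \(\smash{\boldsymbol{h}(\boldsymbol{\theta},\boldsymbol{y}_{t})}\); both the helper and its name are hypothetical.

```python
import numpy as np

def gmm_objective(theta, Y, moments, W):
    """Q_T(theta) = g_T' W g_T, where g_T is the sample mean of the moment
    conditions and W is an (r x r) weighting matrix (W = S^{-1} is optimal)."""
    H = moments(theta, Y)    # (T x r) array of h(theta, y_t)
    g = H.mean(axis=0)       # g_T(theta | Y_T), shape (r,)
    return g @ W @ g
```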
Optimal Matrix Estimation
If \(\smash{\{\boldsymbol{h}(\boldsymbol{\theta_{0}},
\boldsymbol{y}_{t})\}_{t=-\infty}^{\infty}}\) is serially uncorrelated,
\(\smash{S}\) is consistently estimated by
\[\smash{S_{T}^{*} = \frac{1}{T} \sum_{t=1}^{T}
\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t})
\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t})^{'}}.\]
If the moment conditions are serially correlated,
\[\smash{S_{T}^{*} = \Gamma_{0,T}^{*} + \sum_{\nu =1}^{q}\left(1 -
\frac{\nu}{q+1}\right)\left(\Gamma_{\nu,T}^{*} +
\Gamma_{\nu,T}^{* \hspace{3pt}'}\right)},\]
where
\[\smash{\Gamma_{\nu,T}^{*} = \frac{1}{T} \sum_{t = \nu+1}^{T}
\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t})
\boldsymbol{h}(\boldsymbol{\theta_{0}},\boldsymbol{y}_{t-\nu})^{'}}.\]
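Both estimators of \(\smash{S}\) can be formed directly from the \(\smash{T \times r}\) array of moment conditions. A minimal sketch (setting `q = 0` reproduces the serially uncorrelated formula; the function name is hypothetical):

```python
import numpy as np

def S_hat(H, q=0):
    """Estimate S from the (T x r) array H of h(theta_hat, y_t).
    q = 0 gives the serially uncorrelated case S_T^* = (1/T) sum_t h_t h_t';
    q > 0 adds the Bartlett-weighted autocovariance terms."""
    T = H.shape[0]
    S = H.T @ H / T                            # Gamma_0
    for nu in range(1, q + 1):
        Gamma_nu = H[nu:].T @ H[:T - nu] / T   # Gamma_nu
        w = 1.0 - nu / (q + 1.0)               # Bartlett weight
        S += w * (Gamma_nu + Gamma_nu.T)
    return S
```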
Optimal Matrix Estimation
Notice that \(\smash{S^{*}}\) depends on
\(\smash{\boldsymbol{\theta}_{0}}\), which is unknown.
- We substitute an estimate \(\smash{\hat{\boldsymbol{\theta}}}\)
for \(\smash{\boldsymbol{\theta}_{0}}\) in \(\smash{S^{*}}\)
and denote the estimated value as \(\smash{\hat{S}}\).
- If there is serial correlation, \(\smash{\hat{S}}\) is built from the
corresponding estimates \(\smash{\hat{\Gamma}_{\nu,T}}\), again evaluated
at \(\smash{\hat{\boldsymbol{\theta}}}\).
Under certain regularity conditions,
\[\smash{\hat{S} \overset{p}{\rightarrow} S}.\]
Two Stage Estimation
Note that we want to use \(\smash{\hat{S}^{-1}}\) as the optimal
weighting matrix to compute \(\smash{\hat{\boldsymbol{\theta}}}\),
but that \(\smash{\hat{S}^{-1}}\) depends on
\(\smash{\hat{\boldsymbol{\theta}}}\).
- To compute the optimal \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\),
first estimate \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) with
\(\smash{W_T = I_{r}}\).
- Use the initial \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) to
compute \(\smash{\hat{S}_{T}(\hat{\boldsymbol{\theta}}_{gmm})}\)
and set \(\smash{W_T =
\hat{S}_{T}(\hat{\boldsymbol{\theta}}_{gmm})^{-1}}\).
- Compute \(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) again with the updated
weighting matrix; a code sketch of the full procedure follows below.
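A minimal sketch of this two-stage recipe, reusing the hypothetical `gmm_objective`, `S_hat`, and `moments` helpers from the sketches above and treating `scipy.optimize.minimize` as a generic numerical optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(theta0, Y, moments, q=0):
    """Stage 1: W_T = I_r.  Stage 2: W_T = S_hat(theta_hat_1)^{-1}."""
    r = moments(np.asarray(theta0, dtype=float), Y).shape[1]

    # Stage 1: identity weighting matrix.
    stage1 = minimize(gmm_objective, theta0, args=(Y, moments, np.eye(r)))

    # Estimate S at the first-stage estimate and invert it.
    W2 = np.linalg.inv(S_hat(moments(stage1.x, Y), q=q))

    # Stage 2: optimal weighting matrix.
    stage2 = minimize(gmm_objective, theta0, args=(Y, moments, W2))
    return stage2.x, W2
```

With a valid `moments` function in hand, `two_step_gmm(theta0, Y, moments, q=4)` would return the second-stage estimate together with the estimated optimal weighting matrix.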
Two Stage Estimation
How is the two-stage procedure better?
- That is, why is \(\smash{S^{-1}}\) optimal?
- Using \(\smash{S^{-1}}\), or a consistent estimate
\(\smash{\hat{S}^{-1}}\), yields the
\(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\) with the smallest asymptotic
variance among GMM estimators based on the same moment conditions.
Asymptotic Distribution of GMM Estimator
A central limit theorem exists for
\(\smash{\hat{\boldsymbol{\theta}}_{gmm}}\):
\[\smash{\sqrt{T} (\hat{\boldsymbol{\theta}}_{gmm} -
\boldsymbol{\theta}_{0}) \overset{d}{\rightarrow} N(0,V)},\]
where
\[\begin{split}\begin{gather}
V = (DS^{-1}D^{'})^{-1} \\
\frac{\partial \boldsymbol{g}_{T}(\boldsymbol{\theta}|\boldsymbol{\mathcal{Y}_T})}
{\partial\boldsymbol{\theta}} \bigg|_{\boldsymbol{\theta} =
\boldsymbol{\theta}_{0}} \overset{p}{\longrightarrow} D.
\end{gather}\end{split}\]
Asymptotic Distribution of GMM Estimator
That is, for large \(\smash{T}\),
\[\smash{\hat{\boldsymbol{\theta}}_{gmm} \overset{approx}{\sim}
N(\boldsymbol{\theta}_{0}, \frac{\hat{V}_{T}}{T})},\]
where
\[\begin{split}\begin{align}
\hat{V}_{T} & = (\hat{D}_{T}\hat{S}_{T}^{-1}\hat{D}_{T}^{'})^{-1}
\\
\hat{D}_{T} & = \frac{\partial
\boldsymbol{g}_{T}(\boldsymbol{\theta}|
\boldsymbol{\mathcal{Y}_T})}{\partial
\boldsymbol{\theta}}\bigg|_{\boldsymbol{\theta} =
\hat{\boldsymbol{\theta}}}.
\end{align}\end{split}\]
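Putting the pieces together, approximate standard errors follow from the diagonal of \(\smash{\hat{V}_{T}/T}\). A minimal sketch that approximates \(\smash{\hat{D}_{T}}\) with forward differences, again relying on the hypothetical `moments` helper:

```python
import numpy as np

def gmm_std_errors(theta_hat, Y, moments, S_inv, eps=1e-6):
    """Standard errors from V_hat = (D_hat S_hat^{-1} D_hat')^{-1},
    with D_hat approximated by a forward-difference Jacobian of g_T."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    H = moments(theta_hat, Y)            # (T x r) moment conditions
    T = H.shape[0]
    g0 = H.mean(axis=0)                  # g_T(theta_hat | Y_T)
    a, r = theta_hat.size, g0.size

    # D_hat: (a x r) matrix of partial derivatives of g_T w.r.t. theta.
    D = np.empty((a, r))
    for i in range(a):
        step = np.zeros(a)
        step[i] = eps
        g1 = moments(theta_hat + step, Y).mean(axis=0)
        D[i] = (g1 - g0) / eps

    V = np.linalg.inv(D @ S_inv @ D.T)   # (D_hat S_hat^{-1} D_hat')^{-1}
    return np.sqrt(np.diag(V) / T)       # sqrt of diag(V_hat / T)
```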