Resampling

Properties of Estimators

We’ve now seen that computing a point estimate alone is not enough.

  • We often want to know about properties of estimators.
  • For example, what is the standard error of an estimator?
  • Remember that before data is observed, an estimator is a random variable itself.
  • As an example, \(\bar{Y}\) is a sum of random variables divided by the constant \(n\).
  • Before \(\{Y_i\}_{i=1}^n\) is observed, \(\bar{Y}\) is random and has its own variance and standard deviation.
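As a quick, hypothetical illustration (the distribution, sample size, and seed below are arbitrary choices, not from the notes), we can simulate many samples and watch \(\bar{Y}\) vary from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 5000

# Each row is one sample of size n from an arbitrary population (Exponential(1)).
samples = rng.exponential(scale=1.0, size=(reps, n))
ybars = samples.mean(axis=1)  # one realization of Ybar per sample

# Before the data are observed, Ybar is itself random:
print("mean of Ybar:", ybars.mean())       # near the population mean, 1
print("std of Ybar: ", ybars.std(ddof=1))  # near 1 / sqrt(n)
```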

Resampling

It is often challenging or impossible to compute characteristics of estimators analytically.

  • We would like to replace theoretical calculations with Monte Carlo simulation, which would draw additional samples from the population.
  • Sampling from the true population is typically impossible.

Resampling

We substitute sampling from the true population with sampling from the observed sample.

  • This is referred to as resampling.
  • If the sample is a good representation of the true population, then sampling from the sample should approximate sampling from the population.

Bootstrapping

Suppose the original sample has \(n\) data observations.

  • Bootstrapping involves drawing \(B\) new samples of size \(n\) from the original sample.
  • Each bootstrap sample is drawn with replacement.
  • Otherwise, the bootstrap samples would all be identical to the original sample (why?).
  • Drawing with replacement allows each bootstrap observation to be drawn in an \(i.i.d.\) fashion from the sample.
  • So, the original sample plays the role of the population (see the sketch below).
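In code, resampling with replacement is a one-line operation. A minimal sketch, assuming NumPy and a placeholder sample `y` (all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=50)   # stand-in for the observed sample
n, B = len(y), 1000

# Each row is one bootstrap sample: n draws from y WITH replacement,
# so y plays the role of the population.
boot_samples = rng.choice(y, size=(B, n), replace=True)
```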

Bootstrap Estimates

Let \(\theta\) be a parameter of interest and let \(\hat{\theta}\) denote an estimate of \(\theta\) using a sample of data, \(\{y_i\}_{i=1}^n\).

  • \(\hat{\theta}\) might be calculated by maximum likelihood estimation.
  • We could create \(B\) new samples from \(\{y_i\}_{i=1}^n\) by resampling with replacement.
  • For each new sample \(j = 1, \ldots, B\), we could compute \(\hat{\theta}^*_j\) in exactly the same way \(\hat{\theta}\) was computed from \(\{y_i\}_{i=1}^n\).

Bootstrap Estimates

  • One way to estimate \(E[\hat{\theta}]\) is by averaging the bootstrap estimates:
\[E[\hat{\theta}] \approx \bar{\hat{\theta}}^* = \frac{1}{B} \sum_{j=1}^B \hat{\theta}^*_j.\]
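A sketch of both steps, using the sample median as an arbitrary stand-in for \(\hat{\theta}\) (any estimator recomputed the same way on each resample would do):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=50)   # observed sample (placeholder)
n, B = len(y), 2000
estimator = np.median     # stand-in for the recipe that produced theta-hat

theta_hat = estimator(y)  # estimate from the original sample

# Recompute the estimator on each bootstrap resample ...
boot = np.array([estimator(rng.choice(y, size=n, replace=True))
                 for _ in range(B)])

# ... and average the bootstrap estimates to approximate E[theta-hat].
boot_mean = boot.mean()
```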

Estimating Bias

The true bias of an estimator is defined as

\[\text{BIAS}(\hat{\theta}) = E[\hat{\theta}] - \theta.\]
  • We can approximate the population average, \(E[\hat{\theta}]\), with a bootstrap average, \(\bar{\hat{\theta}}^*\):
\[\text{BIAS}_{\text{boot}}(\hat{\theta}) = \bar{\hat{\theta}}^* - \hat{\theta}.\]
  • We replaced the true population value, \(\theta\), with the sample value, \(\hat{\theta}\), since the sample substitutes for the population.
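In code, the bootstrap bias estimate is a one-line helper (a sketch; `boot_estimates` would be the array of \(\hat{\theta}^*_j\) values from the previous sketch):

```python
import numpy as np

def bootstrap_bias(boot_estimates, theta_hat):
    """BIAS_boot: mean of the bootstrap estimates minus the original estimate."""
    return np.mean(boot_estimates) - theta_hat
```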

Estimating Standard Error

The true standard deviation of \(\hat{\theta}\) can be estimated with the bootstrap estimates:

\[s_{\text{boot}}(\hat{\theta}) = \sqrt{\frac{1}{B-1} \sum_{j=1}^B \left(\hat{\theta}^*_j - \bar{\hat{\theta}}^* \right)^2}.\]
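The matching helper, using the \(B-1\) denominator via `ddof=1` (a sketch under the same assumptions as above):

```python
import numpy as np

def bootstrap_se(boot_estimates):
    """s_boot: sample standard deviation of the bootstrap estimates (B - 1 denominator)."""
    return np.std(boot_estimates, ddof=1)
```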

Example: Pareto Distribution

Suppose we have a sample of random variables drawn from a Pareto distribution:

\[Y_i \stackrel{i.i.d.}{\sim} \mathcal{P}(\alpha, \beta), \qquad i=1,\ldots, n.\]
  • The density of each \(Y_i\) is
\[f(y|\alpha, \beta) = \frac{\beta \alpha^\beta}{y^{\beta+1}}.\]
  • If \(Y_i \sim \mathcal{P}(\alpha, \beta)\), then \(\alpha > 0\), \(\beta > 0\) and \(Y_i > \alpha\).
  • \(\alpha\) is a parameter dictating the minimum possible value of \(Y_i\) and \(\beta\) is a shape parameter.

Example: Pareto Distribution

In the figure below, the shape parameter \(\beta\) is labeled \(\kappa\).

[Figure: Pareto density (paretoPDF.png).]

Example: Pareto Distribution

The joint density of \({\bf Y} = (Y_1, \ldots, Y_n)'\) is

\[\begin{split}f_{{\bf Y}}({\bf y}|\alpha, \beta) & = \prod_{i=1}^n f_{Y_i}(y_i|\alpha, \beta) \\ & = \prod_{i=1}^n \frac{\beta \alpha^\beta}{y_i^{\beta+1}} \\ & = \frac{\beta^n \alpha^{n\beta}}{\prod_{i=1}^n y_i^{\beta+1}}.\end{split}\]

Example: Pareto Distribution

Assuming \(\alpha\) is known, the log likelihood of \(\beta\) is

\[\begin{split}\ell (\beta|\alpha, {\bf y}) & = \log(f_{{\bf Y}}({\bf y}|\alpha, \beta)) \\ & = n \log(\beta) + n\beta \log(\alpha) - (1+\beta) \sum_{i=1}^n \log(y_i).\end{split}\]

Example: Pareto Distribution

The MLE, \(\hat{\beta}\), is the value such that

\[\frac{\partial \ell}{\partial \beta}\bigg|_{\beta = \hat{\beta}} = \frac{n}{\hat{\beta}} + n \log(\alpha) - \sum_{i=1}^n \log(y_i) = 0.\]
\[\Rightarrow \hat{\beta} = \frac{n}{\sum_{i=1}^n \log(y_i) - n\log(\alpha)}.\]
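The closed form translates directly into code; a minimal sketch (the function name is ours):

```python
import numpy as np

def pareto_beta_mle(y, alpha):
    """Closed-form MLE of beta for a Pareto sample with known minimum alpha."""
    y = np.asarray(y)
    n = len(y)
    return n / (np.log(y).sum() - n * np.log(alpha))
```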

Example: Pareto Distribution

The second derivative of the log likelihood is

\[\begin{split}\frac{\partial^2 \ell}{\partial \beta^2} & = -\frac{n}{\beta^2}.\end{split}\]
  • The observed Fisher information is
\[\begin{split}\tilde{\mathcal{I}}(\hat{\beta}) & = -\frac{\partial^2 \ell}{\partial \beta^2} \bigg|_{\beta = \hat{\beta}} = \frac{n}{\hat{\beta}^2}.\end{split}\]
  • The asymptotic standard error of \(\hat{\beta}\) is
\[\begin{split}Std(\hat{\beta}) & \approx \sqrt{\tilde{\mathcal{I}}(\hat{\beta})^{-1}} = \frac{\hat{\beta}}{\sqrt{n}}.\end{split}\]

Example: Pareto Distribution

Given a sample of \(n\) observations from a Pareto distribution:

  • We can compute the MLE, \(\hat{\beta}\).
  • We can compute the asymptotic standard error \(\hat{\beta}/\sqrt{n}\).
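A sketch of both computations on simulated data, drawing Pareto variates by inverse-CDF sampling (if \(U \sim \text{Uniform}(0,1)\), then \(\alpha U^{-1/\beta} \sim \mathcal{P}(\alpha, \beta)\)); the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, n = 1.0, 2.5, 40

# Inverse-CDF draw from the Pareto distribution.
y = alpha * rng.uniform(size=n) ** (-1.0 / beta)

beta_hat = n / (np.log(y).sum() - n * np.log(alpha))  # MLE
se_asym = beta_hat / np.sqrt(n)                       # asymptotic standard error
print(beta_hat, se_asym)
```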

Example: Pareto Distribution

We can generate \(B\) new samples by resampling.

  • For each new sample, we can compute a bootstrap estimate \(\hat{\beta}^*_j\), \(j=1,\ldots,B\).
  • We can compute the standard deviation of \(\{\hat{\beta}^*_j\}_{j=1}^B\) and compare it to the asymptotic standard error.
  • When \(n\) is small, the bootstrap standard error is typically a better estimate of variation than the asymptotic standard error, which relies on large-sample theory (see the sketch below).
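Putting the pieces together, a sketch that recomputes the MLE on each resample and compares the two standard errors (parameter choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n, B = 1.0, 2.5, 40, 2000

y = alpha * rng.uniform(size=n) ** (-1.0 / beta)      # simulated sample
beta_hat = n / (np.log(y).sum() - n * np.log(alpha))  # MLE on the original sample

# Recompute the MLE on each bootstrap resample.
boot = np.empty(B)
for j in range(B):
    yb = rng.choice(y, size=n, replace=True)
    boot[j] = n / (np.log(yb).sum() - n * np.log(alpha))

se_boot = np.std(boot, ddof=1)   # bootstrap standard error
se_asym = beta_hat / np.sqrt(n)  # asymptotic standard error
print(se_boot, se_asym)
```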

Bootstrap Confidence Intervals

Given a set of bootstrap estimates, \(\{\hat{\theta}^*_j\}_{j=1}^B\), we can form a \(1-\alpha\) confidence interval with the normal approximation

\[\left(\hat{\theta} - s_{\text{boot}}(\hat{\theta}) \, z_{\alpha/2}, \,\, \hat{\theta} + s_{\text{boot}}(\hat{\theta}) \, z_{\alpha/2}\right)\]

where \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal distribution (i.e., \(P(Z > z_{\alpha/2}) = \alpha/2\)).

  • Note that the interval is centered around \(\hat{\theta}\) rather than \(\theta\).
  • In this case \(\hat{\theta}\) is substituted for \(\theta\), just as the data sample is substituted for the true population.
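A sketch of the normal-approximation interval (here `alpha` is the confidence level's \(\alpha\), not the Pareto minimum; `norm.ppf` supplies the needed quantile):

```python
import numpy as np
from scipy.stats import norm

def normal_ci(theta_hat, boot_estimates, alpha=0.05):
    """Normal-approximation bootstrap CI, centered at the original estimate."""
    z = norm.ppf(1 - alpha / 2)  # upper alpha/2 quantile of the standard normal
    se = np.std(boot_estimates, ddof=1)
    return theta_hat - z * se, theta_hat + z * se
```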

Bootstrap Confidence Intervals

Alternatively, we could compute the \(\alpha/2\) and \(1-\alpha/2\) empirical quantiles of the bootstrap estimates, \(\{\hat{\theta}^*_j\}_{j=1}^B\): \(q_{\alpha/2}\) and \(q_{1-\alpha/2}\).

  • The resulting \(1-\alpha\) confidence interval is
\[\left(q_{\alpha/2}, \, q_{1-\alpha/2}\right).\]
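A matching sketch for the percentile interval:

```python
import numpy as np

def percentile_ci(boot_estimates, alpha=0.05):
    """Percentile bootstrap CI from the empirical alpha/2 and 1 - alpha/2 quantiles."""
    lo, hi = np.quantile(boot_estimates, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```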