.. slideconf::
   :slide_classes: appear

==============================================================================
Resampling
==============================================================================

Properties of Estimators
==============================================================================

We've now seen that estimation is not enough.

.. rst-class:: to-build

- We often want to know about properties of estimators.

.. rst-class:: to-build

- For example, what is the standard error of an estimator?

.. rst-class:: to-build

- Remember that before data is observed, an estimator is a random variable itself.

.. rst-class:: to-build

- As an example, :math:`\bar{Y}` is a sum of random variables divided by a constant value.

.. rst-class:: to-build

- Before :math:`\{Y_i\}_{i=1}^n` is observed, :math:`\bar{Y}` is random and has its own variance and standard deviation.

Resampling
==============================================================================

It is often challenging or impossible to compute certain characteristics of estimators.

.. rst-class:: to-build

- We would like to replace theoretical calculations with Monte Carlo simulation, which draws additional samples from the population.

.. rst-class:: to-build

- Sampling from the true population is typically impossible.

Resampling
==============================================================================

We substitute sampling from the true population with sampling from the observed sample.

.. rst-class:: to-build

- This is referred to as *resampling*.

.. rst-class:: to-build

- If the sample is a good representation of the true population, then sampling from the sample should approximate sampling from the population.

Bootstrapping
==============================================================================

Suppose the original sample has :math:`n` data observations.

.. rst-class:: to-build

- *Bootstrapping* involves drawing :math:`B` new samples of size :math:`n` from the original sample.

.. 
rst-class:: to-build

- Each bootstrap sample is drawn with replacement.

.. rst-class:: to-build

- Otherwise, the bootstrap samples would all be identical to the original sample (why?).

.. rst-class:: to-build

- Drawing with replacement allows each bootstrap observation to be drawn in an :math:`i.i.d.` fashion from the sample.

.. rst-class:: to-build

- So, the original sample plays the role of the population.

Bootstrap Estimates
==============================================================================

Let :math:`\theta` be a parameter of interest and let :math:`\hat{\theta}` denote an estimate of :math:`\theta` using a sample of data, :math:`\{y_i\}_{i=1}^n`.

.. rst-class:: to-build

- :math:`\hat{\theta}` might be calculated by maximum likelihood estimation.

.. rst-class:: to-build

- We could create :math:`B` new samples from :math:`\{y_i\}_{i=1}^n` by resampling with replacement.

.. rst-class:: to-build

- For each new sample :math:`j = 1, \ldots, B`, we could compute :math:`\hat{\theta}^*_j` in the exact way :math:`\hat{\theta}` was computed with :math:`\{y_i\}_{i=1}^n`.

Bootstrap Estimates
==============================================================================

- One way to estimate :math:`E[\hat{\theta}]` is by averaging the bootstrap estimates:

.. rst-class:: to-build

.. math::

   E[\hat{\theta}] \approx \bar{\hat{\theta}}^* = \frac{1}{B} \sum_{j=1}^B \hat{\theta}^*_j.

Estimating Bias
==============================================================================

True bias for an estimator is defined as

.. math::

   \text{BIAS}(\hat{\theta}) = E[\hat{\theta}] - \theta.

.. rst-class:: to-build

- We can approximate the population average, :math:`E[\hat{\theta}]`, with a bootstrap average, :math:`\bar{\hat{\theta}}^*`:

.. rst-class:: to-build

.. math::

   \text{BIAS}_{\text{boot}}(\hat{\theta}) = \bar{\hat{\theta}}^* - \hat{\theta}.

.. 
rst-class:: to-build

- We replaced the true population value, :math:`\theta`, with the sample value, :math:`\hat{\theta}`, since the sample substitutes for the population.

Estimating Standard Error
==============================================================================

The true standard deviation of :math:`\hat{\theta}` can be estimated with the bootstrap estimates:

.. math::

   s_{\text{boot}}(\hat{\theta}) = \sqrt{\frac{1}{B-1} \sum_{j=1}^B \left(\hat{\theta}^*_j - \bar{\hat{\theta}}^* \right)^2}.

Example: Pareto Distribution
==============================================================================

Suppose we have a sample of random variables drawn from a Pareto distribution:

.. math::

   Y_i \stackrel{i.i.d.}{\sim} \mathcal{P}(\alpha, \beta), \qquad i=1,\ldots, n.

.. rst-class:: to-build

- The density of each :math:`Y_i` is

.. rst-class:: to-build

.. math::

   f(y|\alpha, \beta) = \frac{\beta \alpha^\beta}{y^{\beta+1}}.

.. rst-class:: to-build

- If :math:`Y_i \sim \mathcal{P}(\alpha, \beta)`, then :math:`\alpha > 0`, :math:`\beta > 0` and :math:`Y_i > \alpha`.

.. rst-class:: to-build

- :math:`\alpha` is a parameter dictating the minimum possible value of :math:`Y_i` and :math:`\beta` is a shape parameter.

Example: Pareto Distribution
==============================================================================

In the graph below, :math:`\beta = \kappa`.

.. ifslides::

   .. image:: /_static/Resample/paretoPDF.png
      :width: 5.5in
      :align: center

.. ifnotslides::

   .. image:: /_static/Resample/paretoPDF.png
      :width: 6in

Example: Pareto Distribution
==============================================================================

The joint density of :math:`{\bf Y} = (Y_1, \ldots, Y_n)'` is

.. rst-class:: to-build

.. math::

   f_{{\bf Y}}({\bf y}|\alpha, \beta) = \prod_{i=1}^n f_{Y_i}(y_i|\alpha, \beta) \qquad \; \,

.. rst-class:: to-build

.. math::

   = \prod_{i=1}^n \frac{\beta \alpha^\beta}{y_i^{\beta+1}}

.. rst-class:: to-build

.. 
math::

   \quad \; \; = \frac{\beta^n \alpha^{n\beta}}{\prod_{i=1}^n y_i^{\beta+1}}.

Example: Pareto Distribution
==============================================================================

Assuming :math:`\alpha` is known, the log likelihood of :math:`\beta` is

.. rst-class:: to-build

.. math::

   \ell (\beta|\alpha, {\bf y}) = \log(f_{{\bf Y}}({\bf y}|\alpha, \beta)) \qquad \qquad \qquad \qquad \qquad \qquad

.. rst-class:: to-build

.. math::

   \quad \, = n \log(\beta) + n\beta \log(\alpha) - (1+\beta) \sum_{i=1}^n \log(y_i).

Example: Pareto Distribution
==============================================================================

The MLE, :math:`\hat{\beta}`, is the value such that

.. rst-class:: to-build

.. math::

   \frac{\partial \ell}{\partial \beta}\bigg|_{\beta = \hat{\beta}} = \frac{n}{\hat{\beta}} + n \log(\alpha) - \sum_{i=1}^n \log(y_i) = 0.

.. rst-class:: to-build

.. math::

   \Rightarrow \hat{\beta} = \frac{n}{\sum_{i=1}^n \log(y_i) - n\log(\alpha)}.

Example: Pareto Distribution
==============================================================================

The second derivative of the log likelihood is

.. math::

   \frac{\partial^2 \ell}{\partial \beta^2} & = -\frac{n}{\beta^2}.

.. rst-class:: to-build

- The observed Fisher information is

.. rst-class:: to-build

.. math::

   \tilde{\mathcal{I}}(\hat{\beta}) & = -\frac{\partial^2 \ell}{\partial \beta^2} \bigg|_{\beta = \hat{\beta}} = \frac{n}{\hat{\beta}^2}.

.. rst-class:: to-build

- The asymptotic standard error of :math:`\hat{\beta}` is

.. rst-class:: to-build

.. math::

   Std(\hat{\beta}) & \approx \sqrt{\tilde{\mathcal{I}}(\hat{\beta})^{-1}} = \frac{\hat{\beta}}{\sqrt{n}}.

Example: Pareto Distribution
==============================================================================

Given a sample of :math:`n` observations from a Pareto distribution:

.. rst-class:: to-build

- We can compute the MLE, :math:`\hat{\beta}`.

.. rst-class:: to-build

- We can compute the asymptotic standard error :math:`\hat{\beta}/\sqrt{n}`.
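
The two computations above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names ``pareto_mle_beta`` and ``asymptotic_se``, the seed, and the parameter values are all assumptions chosen for the example, and the sample is simulated by inverting the Pareto CDF :math:`F(y) = 1 - (\alpha/y)^\beta`.

```python
import numpy as np


def pareto_mle_beta(y, alpha):
    """MLE of the Pareto shape beta, assuming the minimum alpha is known."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # beta_hat = n / (sum(log y_i) - n * log(alpha))
    return n / (np.sum(np.log(y)) - n * np.log(alpha))


def asymptotic_se(beta_hat, n):
    """Asymptotic standard error from the observed Fisher information."""
    return beta_hat / np.sqrt(n)


# Hypothetical settings for illustration only.
rng = np.random.default_rng(0)
alpha, beta, n = 1.0, 3.0, 200

# Inverse-CDF sampling: U in (0, 1], Y = alpha * U**(-1/beta) >= alpha.
u = 1.0 - rng.random(n)
y = alpha * u ** (-1.0 / beta)

beta_hat = pareto_mle_beta(y, alpha)
se = asymptotic_se(beta_hat, n)
print(f"beta_hat = {beta_hat:.3f}, asymptotic SE = {se:.3f}")
```

With a known :math:`\alpha`, the estimate depends on the data only through :math:`\sum_i \log(y_i)`, which is why the function needs nothing beyond a sum of logs.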
Example: Pareto Distribution
==============================================================================

We can generate :math:`B` new samples by resampling.

.. rst-class:: to-build

- For each new sample, we can compute :math:`\hat{\beta}^*_j`, :math:`j=1,\ldots,B`.

.. rst-class:: to-build

- We can compute the standard deviation of :math:`\{\hat{\beta}^*_j\}_{j=1}^B` and compare to the asymptotic standard error.

.. rst-class:: to-build

- The bootstrap standard error can be a better estimate of variation than the asymptotic standard error when :math:`n` is small, since the asymptotic approximation relies on a large sample.

Bootstrap Confidence Intervals
==============================================================================

Given a set of bootstrap estimates, :math:`\{\hat{\theta}^*_j\}_{j=1}^B`, we can form a :math:`1-\alpha` confidence interval with the normal approximation

.. rst-class:: to-build

.. math::

   \left(\hat{\theta} - s_{\text{boot}}(\hat{\theta}) \, z_{\alpha/2}, \,\, \hat{\theta} + s_{\text{boot}}(\hat{\theta}) \, z_{\alpha/2}\right)

.. rst-class:: to-build

where :math:`z_{\alpha/2}` is the upper :math:`\alpha/2` quantile of the standard normal distribution.

.. rst-class:: to-build

- Note that the interval is centered around :math:`\hat{\theta}` rather than :math:`\theta`.

.. rst-class:: to-build

- In this case :math:`\hat{\theta}` is substituted for :math:`\theta`, just as the data sample is substituted for the true population.

Bootstrap Confidence Intervals
==============================================================================

Alternatively, we could compute the :math:`\alpha/2` and :math:`1-\alpha/2` empirical quantiles of the bootstrap estimates, :math:`\{\hat{\theta}^*_j\}_{j=1}^B`: :math:`q_{\alpha/2}` and :math:`q_{1-\alpha/2}`.

.. rst-class:: to-build

- The resulting :math:`1-\alpha` confidence interval is

.. rst-class:: to-build

.. math::

   \left(q_{\alpha/2}, q_{1-\alpha/2}\right).
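
As a rough sketch of how the bootstrap quantities fit together for the Pareto example, the Python code below resamples a simulated data set and computes the bootstrap bias, the bootstrap standard error, and both confidence intervals. Everything here is an assumption made for illustration: the helper names ``pareto_mle_beta`` and ``bootstrap_estimates``, the seed, and the values of :math:`n` and :math:`B`. The confidence level is stored in a variable called ``level`` to avoid a clash with the Pareto parameter :math:`\alpha`.

```python
import numpy as np
from statistics import NormalDist


def pareto_mle_beta(y, alpha):
    """MLE of the Pareto shape beta, assuming the minimum alpha is known."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return n / (np.sum(np.log(y)) - n * np.log(alpha))


def bootstrap_estimates(y, alpha, B, rng):
    """Recompute the MLE on B resamples of size n drawn with replacement."""
    y = np.asarray(y, dtype=float)
    n = y.size
    idx = rng.integers(0, n, size=(B, n))  # each row indexes one resample
    samples = y[idx]
    return n / (np.log(samples).sum(axis=1) - n * np.log(alpha))


# Hypothetical settings for illustration only.
rng = np.random.default_rng(1)
alpha, beta, n, B = 1.0, 3.0, 50, 2000

# Simulated "observed" sample via the inverse Pareto CDF.
y = alpha * (1.0 - rng.random(n)) ** (-1.0 / beta)

beta_hat = pareto_mle_beta(y, alpha)
boot = bootstrap_estimates(y, alpha, B, rng)

bias_boot = boot.mean() - beta_hat   # bootstrap average minus sample estimate
se_boot = boot.std(ddof=1)           # uses the 1/(B-1) normalization
se_asym = beta_hat / np.sqrt(n)      # asymptotic standard error for comparison

level = 0.05                         # interval has coverage 1 - level
z = NormalDist().inv_cdf(1 - level / 2)
ci_normal = (beta_hat - z * se_boot, beta_hat + z * se_boot)
q_lo, q_hi = np.quantile(boot, [level / 2, 1 - level / 2])
ci_percentile = (q_lo, q_hi)

print(f"beta_hat = {beta_hat:.3f}, bias_boot = {bias_boot:.3f}")
print(f"se_boot = {se_boot:.3f}, se_asym = {se_asym:.3f}")
print(f"normal CI = {ci_normal}, percentile CI = {ci_percentile}")
```

The normal interval is symmetric around :math:`\hat{\beta}` by construction, while the percentile interval follows the shape of the bootstrap distribution and so need not be symmetric.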