.. slideconf::
   :slide_classes: appear

==============================================================================
Prior and Posterior Distributions
==============================================================================


Prior Distribution
==============================================================================
Bayes' Theorem can be used to model the distribution of parameters.  

.. rst-class:: to-build

- Recall that the likelihood of data :math:`{\bf y}` can be
  expressed as :math:`f({\bf y}|{\bf \theta})`.

.. rst-class:: to-build

- :math:`{\bf \theta}` is a vector of parameters. 

.. rst-class:: to-build
  
- In reality, we think of :math:`{\bf \theta}` as a set of unknown
  values that are *not random*.

.. rst-class:: to-build

- However, we treat :math:`{\bf \theta}` as random because of our
  lack of knowledge.

.. rst-class:: to-build

- That is, our lack of knowledge induces a distribution over
  :math:`{\bf \theta}`.


Prior Distribution and Likelihood
==============================================================================
The *prior distribution* :math:`\pi({\bf \theta})` expresses our
beliefs about :math:`{\bf \theta}` prior to observing data
:math:`{\bf y}`.

.. rst-class:: to-build

- :math:`\pi({\bf \theta})` is different from the likelihood:
  :math:`f({\bf y}|{\bf \theta})`.

.. rst-class:: to-build

- :math:`\pi({\bf \theta})` is loosely interpreted as the
  probability of :math:`{\bf \theta}` occurring before we observe
  data.

.. rst-class:: to-build

- :math:`f({\bf y}|{\bf \theta})` is loosely interpreted as the
  probability of the data occurring, given a specific value of the
  parameter vector :math:`{\bf \theta}`.


Joint Density
==============================================================================
The joint density of :math:`{\bf y}` and :math:`{\bf \theta}` is

.. rst-class:: to-build

.. math::

   f({\bf y},{\bf \theta}) & = f({\bf y}|{\bf \theta})
   \pi({\bf \theta}).

.. rst-class:: to-build

- This is analogous to the relationship we previously derived:
  
.. rst-class:: to-build

.. math::

    P(A \cap B) & = P(A|B)P(B).
  

Marginal Density
==============================================================================
The marginal density of :math:`{\bf y}` is

.. rst-class:: to-build

.. math::

   f({\bf y}) & = \int f({\bf y},{\bf \theta}) d{\bf \theta}
   = \int f({\bf y}|{\bf \theta}) \pi({\bf \theta})
   d{\bf \theta}.


Marginal Density
==============================================================================
- This is analogous to the relationship we previously derived:
  
.. rst-class:: to-build

.. math::

    P(A) = P\left((A \cap B_1) \cup \cdots \cup (A \cap
    B_K)\right) \hspace{0.38in}
  
.. rst-class:: to-build

.. math::

    = P(A \cap B_1) + \cdots + P(A \cap B_K)
  
.. rst-class:: to-build

.. math::

    \hspace{0.53in} = P(A|B_1) P(B_1) + \cdots + P(A|B_K) P(B_K)
  
.. rst-class:: to-build

.. math::

    = \sum_{i=1}^K P(A|B_i) P(B_i), \hspace{0.8in}
  
.. rst-class:: to-build 

for a partition :math:`\{B_i\}_{i=1}^K`.


Posterior Distribution
==============================================================================
According to Bayes' Theorem,

.. rst-class:: to-build

.. math::

   \pi({\bf \theta}|{\bf y}) & =
   \frac{f({\bf y}|{\bf \theta})
   \pi({\bf \theta})}{f({\bf y})} \hspace{0.92in}

.. rst-class:: to-build

.. math::

   = \frac{f({\bf y}|{\bf \theta})
   \pi({\bf \theta})}{\int f({\bf y}|{\bf \theta})
   \pi({\bf \theta}) d{\bf \theta}}.

.. rst-class:: to-build

- :math:`\pi({\bf \theta}|{\bf y})` is referred to as the
  *posterior distribution* of :math:`{\bf \theta}`.

.. rst-class:: to-build

- :math:`\pi({\bf \theta}|{\bf y})` is loosely interpreted as the
  probability of :math:`{\bf \theta}` after observing
  :math:`{\bf y}`.


Bayesian Updating
==============================================================================
Bayesian analysis is a method to use data to update our beliefs about
:math:`{\bf \theta}`.

.. rst-class:: to-build

- We begin with a prior distribution :math:`\pi({\bf \theta})` which
  captures our views about the likelihood of :math:`{\bf \theta}`
  taking particular values.

.. rst-class:: to-build

- We specify a model for the probability density of the data, given
  :math:`{\bf \theta}`: :math:`f({\bf y}|{\bf \theta})`.

.. rst-class:: to-build

- We use the likelihood to update our beliefs about
  :math:`{\bf \theta}`:
  
.. rst-class:: to-build

.. math::

   \pi({\bf \theta}|{\bf y}) & =
   \frac{f({\bf y}|{\bf \theta}) \pi({\bf \theta})}{\int
   f({\bf y}|{\bf \theta}) \pi({\bf \theta})
   d{\bf \theta}}.

.. rst-class:: to-build
   
- If the data are very informative,
  :math:`\pi({\bf \theta}|{\bf y})` can be quite different from
  :math:`\pi({\bf \theta})`. 


A Note on Proportionality
==============================================================================
Suppose

.. rst-class:: to-build

.. math::

   w = ax

.. rst-class:: to-build

.. math::

   y = bx

.. rst-class:: to-build

.. math::

   z = wy

.. rst-class:: to-build

then

.. rst-class:: to-build

.. math::

   w \propto x

.. rst-class:: to-build

.. math::

   y \propto x

.. rst-class:: to-build

.. math::

   z \propto x^2.


A Note on Proportionality
==============================================================================
More generally, if

.. rst-class:: to-build

.. math::

   w = g_w(x) h_w(u)

.. rst-class:: to-build

.. math::

   y = g_y(x) h_y(u)

.. rst-class:: to-build

.. math::

   z = wy

.. rst-class:: to-build

then

.. rst-class:: to-build

.. math::

   w \propto g_w(x)

.. rst-class:: to-build

.. math::

   y \propto g_y(x)

.. rst-class:: to-build

.. math::

   z \propto g_w(x) g_y(x).


A Note on Proportionality
==============================================================================
Since :math:`f({\bf y})` is not a function of :math:`{\bf \theta}`, 

.. rst-class:: to-build

.. math::

   \pi({\bf \theta}|{\bf y}) & =
   \frac{f({\bf y}|{\bf \theta})\pi({\bf \theta})}{f({\bf y})}
   \propto f({\bf y}|{\bf \theta})\pi({\bf \theta}).
 
.. rst-class:: to-build

- It is often easier to work with only
  :math:`f({\bf y}|{\bf \theta})\pi({\bf \theta})`.


Conjugate Priors
==============================================================================
Our choice of :math:`\pi({\bf \theta})` and
:math:`f({\bf y}|{\bf \theta})` may not yield an analytic solution
for :math:`\pi({\bf \theta}|{\bf y})`.  

.. rst-class:: to-build

- :math:`\pi({\bf \theta}|{\bf y})` still exists, but it must be
  computed numerically.

.. rst-class:: to-build

- However, when the likelihood and prior have similar forms, they
  result in tractable posteriors.

.. rst-class:: to-build

- A *conjugate* prior is a distribution that results in a posterior of
  the same family when coupled with a particular likelihood.


Conjugate Priors
==============================================================================
- For example, if :math:`f({\bf y}|{\bf \theta})` is a binomial
  distribution and :math:`\pi({\bf \theta})` is a beta distribution,
  :math:`\pi({\bf \theta}|{\bf y})` will also be a beta
  distribution.

.. rst-class:: to-build

- Alternatively, if :math:`f({\bf y}|{\bf \theta})` is a
  normal distribution and :math:`\pi({\bf \theta})` is a normal
  distribution, :math:`\pi({\bf \theta}|{\bf y})` will also be a
  normal distribution.
 

Normal Example
==============================================================================
Suppose :math:`Y_1, \ldots, Y_n \stackrel{i.i.d.}{\sim}
\mathcal{N}(\mu,\sigma^2)`, where :math:`\sigma^2` is *known* and
:math:`\mu` is *unknown*.

.. rst-class:: to-build

- Assume :math:`\pi(\mu)` is :math:`\mathcal{N}(\mu_0, \sigma^2_0)`, where
  :math:`\mu_0` and :math:`\sigma^2_0` are known parameters. 

.. rst-class:: to-build
  
- We will see below that :math:`\sigma^2_0` provides a measure of how
  strong our beliefs are that :math:`\mu = \mu_0` prior to observing
  data.


Normal Example
==============================================================================
The prior is

.. rst-class:: to-build

.. math::

  \pi(\mu) & = \frac{1}{\sqrt{2\pi} \sigma_0} \exp
  \left\{-\frac{1}{2\sigma^2_0} (\mu - \mu_0)^2 \right\} \hspace{0.5in}

.. rst-class:: to-build

.. math::

  \hspace{0.4in} & = \frac{1}{\sqrt{2\pi} \sigma_0} \exp
  \left\{-\frac{1}{2\sigma^2_0} (\mu^2 - 2\mu\mu_0 + \mu^2_0)
  \right\}

.. rst-class:: to-build

.. math::

  & \propto \exp \left\{\frac{\mu \mu_0}{\sigma^2_0} -
    \frac{\mu^2}{2\sigma^2_0} \right\}. \hspace{1in}


Normal Example
==============================================================================
The likelihood is

.. rst-class:: to-build

.. math::

   f(Y_1,\ldots, Y_n|\mu) & = \prod_{i=1}^n \left[
   \frac{1}{\sqrt{2\pi} \sigma} \exp\left\{-\frac{1}{2\sigma^2}
   (Y_i - \mu)^2\right\}\right] \hspace{0.25in}

.. rst-class:: to-build

.. math::

   & = \left(\frac{1}{2\pi \sigma^2}\right)^{n/2}
   \exp \left\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i -
   \mu)^2\right\} \hspace{0.13in}

.. rst-class:: to-build

.. math::

   \hspace{0.5in} & = \left(\frac{1}{2\pi \sigma^2}\right)^{n/2}
   \exp\left\{-\frac{1}{2\sigma^2} \left(-2n\bar{Y}\mu + n\mu^2 -
   \sum_{i=1}^n Y_i^2\right)\right\}

.. rst-class:: to-build

.. math::

   & \propto \exp\left\{\frac{n\bar{Y}\mu}{\sigma^2} -
   \frac{n\mu^2}{2\sigma^2} \right\} \hspace{1.5in}


Normal Example
==============================================================================
The posterior is

.. rst-class:: to-build

.. math::

   \pi(\mu|Y_1,\ldots,Y_n) & \propto f(Y_1,\ldots,
   Y_n|\mu) \pi(\mu) \hspace{2in}

.. rst-class:: to-build

.. math::

   \hspace{1.07in} & \propto \exp\left\{\frac{n\bar{Y}\mu}{\sigma^2} -
   \frac{n\mu^2}{2\sigma^2} \right\} \exp \left\{\frac{\mu
   \mu_0}{\sigma^2_0} - \frac{\mu^2}{2\sigma^2_0} \right\}

.. rst-class:: to-build

.. math::

   \hspace{1.07in} & = \exp \left\{\left(\frac{n\bar{Y}}{\sigma^2} +
   \frac{\mu_o}{\sigma^2_0}\right) \mu - \left(\frac{n}{2\sigma^2} +
   \frac{1}{2\sigma^2_0} \right) \mu^2\right\}

.. rst-class:: to-build

.. math::

   & = \exp\left\{A \mu - \frac{B}{2} \mu^2\right\} \hspace{0.63in}

.. rst-class:: to-build

.. math::

   & = \exp\left\{-\frac{B}{2} \left(\mu^2 -
   \frac{2A}{B} \mu \right)\right\}


Normal Example
==============================================================================

.. rst-class:: to-build

.. math::

   \hspace{0.65in} & \propto \exp\left\{-\frac{B}{2}
   \left(\mu^2 - \frac{2A}{B} \mu \right)\right\}
   \exp\left\{-\frac{B}{2} \left(\frac{A}{B}\right)^2\right\}

.. rst-class:: to-build

.. math::

   & = \exp\left\{-\frac{B}{2} \left(\mu^2 -
   \frac{2A}{B} \mu + \left(\frac{A}{B}\right)^2 \right) \right\}

.. rst-class:: to-build

.. math::

   & = \exp\left\{-\frac{B}{2} \left(\mu -
   \frac{A}{B}\right)^2\right\}. \hspace{0.9in}


Normal Example
==============================================================================
We see that :math:`\pi(\mu|Y_1,\ldots,Y_n)` is
:math:`\mathcal{N}\left(\frac{A}{B}, \frac{1}{B}\right)` where

.. rst-class:: to-build

.. math::

   E[\mu|Y_1,\ldots,Y_n] & = \frac{A}{B} =
   \frac{\frac{n\bar{Y}}{\sigma^2} +
   \frac{\mu_0}{\sigma^2_0}}{\frac{n}{\sigma^2} +
   \frac{1}{\sigma^2_0}}

.. rst-class:: to-build

.. math::

   Var(\mu|Y_1,\ldots,Y_n) & = \frac{1}{B} =
   \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\sigma^2_0}}.


Normal Example
==============================================================================
- If :math:`\sigma^2_0` is very small relative to
  :math:`\sigma^2/n`, :math:`E[\mu|Y_1,\ldots,Y_n] \approx \mu_0` and
  :math:`Var(\mu|Y_1,\ldots,Y_n) \approx \sigma^2_0`.

.. rst-class:: to-build

- In this case, the prior is very precise and contains a lot of
  information - the data doesn't add much to prior knowledge.

.. rst-class:: to-build

- If :math:`\sigma^2/n` is very small relative to
  :math:`\sigma^2_0`, :math:`E[\mu|Y_1,\ldots,Y_n] \approx \bar{Y}` and
  :math:`Var(\mu|Y_1,\ldots,Y_n) \approx \frac{\sigma^2}{n}`.

.. rst-class:: to-build

- In this case, the prior is very imprecise and contains very little
  information - the data is very informative and adds a lot to prior
  knowledge.


Moderate Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/midPrior1.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/midPrior1.png
     :width: 6in


Moderate Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/midPrior2.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/midPrior2.png
     :width: 6in


Moderate Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/midPrior3.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/midPrior3.png
     :width: 6in


Uninformative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/badPrior1.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/badPrior1.png
     :width: 6in


Uninformative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/badPrior2.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/badPrior2.png
     :width: 6in


Uninformative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/badPrior3.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/badPrior3.png
     :width: 6in


Informative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/goodPrior1.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/goodPrior1.png
     :width: 6in


Informative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/goodPrior2.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/goodPrior2.png
     :width: 6in


Informative Prior
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/goodPrior3.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/goodPrior3.png
     :width: 6in


Bayesian Parameter Estimates
==============================================================================
The most common Bayesian parameter estimates are 

.. rst-class:: to-build

- The mean of the posterior distribution.  

.. rst-class:: to-build

- The mode of the posterior distribution.  

.. rst-class:: to-build

- The median of the posterior distribution. 

.. rst-class:: to-build
  
- For large :math:`n`, the the mode is approximately equal to the
  MLE.


Frequentist Confidence Intervals
==============================================================================
When constructing typical confidence intervals: 

.. rst-class:: to-build

- Parameters are viewed as fixed and data as random.

.. rst-class:: to-build

- The interval is random because the data is random.

.. rst-class:: to-build

- We interpret the interval as containing the true parameter with some
  probability *before the data are observed*.

.. rst-class:: to-build

- Once the data are observed, the computed interval either contains or
  does not contain the true parameter.

.. rst-class:: to-build
  
- We interpret a 95\% confidence interval in the following way:
  if we could draw 100 samples similar to the one we have, roughly
  95 of the associated confidence intervals should contain the true
  parameter.


Bayesian Credible Intervals
==============================================================================
Bayesian credible intervals are the Bayesian equivalent to frequentist
confidence intervals.

.. rst-class:: to-build

- In the Bayesian paradigm, the parameters are viewed as random while
  the data are fixed.

.. rst-class:: to-build

- An interval based on the posterior distribution has a natural
  interpration as a probability of containing the true parameter,
  *even after the data have been observed*.


Equal-tails Credible Interval
==============================================================================
The most basic :math:`1-\alpha` credible interval is formed by computing
the :math:`\alpha/2` and :math:`1-\alpha/2` quantiles of the posterior
distribution.  

.. rst-class:: to-build

- For example, suppose :math:`\alpha = 0.05`: you want to compute a
  95\% credible interval.

.. rst-class:: to-build

- Determine the 0.025 and 0.975 quantiles.  

.. rst-class:: to-build

- These are the values corresponding to 2.5\% of the distribution in
  the lower tail and 2.5\% of the distribution in the upper tail.


Equal-tails Credible Interval
==============================================================================
:math:`\qquad`

  .. ifslides::

  .. image:: /_static/PriorPost/credInt.png
     :width: 7.5in
     :align: center

.. ifnotslides::

  .. image:: /_static/PriorPost/credInt.png
     :width: 6in