Probability

Random Variables

Suppose \(X\) is a random variable which can take values \(x \in \mathcal{X}\).

  • \(X\) is a discrete r.v. if \(\mathcal{X}\) is countable.
    • \(p(x)\) is the probability of a value \(x\) and is called the probability mass function.
  • \(X\) is a continuous r.v. if \(\mathcal{X}\) is uncountable.
    • \(f(x)\) is called the probability density function and can be thought of as the probability of a value \(x\).

Probability Mass Function

For a discrete random variable the probability mass function (PMF) is

\[p(a) = P(X = a),\]

where \(a \in \mathbb{R}\).
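
As a quick illustration (a hypothetical example, not from the notes, assuming Python is available), the PMF of a fair six-sided die assigns probability \(\frac{1}{6}\) to each face:

```python
# Hypothetical example: PMF of a fair six-sided die, p(x) = 1/6 for x = 1,...,6.
pmf = {x: 1 / 6 for x in range(1, 7)}

print(pmf[3])             # p(3) = P(X = 3) = 1/6
print(sum(pmf.values()))  # the probabilities sum to 1
```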

Probability Density Function

If \(B = (a,b)\), then

\[P(X \in B) = P(a \leq X \leq b) = \int_a^b f(x) dx.\]

Strictly speaking

\[P(X = a) = \int_a^a f(x) dx = 0,\]

but we may (intuitively) think of \(f(a)\) as indicating how likely \(X\) is to be near \(a\): for a small width \(\Delta\), \(P(a \leq X \leq a + \Delta) \approx f(a)\,\Delta\).
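
To make this intuition concrete, here is a small sketch (an assumed Exponential(1) example with density \(f(x) = e^{-x}\), not from the notes, using numpy): the probability of landing in a narrow interval around \(a\) is approximately \(f(a)\) times the interval's width.

```python
import numpy as np

def f(x):
    # Exponential(1) density, f(x) = exp(-x) for x >= 0.
    return np.exp(-x)

a, delta = 1.0, 1e-4
exact = np.exp(-a) - np.exp(-(a + delta))  # P(a <= X <= a + delta), closed form
approx = f(a) * delta                      # density times interval width

print(exact, approx)  # the two agree to several decimal places
```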

Properties of Distributions

For discrete random variables

  • \(p(x) \geq 0\), \(\forall x \in \mathcal{X}\).
  • \(\sum_{x\in \mathcal{X}} p(x) = 1\).

For continuous random variables

  • \(f(x) \geq 0\), \(\forall x \in \mathcal{X}\).
  • \(\int_{x\in \mathcal{X}} f(x)dx = 1\).

Cumulative Distribution Function

For discrete random variables the cumulative distribution function (CDF) is

  • \(F(a) = P(X \leq a) = \sum_{x \leq a} p(x).\)

For continuous random variables the CDF is

  • \(F(a) = P(X \leq a) = \int_{-\infty}^a f(x) dx.\)
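
A short sketch of both definitions (hypothetical examples, assuming numpy): the CDF of a fair die is a running sum of the PMF, and the Exponential(1) CDF has the closed form \(F(a) = 1 - e^{-a}\).

```python
import numpy as np

# Discrete: F(a) = sum of p(x) over x <= a, here for a fair die.
pmf = {x: 1 / 6 for x in range(1, 7)}
a = 4
F_discrete = sum(p for x, p in pmf.items() if x <= a)
print(F_discrete)  # 4/6

# Continuous: F(a) = integral of exp(-x) from 0 to a = 1 - exp(-a).
F_continuous = 1 - np.exp(-a)
print(F_continuous)
```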

Expected Value

For a discrete r.v. \(X\), the expected value is

\[E[X] = \sum_{x\in \mathcal{X}} x p(x).\]

For a continuous r.v. \(X\), the expected value is

\[E[X] = \int_{x\in \mathcal{X}} x \, f(x) dx.\]
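
A minimal sketch of both formulas (hypothetical examples, assuming numpy): the die mean is an exact finite sum, and the Exponential(1) mean is approximated by a Riemann sum on a grid.

```python
import numpy as np

# Discrete: fair die, E[X] = sum of x * p(x) = 3.5.
pmf = {x: 1 / 6 for x in range(1, 7)}
E_die = sum(x * p for x, p in pmf.items())
print(E_die)  # 3.5

# Continuous: Exponential(1), E[X] = integral of x * exp(-x) dx = 1,
# approximated with a Riemann sum over a truncated grid.
x = np.linspace(0.0, 50.0, 200_001)
dx = x[1] - x[0]
E_exp = np.sum(x * np.exp(-x)) * dx
print(E_exp)  # close to 1
```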

Expected Value of a Function

If \(Y = g(X)\), then

  • For discrete r.v. \(X\)
\[E[Y] = E[g(X)] = \sum_{x \in \mathcal{X}} g(x)p(x).\]
  • For continuous r.v. \(X\)
\[E[Y] = E[g(X)] = \int_{x \in \mathcal{X}} g(x)f(x)dx.\]
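
For instance (a hypothetical example in plain Python), taking \(g(x) = x^2\) for the fair die:

```python
# E[g(X)] with g(x) = x**2 for a fair die.
pmf = {x: 1 / 6 for x in range(1, 7)}

E_X2 = sum(x**2 * p for x, p in pmf.items())
print(E_X2)  # (1 + 4 + 9 + 16 + 25 + 36) / 6 = 91/6 ≈ 15.17
```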

Properties of Expectation

For random variables \(X\) and \(Y\) and constants \(a,b \in \mathbb{R}\), the expected value has the following properties (for both discrete and continuous r.v.’s):

  • \(E[aX + b] = aE[X] + b.\)
  • \(E[X + Y] = E[X] + E[Y].\)

Realizations of \(X\), denoted by \(x\), may be larger or smaller than \(E[X]\).

  • If you observe many realizations of \(X\), their average will be approximately \(E[X]\) (see the simulation sketch below).
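
A minimal simulation sketch (a hypothetical example, assuming numpy): averaging many simulated die rolls recovers \(E[X] = 3.5\), and the average of \(aX + b\) is close to \(aE[X] + b\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1_000_000)  # simulated die rolls

a, b = 2.0, 1.0
print(x.mean())            # ≈ 3.5 = E[X]
print((a * x + b).mean())  # ≈ 2 * 3.5 + 1 = 8 = a * E[X] + b
```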

Properties of Expectation - Proof

\[\begin{split}E[aX + b] & = \int_{-\infty}^{\infty} (ax+b)f(x) dx \\ & = \int_{-\infty}^{\infty} a x f(x) dx + \int_{-\infty}^{\infty} b f(x) dx \\ & = a \int_{-\infty}^{\infty} x f(x) dx + b \int_{-\infty}^{\infty} f(x) dx \\ & = a\,E[X] + b.\end{split}\]

The last step uses \(\int_{-\infty}^{\infty} x f(x)\, dx = E[X]\) and \(\int_{-\infty}^{\infty} f(x)\, dx = 1\).

Variance

Generally speaking, variance is defined as

\[Var(X) = E\left[(X - E[X])^2\right].\]

If \(X\) is discrete:

\[Var(X) = \sum_{x\in \mathcal{X}} (x - E[X])^2 p(x).\]

If \(X\) is continuous:

\[Var(X) = \int_{x\in \mathcal{X}} (x - E[X])^2 f(x)\, dx.\]

Variance (Cont.)

Using the properties of expectations, we can show \(Var(X) = E[X^2] - E[X]^2\):

\[\begin{split}Var(X) & = E\left[(X - E[X])^2\right] \\ & = E\left[X^2 - 2XE[X] + E[X]^2\right] \\ & = E[X^2] - 2E[X]E[X] + E[X]^2 \\ & = E[X^2] - E[X]^2.\end{split}\]
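
Both expressions give the same number, as a quick check with the fair-die example (hypothetical, plain Python) confirms:

```python
# Variance of a fair die two ways: the definition and the shortcut formula.
pmf = {x: 1 / 6 for x in range(1, 7)}

E_X = sum(x * p for x, p in pmf.items())
var_def = sum((x - E_X)**2 * p for x, p in pmf.items())     # E[(X - E[X])^2]
var_short = sum(x**2 * p for x, p in pmf.items()) - E_X**2  # E[X^2] - E[X]^2

print(var_def, var_short)  # both ≈ 2.9167
```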

Standard Deviation

The standard deviation is simply

\[Std(X) = \sqrt{Var(X)}.\]
  • \(Std(X)\) is in the same units as \(X\).
  • \(Var(X)\) is in units squared.

Covariance

For two random variables \(X\) and \(Y\), the covariance is generally defined as

\[Cov(X,Y) = E\left[(X-E[X])(Y-E[Y])\right]\]

Note that \(Cov(X,X) = Var(X)\).

Covariance (Cont.)

Using the properties of expectations, we can show

\[Cov(X,Y) = E[XY] - E[X]E[Y].\]

This can be proven in the exact way that we proved

\[Var(X) = E[X^2] - E[X]^2.\]

In fact, note that

\[\begin{split}Cov(X,X) & = E[X \cdot X] - E[X]E[X] \\ & = E[X^2] - E[X]^2 = Var(X).\end{split}\]
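
A simulation sketch of the shortcut formula (a hypothetical example, assuming numpy), with \(Y\) built from \(X\) plus independent noise so that \(Cov(X,Y) = Var(X)\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = x + rng.normal(0.0, 1.0, size=1_000_000)

cov_est = (x * y).mean() - x.mean() * y.mean()  # E[XY] - E[X]E[Y]
print(cov_est)                                  # ≈ Cov(X, Y) = Var(X) = 1
print((x * x).mean() - x.mean()**2)             # Cov(X, X) = Var(X) ≈ 1
```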

Properties of Variance

Given random variables \(X\) and \(Y\) and constants \(a,b \in \mathbb{R}\),

\[Var(aX + b) = a^2Var(X).\]
\[Var(aX+bY) = a^2Var(X) + b^2Var(Y) + 2abCov(X,Y).\]

The latter property can be generalized to

\[\begin{split}Var\left(\sum_{i=1}^n a_i X_i \right) & = \sum_{i=1}^n a_i^2Var(X_i) \\ & \hspace{1in} + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^n a_i a_j Cov(X_i, X_j).\end{split}\]
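
A simulation sketch of the two-variable formula (a hypothetical example, assuming numpy), using correlated \(X\) and \(Y\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=1_000_000)

a, b = 2.0, -3.0
lhs = np.var(a * x + b * y)
rhs = (a**2 * np.var(x) + b**2 * np.var(y)
       + 2 * a * b * np.cov(x, y, ddof=0)[0, 1])
print(lhs, rhs)  # the two estimates agree closely
```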

Properties of Variance - Proof

\[\begin{split}Var&(aX+bY) = E\left[(aX+bY)^2\right] - E\left[aX+bY\right]^2 \\ & = E[a^2X^2 + b^2Y^2 + 2abXY] - \left(aE[X]+bE[Y]\right)^2 \\ & = a^2 E[X^2] + b^2 E[Y^2] + 2abE[XY] \\ & \hspace{1in} - a^2E[X]^2 - b^2E[Y]^2 -2abE[X]E[Y] \\ & = a^2 \left(E[X^2] - E[X]^2\right) + b^2 \left(E[Y^2] - E[Y]^2\right) \\ & \hspace{1.5in} + 2ab \left(E[XY] - E[X]E[Y]\right) \\ & = a^2Var(X) + b^2Var(Y) + 2abCov(X,Y).\end{split}\]

Properties of Covariance

Given random variables \(W\), \(X\), \(Y\) and \(Z\) and constants \(a,b \in \mathbb{R}\),

\[Cov(X,a) = 0.\]
\[Cov(aX,bY) = abCov(X,Y).\]
\[\begin{split}Cov(W+X,Y+Z) & = Cov(W,Y) + Cov(W,Z) \\ & \hspace{1.3in}+ Cov(X,Y) + Cov(X,Z).\end{split}\]

The latter two can be generalized to

\[\begin{split}Cov\left(\sum_{i=1}^n a_i X_i, \sum_{j=1}^m b_j Y_j\right) & = \sum_{i=1}^n \sum_{j=1}^m a_i b_j Cov(X_i, Y_j).\end{split}\]

Correlation

Correlation is defined as

\[Corr(X,Y) = \frac{Cov(X,Y)}{Std(X) Std(Y)}.\]
  • It is fairly easy to show that \(-1 \leq Corr(X,Y) \leq 1\).
  • The properties of correlations of sums of random variables follow from those of covariance and standard deviations above.
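
A short sketch (a hypothetical example, assuming numpy): computing the correlation from the definition and checking it against np.corrcoef; either way the value lies in \([-1, 1]\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)
y = 0.8 * x + rng.normal(0.0, 1.0, size=100_000)

corr_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(corr_manual)
print(np.corrcoef(x, y)[0, 1])  # same number, between -1 and 1
```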

Normal Distribution

The normal distribution is often used to approximate the probability distribution of returns.

  • It is a continuous distribution.
  • It is symmetric.
  • It is fully characterized by \(\mu\) (mean) and \(\sigma\) (standard deviation) – i.e. if you only tell me \(\mu\) and \(\sigma\), I can draw every point in the distribution.

Normal Density

If \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\), we write

\[X \sim \mathcal{N}(\mu, \sigma).\]

The probability density function is

\[f(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}.\]
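
The density is easy to code directly; a minimal sketch (assuming numpy and scipy are available) checks it against scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    # The normal density written out from the formula above.
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

mu, sigma = 1.0, 2.0
x = np.linspace(-5.0, 7.0, 7)
print(normal_pdf(x, mu, sigma))
print(norm.pdf(x, loc=mu, scale=sigma))  # matches
```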

Normal Distribution

[Figure: the normal probability density function (Normal_Distribution_PDF.png, from Wikipedia).]

Standard Normal Distribution

Suppose \(X \sim \mathcal{N}(\mu, \sigma)\).

Then

\[Z = \frac{X - \mu}{\sigma}\]

is a standard normal random variable: \(Z \sim \mathcal{N}(0,1)\).

  • That is, \(Z\) has zero mean and unit standard deviation.

We can reverse the process by defining

\[X = \mu + \sigma Z.\]
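
A simulation sketch of standardizing and un-standardizing (a hypothetical example, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

z = (x - mu) / sigma
print(z.mean(), z.std())       # ≈ 0 and ≈ 1

x_back = mu + sigma * z        # reverse the transformation
print(np.allclose(x, x_back))  # True
```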

Standard Normal Distribution - Proof (Mean)

\[\begin{split}E[Z] & = E\left[\frac{X - \mu}{\sigma}\right] \\ & = \frac{1}{\sigma} E[X - \mu] \\ & = \frac{1}{\sigma} (E[X] - \mu) \\ & = \frac{1}{\sigma} (\mu - \mu) \\ & = 0.\end{split}\]

Standard Normal Distribution - Proof (Variance)

\[\begin{split}Var(Z) & = Var\left(\frac{X - \mu}{\sigma}\right) \\ & = Var\left(\frac{X}{\sigma} - \frac{\mu}{\sigma}\right) \\ & = \frac{1}{\sigma^2} Var(X) \\ & = \frac{\sigma^2}{\sigma^2} \\ & = 1.\end{split}\]

Sum of Normals

Suppose \(X_i \sim \mathcal{N}(\mu_i, \sigma_i)\) for \(i = 1,\ldots,n\).

Then if we denote \(W = \sum_{i=1}^n X_i\)

\[W \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i, \sqrt{\sum_{i=1}^n \sigma_i^2 + 2\sum_{i=1}^{n-1} \sum_{j=i+1}^n Cov(X_i, X_j)}\right).\]

How does this simplify if \(Cov(X_i, X_j) = 0\) for \(i \neq j\)?
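
With independent normals the covariance terms vanish, so the standard deviation of \(W\) reduces to \(\sqrt{\sum_{i=1}^n \sigma_i^2}\). A simulation sketch of this independent case (a hypothetical example, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
mus = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.0, 2.0, 0.5])

draws = rng.normal(mus, sigmas, size=(1_000_000, 3))  # independent columns
w = draws.sum(axis=1)

print(w.mean(), mus.sum())                  # both ≈ -0.5
print(w.std(), np.sqrt((sigmas**2).sum()))  # both ≈ sqrt(5.25) ≈ 2.29
```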

Sample Mean

Suppose we don’t know the true probabilities of a distribution, but would like to estimate the mean.

  • Given a sample of observations, \(\{x_i\}_{i=1}^n\), of random variable \(X\), we can estimate the mean by
\[\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i.\]
  • This is a simple arithmetic average; equivalently, a probability-weighted average that places equal weight \(\frac{1}{n}\) on each observation.
  • But the true mean is a weighted average using actual (most likely, unequal) probabilities. How do we reconcile this?
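
A minimal sketch (a hypothetical example, assuming numpy): estimating the mean of a fair die from simulated rolls.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=10_000)  # sample of die rolls

mu_hat = x.sum() / len(x)  # (1/n) * sum of the observations
print(mu_hat)              # close to the true mean 3.5
print(x.mean())            # the same calculation
```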

Sample Mean (Cont.)

Given that the sample \(\{x_i\}_{i=1}^n\) was drawn from the distribution of \(X\), the observed values are inherently weighted by the true probabilities (for large samples).

  • More values in the sample will be drawn from the higher probability regions of the distribution.
  • So weighting all of the values equally will naturally give more weight to the higher probability outcomes.

Sample Variance

Similarly, the sample variance can be defined as

\[\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \hat{\mu})^2.\]

Notice that we use \(\frac{1}{n-1}\) rather than the \(\frac{1}{n}\) used for the sample mean.

  • This is because a simple average using \(\frac{1}{n}\) underestimates the variability of the data, since it doesn’t account for the extra error involved in estimating \(\hat{\mu}\).
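
A minimal sketch (a hypothetical example, assuming numpy), comparing the \(\frac{1}{n-1}\) formula with numpy's ddof argument:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=10_000)  # true variance is sigma^2 = 4

mu_hat = x.mean()
var_hat = ((x - mu_hat)**2).sum() / (len(x) - 1)
print(var_hat)
print(np.var(x, ddof=1))  # same number
```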

Other Sample Moments

Sample standard deviations, covariances and correlations are computed in a similar fashion.

  • Use the definitions above, replacing expectations with simple averages.
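
A short sketch (hypothetical simulated data, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=10_000)

print(np.std(x, ddof=1))        # sample standard deviation
print(np.cov(x, y)[0, 1])       # sample covariance (n-1 divisor by default)
print(np.corrcoef(x, y)[0, 1])  # sample correlation
```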