Rigorous Probability (2)


Conditional Expectations

Elementary Conditional Probabilities

Definition 8.2 Conditional Probability

Let \((\Omega, \mathbb{A}, P)\) be a probability space and let \(B \in \mathbb{A}\) with \(P(B) > 0\). We define the conditional probability given \(B\) for any \(A \in \mathbb{A}\) by:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]
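
For example, roll a fair die with \(\Omega = \{1, \ldots, 6\}\), and let \(A = \{2, 4, 6\}\) (an even number) and \(B = \{4, 5, 6\}\). Then:

\[P(A | B) = \frac{P(\{4, 6\})}{P(\{4, 5, 6\})} = \frac{2/6}{3/6} = \frac{2}{3}\]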


Theorem 8.4: \(P(\cdot | B)\) is a Probability Measure

If \(P(B) > 0\), then \(P(\cdot | B)\) is a probability measure on \((\Omega, \mathbb{A})\).
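
To see this, note that \(P(\Omega | B) = P(\Omega \cap B)/P(B) = 1\), and for pairwise disjoint \(A_1, A_2, \ldots \in \mathbb{A}\), the sets \(A_i \cap B\) are again pairwise disjoint, so \(\sigma\)-additivity of \(P(\cdot | B)\) is inherited from \(P\).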


Theorem 8.5: Conditional Probability of Independent Events

Let \(A, B \in \mathbb{A}\) with \(P(A), P(B) > 0\). Then the following are equivalent:

  1. \(A, B\) are independent.
  2. \(P(A | B) = P(A)\)
  3. \(P(B | A) = P(B)\)
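
The equivalence of 1 and 2 is immediate: if \(A, B\) are independent, then \(P(A | B) = P(A)P(B)/P(B) = P(A)\); conversely, \(P(A | B) = P(A)\) gives \(P(A \cap B) = P(A)P(B)\). Statement 3 follows by symmetry.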


Theorem 8.6: Summation Formula (Law of Total Probability)

Let \(I\) be a countable set and let \(B_i \in \mathbb{A}\), \(i \in I\), be pairwise disjoint sets with \(P\left(\bigcup_{i \in I} B_i\right) = 1\). Then, for any \(A \in \mathbb{A}\) (with terms where \(P(B_i) = 0\) understood as zero):

\[P(A) = \sum_{i \in I} P(A | B_i) P(B_i)\]

If \(X\) is a continuous random variable with density \(f\), the analogous formula (made rigorous in Definition 8.24) reads:

\[P(A) = \int P(A | X=x) f(x) dx\]
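
For example (a standard two-urn illustration): choose urn \(B_1\) or \(B_2\) with probability \(\frac{1}{2}\) each, where urn 1 contains 2 red and 1 blue ball and urn 2 contains 1 red and 2 blue. With \(A = \{\text{red ball drawn}\}\):

\[P(A) = P(A | B_1) P(B_1) + P(A | B_2) P(B_2) = \frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{2}\]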


Theorem 8.7: Bayes's Formula

Let \(I\) be a countable set and let \(B_i \in \mathbb{A}\), \(i \in I\), be pairwise disjoint sets with \(P\left(\bigcup_{i \in I} B_i\right) = 1\). Then, for any \(A \in \mathbb{A}\) with \(P(A) > 0\) and any \(k \in I\),

\[P(B_k | A) = \frac{P(A | B_k) P(B_k)}{\sum_{i \in I} P(A | B_i) P(B_i)}\]
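
As a standard illustration (with numbers chosen for convenience): let \(B_1\) be the event that a person has a disease with prevalence \(P(B_1) = 0.01\), let \(B_2 = B_1^c\), and let \(A\) be a positive test result with \(P(A | B_1) = 0.99\) and \(P(A | B_2) = 0.05\). Then:

\[P(B_1 | A) = \frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.05 \cdot 0.99} = \frac{0.0099}{0.0594} = \frac{1}{6}\]

so even after a positive test, the disease is present with probability only \(1/6\).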


Definition 8.9: Conditioning on an Event

Let \(X \in L^1(P)\) and \(A \in \mathbb{A}\). Then we define:

\[E[X | A] := \int X(\omega) P(d\omega | A) = \frac{E[\mathbb{I}_A X]}{P(A)}\]

provided \(P(A) > 0\). Clearly, \(P(B | A) = \int \mathbb{I}_B \, dP(\cdot | A) = E[\mathbb{I}_B | A]\).
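
Continuing the die example: with \(X(\omega) = \omega\) on \(\Omega = \{1, \ldots, 6\}\) and \(A = \{2, 4, 6\}\):

\[E[X | A] = \frac{E[\mathbb{I}_A X]}{P(A)} = \frac{(2 + 4 + 6)/6}{1/2} = 4\]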


Definition 8.10: Conditioning on a Sigma-Algebra of Disjoint Events

Let \(I\) be a countable set and let \((B_i)_{i \in I}\) be pairwise disjoint events with \(\bigcup_{i \in I} B_i = \Omega\). We define \(F := \sigma(B_i, i \in I)\). For \(X \in L^1(P)\), we define a map \(E[X | F]: \Omega \rightarrow \mathbb{R}\) by:

\[E[X | F](\omega) = E[X | B_i] \quad \text{if } \quad \omega \in B_i\]

where we set \(E[X | B_i] := 0\) by convention if \(P(B_i) = 0\).
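
For example, with \(X(\omega) = \omega\) on \(\Omega = \{1, \ldots, 6\}\), \(B_1 = \{1, 3, 5\}\) and \(B_2 = \{2, 4, 6\}\), the map \(E[X | F]\) equals \(E[X | B_1] = 3\) on \(B_1\) and \(E[X | B_2] = 4\) on \(B_2\): it is a random variable that is constant on each atom of \(F\).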


Theorem 8.10: Properties of \(E[X | F]\)

The map \(E[X | F]\) has the following properties:

  • \(E[X | F]\) is \(F\)-measurable.
  • \(E[X | F] \in L^1(P)\), and for any \(A \in F\), we have: \[\int_A E[X | F] dP = \int_A X dP\]


Conditional Expectation

All equalities involving conditional expectations hold only up to equality almost surely, even where we do not write this explicitly.

Definition 8.11: Conditional Expectation

Let \(X \in L^1(P)\) and let \(F \subseteq \mathbb{A}\) be a sub-\(\sigma\)-algebra (a \(\sigma\)-algebra contained in \(\mathbb{A}\)). A random variable \(Y\) is called a conditional expectation of \(X\) given \(F\), written \(E[X | F] := Y\), if:

  1. \(Y\) is \(F\)-measurable.
  2. For any \(A \in F\), we have \(E[\mathbb{I}_A X] = E[\mathbb{I}_A Y]\).

For \(B \in \mathbb{A}\), \(P(B | F) := E[\mathbb{I}_B | F]\) is called a conditional probability of \(B\) given the \(\sigma\)-algebra \(F\).
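
Two extreme cases are instructive. If \(F = \{\emptyset, \Omega\}\), the constant \(Y = E[X]\) satisfies both conditions, so \(E[X | F] = E[X]\) a.s. If \(F = \mathbb{A}\), then \(Y = X\) itself works, so \(E[X | F] = X\) a.s.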


Theorem 8.12: Existence and Uniqueness of Conditional Expectation

\(E[X | F]\) exists and is unique (up to equality almost surely).


Definition 8.13: Conditional Expectation Given a Random Variable

If \(Y\) is a random variable and \(X \in L^1(P)\), then we define \(E[X | Y] := E[X | \sigma(Y)]\).


Theorem 8.14: Properties of the Conditional Expectation

Let \((\Omega, \mathbb{A}, P)\) be a probability space and let \(X \in L^1(P)\). Let \(G \subseteq F \subseteq \mathbb{A}\) be \(\sigma\)-algebras and let \(Y \in L^1(\Omega, \mathbb{A}, P)\). Then:

  1. Linearity: \[E[\lambda X + Y | F] = \lambda E[X | F] + E[Y | F]\]
  2. Monotonicity: If \(X \geq Y\) a.s., then \(E[X | F] \geq E[Y | F]\).
  3. If \(E[|XY|] < \infty\) and \(Y\) is measurable w.r.t. \(F\), then: \[E[XY | F] = Y E[X | F], \quad E[Y | F] = E[Y | Y] = Y\]
  4. Tower Property: \[E[E[X | F] | G] = E[E[X | G] | F] = E[X | G]\]
  5. Triangle inequality: \[E[|X| | F] \geq |E[X | F]|\]
  6. Independence: if \(\sigma(X)\) and \(F\) are independent, then \(E[X | F] = E[X]\).
  7. If \(P(A) \in \{0, 1\}\) for any \(A \in F\), then \(E[X | F] = E[X]\).
  8. Dominated Convergence: Assume \(Y \in L^1(P)\), \(Y \geq 0\), and \((X_n)_{n \in \mathbb{N}}\) is a sequence of random variables with \(|X_n| \leq Y\) for all \(n \in \mathbb{N}\) and such that \(P(\lim_{n \rightarrow \infty} X_n = X) = 1\). Then: \[\lim_{n \rightarrow \infty} E[X_n | F] = E[X | F], \;\text{ a.s. and in } L^1(P)\]

Intuitively, \(E[X | F]\) is the best prediction we can make for the value of \(X\) if we only have the information of the \(\sigma\)-algebra \(F\). If \(\sigma(X) \subseteq F\), then \(E[X | F] = X\); that is, knowledge of \(F\) gives full information about \(X\).
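
A small worked example combining properties 1, 3 and 6: let \(X_1, X_2\) be independent with \(E[X_1] = E[X_2] = 0\), let \(S = X_1 + X_2\) and \(F = \sigma(X_1)\). Then:

\[E[S | F] = E[X_1 | F] + E[X_2 | F] = X_1 + E[X_2] = X_1 \quad \text{a.s.}\]

and the tower property (with \(G = \{\emptyset, \Omega\}\)) confirms \(E[S] = E[E[S | F]] = E[X_1] = 0\).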


Theorem 8.17: Conditional Expectation as Orthogonal Projection

Let \(F \subseteq \mathbb{A}\) be a \(\sigma\)-algebra and let \(X\) be a random variable with \(E[X^2] < \infty\). Then \(E[X | F]\) is the orthogonal projection of \(X\) onto \(L^2(\Omega, F, P)\). That is, for any \(F\)-measurable \(Y\) with \(E[Y^2] < \infty\):

\[\int (X - Y)^2 \, dP \geq \int (X - E[X | F])^2 \, dP\]

In other words:

\[E[(X - Y)^2] \geq E[(X - E[X | F])^2]\]

with equality if and only if \(Y = E[X | F]\) a.s.
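
The inequality follows from a Pythagorean identity: writing \(X - Y = (X - E[X | F]) + (E[X | F] - Y)\), the cross term vanishes because \(E[X | F] - Y\) is \(F\)-measurable, hence:

\[E[(X - Y)^2] = E[(X - E[X | F])^2] + E[(E[X | F] - Y)^2] \geq E[(X - E[X | F])^2]\]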


Theorem 8.20: Jensen's Inequality for Conditional Expectation

Let \(I \subseteq \mathbb{R}\) be an interval, let \(\psi: I \rightarrow \mathbb{R}\) be convex and let \(X\) be an \(I\)-valued random variable on \((\Omega, \mathbb{A}, P)\). Further, let \(E[|X|] < \infty\) and let \(F \subseteq \mathbb{A}\) be a \(\sigma\)-algebra. Then:

\[\infty \geq E[\psi(X) | F] \geq \psi (E[X | F])\]
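
For example, taking \(\psi(x) = x^2\) (assuming \(E[X^2] < \infty\)) gives \(E[X^2 | F] \geq (E[X | F])^2\) a.s., that is, the conditional variance \(\mathrm{Var}[X | F] := E[X^2 | F] - (E[X | F])^2\) is nonnegative.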


Theorem 8.21: Conditional Expectation as a Contraction on \(L^p\)

Let \(p \in [1, \infty)\) and let \(F \subseteq \mathbb{A}\) be a sub-\(\sigma\)-algebra. Then the map:

\[L^p(\Omega, \mathbb{A}, P) \rightarrow L^p(\Omega, F, P), \quad X \mapsto E[X | F]\]

is a contraction mapping (\(\|E[X | F]\|_p \leq \|X\|_p\)) and thus continuous. Hence, for \(X, X_1, X_2, \ldots \in L^p(\Omega, \mathbb{A}, P)\) with \(\lim_{n \rightarrow \infty}\|X_n - X\|_p = 0\):

\[\lim_{n \rightarrow \infty}\|E[X_n | F] - E[X | F]\|_p = 0\]
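
The contraction property follows directly from Theorem 8.20 with the convex function \(\psi(x) = |x|^p\): taking expectations in \(|E[X | F]|^p \leq E[|X|^p \,|\, F]\) yields \(\|E[X | F]\|_p^p \leq \|X\|_p^p\).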


Definition 8.24: Conditioning on \(X = x\)

Let \(Y \in L^1(P)\) and let \(X: (\Omega, \mathbb{A}) \to (E, \varepsilon)\) be a random variable. By the factorization lemma, there is an \(\varepsilon\)-\(B(\mathbb{R})\)-measurable function \(\psi\) with \(\psi(X) = E[Y | X]\). We define the conditional expectation of \(Y\) given \(X = x\) by \(E[Y | X = x] := \psi(x)\).

Analogously, we define \(P(A | X = x) := E[\mathbb{I}_A | X = x]\) for all \(A \in \mathbb{A}\).
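
If \(X\) is discrete and \(P(X = x) > 0\), this is consistent with Definition 8.9 applied to the event \(\{X = x\}\):

\[E[Y | X = x] = \frac{E[\mathbb{I}_{\{X = x\}} Y]}{P(X = x)}\]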


Definition 8.25: Transition Kernel, Markov Kernel

Let \((\Omega_1, \mathbb{A}_1)\) and \((\Omega_2, \mathbb{A}_2)\) be measurable spaces. A map \(k: \Omega_1 \times \mathbb{A}_2 \rightarrow [0, \infty]\) is called a \(\sigma\)-finite transition kernel (from \(\Omega_1\) to \(\Omega_2\)) if:

  1. \(\omega_1 \mapsto k(\omega_1, A_2)\) is \(\mathbb{A}_1\)-measurable for any \(A_2 \in \mathbb{A}_2\).
  2. \(A_2 \mapsto k(\omega_1, A_2)\) is a \(\sigma\)-finite measure on \((\Omega_2, \mathbb{A}_2)\) for any \(\omega_1 \in \Omega_1\).

If the measure in the second condition is a probability measure for every \(\omega_1 \in \Omega_1\), then \(k\) is called a stochastic kernel or Markov kernel. If instead we only require \(k(\omega_1, \Omega_2) \leq 1\) for every \(\omega_1 \in \Omega_1\), then \(k\) is called sub-Markov.
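
A minimal example: for \(\Omega_1 = \Omega_2 = \{1, 2\}\), the rows of any stochastic matrix define a Markov kernel, e.g. \(k(1, \cdot) = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_2\) and \(k(2, \cdot) = \delta_2\); this is precisely the transition kernel of a two-state Markov chain.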


Definition 8.28: Regular Conditional Distribution

Let \(Y\) be a random variable with values in a measurable space \((E, \varepsilon)\) and let \(F \subseteq \mathbb{A}\) be a sub-\(\sigma\)-algebra. A stochastic kernel \(k_{Y, F}\) from \((\Omega, F)\) to \((E, \varepsilon)\) is called a regular conditional distribution of \(Y\) given \(F\) if:

\[k_{Y, F} (\omega, B) = P(\{Y \in B\} | F) (\omega)\]

for \(P\)-almost all \(\omega \in \Omega\) and for all \(B \in \varepsilon\), that is, if:

\[\int \mathbb{I}_{B} (Y) \mathbb{I}_A dP = \int k_{Y, F} (\cdot, B) \mathbb{I}_A d P, \quad \forall A \in F, \; B \in \varepsilon\]

Consider the special case where \(F = \sigma(X)\) for a random variable \(X\) with values in an arbitrary measurable space \((E^\prime, \varepsilon^\prime)\). Then the stochastic kernel:

\[(x, A) \mapsto k_{Y, X}(x, A) = P(\{Y \in A\} | X = x), \quad \text{so that } k_{Y, X}(X(\omega), A) = k_{Y, \sigma(X)}(\omega, A),\]

is called a regular conditional distribution of \(Y\) given \(X\).
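
For example, if \((X, Y)\) has a joint density \(f\) on \(\mathbb{R}^2\) with marginal density \(f_X(x) = \int f(x, y) \, dy\), one can check that:

\[k_{Y, X}(x, A) = \int_A \frac{f(x, y)}{f_X(x)} \, dy \quad \text{for } f_X(x) > 0\]

(with an arbitrary fixed distribution on the null set \(\{f_X = 0\}\)) defines a regular conditional distribution of \(Y\) given \(X\).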


Theorem 8.29: Regular Conditional Distribution in \(\mathbb{R}\)

Let \(Y: (\Omega, \mathbb{A}) \to (\mathbb{R}, B(\mathbb{R}))\) be real-valued and let \(F \subseteq \mathbb{A}\) be a sub-\(\sigma\)-algebra. Then there exists a regular conditional distribution \(k_{Y, F}\) of \(Y\) given \(F\).