Let $(\Omega, \mathcal H, \mathbb P)$ be a probability space, and suppose $\mathcal F \subseteq \mathcal H$ is a sub-$\sigma$-algebra of $\mathcal H$. If a random variable $X$ is measurable with respect to $\mathcal F$, then it is also measurable with respect to $\mathcal H$; the converse fails in general.
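On a finite sample space, being measurable with respect to a $\sigma$-algebra generated by a partition just means being constant on each block of the partition. The Python sketch below uses a hypothetical four-point space and a helper named `is_measurable` (neither comes from the text above) to exhibit a variable that is $\mathcal H$-measurable but not $\mathcal F$-measurable.

```python
# Hypothetical finite example: Omega = {1, 2, 3, 4}.
# H is generated by the partition into singletons (the full power set),
# F by the coarser partition {{1, 2}, {3, 4}}.
H_PARTITION = [{1}, {2}, {3}, {4}]
F_PARTITION = [{1, 2}, {3, 4}]

def is_measurable(X, partition):
    """X (a dict omega -> value) is measurable w.r.t. the sigma-algebra
    generated by `partition` iff X is constant on every block."""
    return all(len({X[w] for w in block}) == 1 for block in partition)

X = {1: 0.0, 2: 1.0, 3: 5.0, 4: 5.0}   # varies inside the block {1, 2}
Z = {1: 2.0, 2: 2.0, 3: 7.0, 4: 7.0}   # constant on each block of F

print(is_measurable(Z, F_PARTITION), is_measurable(Z, H_PARTITION))  # True True
print(is_measurable(X, F_PARTITION), is_measurable(X, H_PARTITION))  # False True
```

Here `Z` is measurable with respect to both $\sigma$-algebras, while `X` is measurable only with respect to the finer one.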

The central question that conditional expectation tries to answer is: given $X$ measurable with respect to $\mathcal H$, can we find some $X_{\mathcal F}$ measurable with respect to $\mathcal F$ that is, in some sense, the best approximation to $X$? Since expectation is the main tool at hand, it makes sense to measure how well $X_{\mathcal F}$ approximates $X$ by comparing expectations.

**Definition:** If $X \in L^1(\mathcal H)$ then $\overline X \in L^1(\mathcal F)$ is called a *conditional expectation* of $X$ with respect to $\mathcal F$ if for every $E \in \mathcal F$, $\mathbb E[\boldsymbol 1_E X] = \mathbb E[\boldsymbol 1_E \overline X]$.

The expectation $\mathbb E[\boldsymbol 1_E X]$ is called the expectation of $X$ *over* $E$, and so $\overline X$ is a conditional expectation of $X$ exactly when their expectations agree over all events $E \in \mathcal F$. That is, when we average $X$ and $\overline X$ over any set in $\mathcal F$ we cannot distinguish them.
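To make the averaging condition concrete, here is a minimal Python sketch on a finite space. The setup is an assumption chosen for illustration: a fair die with $X(\omega) = \omega$, and $\mathcal F$ generated by the partition into odd and even outcomes. The helper names (`expectation_over`, `block_average`, `generated_sigma_algebra`) are hypothetical. The candidate $\overline X$ replaces $X$ by its average on each block, and the code checks that the expectations agree over every event in $\mathcal F$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical setup: a fair die, with F generated by the parity partition.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}
BLOCKS = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]

def expectation_over(E, X):
    """E[1_E X]: the expectation of X over the event E."""
    return sum(P[w] * X[w] for w in E)

def block_average(X, blocks):
    """Candidate conditional expectation: on each block (assumed to have
    positive probability), replace X by its probability-weighted average."""
    Xbar = {}
    for block in blocks:
        avg = expectation_over(block, X) / sum(P[w] for w in block)
        for w in block:
            Xbar[w] = avg
    return Xbar

def generated_sigma_algebra(blocks):
    """All unions of blocks, including the empty union."""
    picks = chain.from_iterable(
        combinations(blocks, r) for r in range(len(blocks) + 1))
    return [frozenset().union(*pick) for pick in picks]

X = {w: Fraction(w) for w in OMEGA}
Xbar = block_average(X, BLOCKS)
F = generated_sigma_algebra(BLOCKS)

agree_on_F = all(expectation_over(E, X) == expectation_over(E, Xbar) for E in F)
print(Xbar[1], Xbar[2])   # 3 4
print(agree_on_F)         # True
# Over an event outside F the two expectations can differ:
print(expectation_over({1}, X), expectation_over({1}, Xbar))  # 1/6 1/2
```

The last line shows why the condition is imposed only on events in $\mathcal F$: over $\{1\}$, which is not in $\mathcal F$, the averages of $X$ and $\overline X$ disagree.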

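It turns out that a conditional expectation of $X$ always exists and is unique up to almost sure equality: any two conditional expectations of $X$ with respect to $\mathcal F$ agree outside an event of probability zero. Thus we abuse language and sometimes say $X_{\mathcal F} = \overline X$ is *the* conditional expectation of $X$.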
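The failure of strict uniqueness can be seen on a small finite space. In the Python sketch below, the four-point space, the probabilities, and the helper names are all assumptions made for illustration. The block $\{3\}$ belongs to $\mathcal F$ but has probability zero, so a conditional expectation may take any value there.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical space {0, 1, 2, 3}; the outcome 3 has probability zero.
P = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(0)}
BLOCKS = [frozenset({0, 1}), frozenset({2}), frozenset({3})]  # generate F

def expectation_over(E, X):
    """E[1_E X]: the expectation of X over the event E."""
    return sum(P[w] * X[w] for w in E)

def is_conditional_expectation(X, Xbar, blocks):
    """Check that Xbar is constant on each block (F-measurable) and that
    its expectation matches that of X over every union of blocks."""
    measurable = all(len({Xbar[w] for w in b}) == 1 for b in blocks)
    picks = chain.from_iterable(
        combinations(blocks, r) for r in range(len(blocks) + 1))
    events = [frozenset().union(*pick) for pick in picks]
    return measurable and all(
        expectation_over(E, X) == expectation_over(E, Xbar) for E in events)

X = {0: 0, 1: 4, 2: 1, 3: 9}
version_a = {0: 2, 1: 2, 2: 1, 3: 0}
version_b = {0: 2, 1: 2, 2: 1, 3: -50}   # differs from version_a only at 3

both_valid = (is_conditional_expectation(X, version_a, BLOCKS)
              and is_conditional_expectation(X, version_b, BLOCKS))
# Probability of the event where the two versions disagree.
prob_differ = sum(P[w] for w in P if version_a[w] != version_b[w])

print(both_valid)    # True
print(prob_differ)   # 0
```

Both versions satisfy the definition, and they disagree only on an event of probability zero.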

#### Conditional Expectation of a Random Variable Given Another

One situation we may be interested in is the conditional expectation of a random variable $X$ given the $\sigma$-algebra $\sigma(Y)$ generated by a random variable $Y$. In this case we usually just talk about the conditional expectation of $X$ given $Y$. When $X$ and $Y$ have a joint density, we can compute conditional expectations of $X$ given $\boldsymbol 1_{\{Y \in B\}}$ for Borel sets $B$. Notice that the $\sigma$-algebra generated by $\boldsymbol 1_{\{Y \in B\}}$ is explicitly given by $\{\emptyset, \{Y \in B\}, \{Y \in B^c\}, \Omega\}$, and so the conditional expectation will be a linear combination of at most two indicator functions (those for $\{Y \in B\}$ and $\{Y \in B^c\}$).

**Theorem:** Suppose $X$ and $Y$ have joint density $f_{X,Y}$ and $B$ is a Borel set with $\mathbb P(Y \in B) \not\in \{0,1\}$. If $g: \mathbb R \rightarrow \mathbb R$ is any Borel measurable function with $g(X) \in L^1(\mathcal H)$, then $\mathbb E[g(X) | \boldsymbol 1_{\{Y \in B\}}] := \mathbb E[g(X) | \sigma(\boldsymbol 1_{\{Y \in B\}})]$ is given by $$\frac{\boldsymbol 1_{\{Y \in B\}}}{\mathbb P(Y \in B)} \int_B \int_{\mathbb R} g(x) f_{X,Y}(x, y) \, dx \, dy + \frac{\boldsymbol 1_{\{Y \in B^c\}}}{\mathbb P(Y \in B^c)} \int_{B^c} \int_{\mathbb R} g(x) f_{X,Y}(x, y) \, dx \, dy.$$
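As a sanity check on a statement of this kind, the following Python sketch estimates the two coefficients by Monte Carlo. Everything specific here is an assumption made for illustration: the joint density is taken to be uniform on the triangle $0 < x < y < 1$ (so $f_{X,Y} = 2$ there), $g(x) = x$, $B = [0, 1/2]$, and the function name `simulate` is hypothetical. Averaging $g(X)$ over the event $\{Y \in B\}$, that is, computing $\mathbb E[\boldsymbol 1_{\{Y \in B\}} g(X)] / \mathbb P(Y \in B)$, gives $(1/24)/(1/4) = 1/6$, and the same computation over $\{Y \in B^c\}$ gives $(7/24)/(3/4) = 7/18$.

```python
import random

def simulate(n_samples=200_000, seed=0):
    """Estimate the average of g(X) = X over {Y <= 1/2} and over {Y > 1/2}
    when (X, Y) is uniform on the triangle 0 < x < y < 1 (an assumed example)."""
    rng = random.Random(seed)
    sum_in = sum_out = 0.0
    n_in = n_out = 0
    for _ in range(n_samples):
        y = rng.random() ** 0.5      # Y has marginal density 2y on (0, 1)
        x = rng.uniform(0.0, y)      # given Y = y, X is uniform on (0, y)
        if y <= 0.5:                 # the event {Y in B}
            sum_in += x
            n_in += 1
        else:                        # the event {Y in B^c}
            sum_out += x
            n_out += 1
    return sum_in / n_in, sum_out / n_out

est_in, est_out = simulate()
print(est_in, 1 / 6)     # estimate next to the exact value 1/6
print(est_out, 7 / 18)   # estimate next to the exact value 7/18
```

The conditional expectation is then the two-valued random variable equal to the first coefficient on $\{Y \in B\}$ and to the second on $\{Y \in B^c\}$.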