# Joint Distributions

#### Basics of Probability

Let $(\Omega, \mathcal H, \mathbb P)$ be a probability space and consider a random vector $\mathbf X: \Omega \rightarrow \mathbb R^N$. In calling $\mathbf X$ a random vector we assume that each coordinate $X_n : \Omega \rightarrow \mathbb R$ is a random variable (measurable with respect to the Borel $\sigma$-algebra on $\mathbb R$). Equivalently, $\mathbf X$ is measurable with respect to $\mathcal B(\mathbb R^N) = \mathcal B(\mathbb R)^{\otimes N}$.

We define the joint distribution of $\mathbf X$ to be the measure on $\mathcal B(\mathbb R^N)$ given by $$\mu_{\mathbf X}(A) = \mathbb P\{\mathbf X \in A\}.$$ If $A_1 \times \cdots \times A_N$ is a rectangle in $\mathcal B(\mathbb R^N)$, then $$\mathbb P\{X_1 \in A_1, \ldots, X_N \in A_N\} = \mu_{\mathbf X}(A_1 \times \cdots \times A_N).$$
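As a quick numerical sanity check of the rectangle formula, consider two independent Uniform$(0,1)$ coordinates, so that $\mu_{\mathbf X}(A_1 \times A_2)$ is simply the product of the side lengths. The rectangle $A_1 \times A_2$ below is an arbitrary choice for illustration; a Monte Carlo estimate of the rectangle probability should agree with the exact value.

```python
import numpy as np

# Sketch: for independent Uniform(0,1) coordinates, the probability of
# a rectangle A1 x A2 equals the product of the side lengths.  A Monte
# Carlo estimate of P{X1 in A1, X2 in A2} should agree with that value.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1_000_000, 2))  # samples of (X1, X2)

# A1 = [0.2, 0.5], A2 = [0.1, 0.7] (arbitrary illustrative rectangle)
in_A = ((0.2 <= X[:, 0]) & (X[:, 0] <= 0.5)
        & (0.1 <= X[:, 1]) & (X[:, 1] <= 0.7))
estimate = in_A.mean()
exact = (0.5 - 0.2) * (0.7 - 0.1)     # mu_X(A1 x A2) = 0.18
print(estimate, exact)
```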

The joint distribution function of $\mathbf X$ is given by $$F_{\mathbf X}(\mathbf x) = \mathbb P\{X_1 \leq x_1, \ldots, X_N \leq x_N\}.$$ If the mixed partial derivative of $F_{\mathbf X}$ with respect to $x_1, \ldots, x_N$ exists, then the joint density function of $\mathbf X$ is defined to be $$f_{\mathbf X}(\mathbf x) = \frac{\partial^N}{\partial x_1 \cdots \partial x_N} F_{\mathbf X}(\mathbf x).$$
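To see the definition in action, take two independent Uniform$(0,1)$ variables, whose joint distribution function on $[0,1]^2$ is $F(x,y) = xy$. A minimal sketch (the evaluation point and step size are arbitrary choices) approximates the mixed partial with central finite differences and recovers the constant joint density $f(x,y) = 1$.

```python
# Sketch: F(x, y) = x * y is the joint distribution function of two
# independent Uniform(0,1) variables on [0,1]^2; the mixed partial
# d^2 F / dx dy recovers the joint density f(x, y) = 1.
def F(x, y):
    return x * y

def mixed_partial(F, x, y, h=1e-5):
    # Central finite-difference approximation of d^2 F / dx dy.
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)

density = mixed_partial(F, 0.3, 0.7)  # evaluation point chosen arbitrarily
print(density)  # approximately 1.0
```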

Theorem: If $\mathbf X$ is a random vector with density function $f_{\mathbf X}$, then:

• $\mathbb P\{\mathbf X \in A\} = \int_A f_{\mathbf X}(\mathbf x) \,dx_1 \cdots dx_N$.
• If $g: \mathbb R^N \rightarrow \mathbb R$ is Borel measurable, then $$\mathbb E[g(\mathbf X)] = \int_{\mathbb R^N} g(\mathbf x) f_{\mathbf X}(\mathbf x) \, dx_1 \cdots dx_N.$$
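The expectation formula in the theorem can be checked numerically. The sketch below (the choice of distribution and of $g$ is illustrative) takes two independent $N(0,1)$ coordinates and $g(x_1, x_2) = x_1^2 + x_2^2$, so the exact answer is $\operatorname{Var}(X_1) + \operatorname{Var}(X_2) = 2$.

```python
import numpy as np
from scipy import integrate

# Sketch: verify E[g(X)] = ∫ g(x) f_X(x) dx for a standard bivariate
# normal with independent coordinates, using g(x1, x2) = x1^2 + x2^2
# (exact answer: Var(X1) + Var(X2) = 2).
def f(x1, x2):
    # Joint density of two independent N(0,1) variables.
    return np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

def g(x1, x2):
    return x1**2 + x2**2

# dblquad integrates the inner variable (x2) first, then x1; the
# infinite range is truncated at ±8, where the Gaussian tail is negligible.
expectation, _ = integrate.dblquad(
    lambda x2, x1: g(x1, x2) * f(x1, x2), -8, 8, -8, 8)
print(expectation)  # close to 2.0
```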

#### Marginal Distributions & Densities

If we are given the joint distribution, distribution function, or density function of a random vector $\mathbf X$, then we can recover the distributions, distribution functions, and density functions of the individual random variables $X_1, \ldots, X_N$. To distinguish these from the joint density, etc., we use the modifier marginal.

Definition:

• The marginal distribution of $X_1$ is the measure on $(\mathbb R, \mathcal B(\mathbb R))$ given by $\mu_{X_1}(A) = \mu_{\mathbf X}(A \times \mathbb R \times \cdots \times \mathbb R)$. The marginal distributions of the other random variables are found similarly.
• The marginal distribution of $\mathbf Y = (X_1, \ldots, X_n)$ is the measure on $(\mathbb R^n, \mathcal B(\mathbb R^n))$ given by $\mu_{\mathbf Y}(B) = \mu_{\mathbf X}(B \times \mathbb R^{N-n})$. The marginal distribution of other subvectors of $\mathbf X$ is found similarly.
• The marginal distribution function of $X_1$ (and of the other $X_n$, by analogy) is given by $$F_{X_1}(x) = \mu_{X_1}\big( (-\infty, x] \big) = \mu_{\mathbf X}\big( (-\infty, x] \times \mathbb R^{N-1} \big).$$
• The marginal distribution function of $\mathbf Y = (X_1, \ldots, X_n)$ is given by $$F_{\mathbf Y}(y_1, \ldots, y_n) = \mu_{\mathbf X}( (-\infty, y_1] \times \cdots \times (-\infty, y_n] \times \mathbb R^{N-n}).$$
• If $\mathbf X$ has joint density function $f_{\mathbf X}$ then $X_1$ has a density function called the marginal density function $f_{X_1}$ given by $$f_{X_1}(x_1) = \int_{\mathbb R^{N-1}} f_{\mathbf X}(x_1, x_2, \ldots, x_N) \, dx_2 \cdots dx_N.$$ The marginal density function for $X_n$ is found by integrating $f_{\mathbf X}$ over $\mathbb R$ for all variables except $x_n$.
• If $\mathbf X$ has joint density function $f_{\mathbf X}$ and $\mathbf Y = (X_1, \ldots, X_n)$ then $\mathbf Y$ has a joint density function $f_{\mathbf Y} = f_{X_1, \ldots, X_n}$ called the marginal density function, and $$f_{\mathbf Y}(x_1, \ldots, x_n) = \int_{\mathbb R^{N-n}} f_{\mathbf X}(x_1, \ldots, x_N) \, dx_{n+1} \cdots dx_N.$$ The marginal density for other subvectors of $\mathbf X$ is determined by integrating $f_{\mathbf X}$ over $\mathbb R$ for each variable not appearing in $\mathbf Y$.
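The marginalization recipe above can be sketched numerically. Assume (for illustration) a bivariate normal joint density with correlation $\rho = 0.5$; its marginal in the first coordinate is exactly $N(0,1)$, so integrating the joint density over $x_2$ should reproduce the standard normal density.

```python
import numpy as np
from scipy import integrate, stats

# Sketch: recover the marginal density of X1 by integrating a joint
# density over x2.  The joint here is bivariate normal with correlation
# rho = 0.5, so the marginal of X1 is standard normal.
rho = 0.5

def f_joint(x1, x2):
    c = 1 / (2 * np.pi * np.sqrt(1 - rho**2))
    quad_form = (x1**2 - 2*rho*x1*x2 + x2**2) / (2 * (1 - rho**2))
    return c * np.exp(-quad_form)

def f_marginal(x1):
    # Integrate out x2 over a truncated range (the tails are negligible).
    val, _ = integrate.quad(lambda x2: f_joint(x1, x2), -10, 10)
    return val

print(f_marginal(0.7))       # numerical marginal density at 0.7
print(stats.norm.pdf(0.7))   # exact N(0,1) density at 0.7
```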

We can compute expectations of functions of the random variables $X_1, \ldots, X_N$ either via their joint distributions/densities or via their marginal distributions/densities.

Theorem: If $g : \mathbb R \rightarrow \mathbb R$ is a Borel function, $f_{\mathbf X}$ is the joint density function of $\mathbf X$, and $f_{X_n}$ is the marginal density function of $X_n$, then $$\mathbb E[g(X_n)] = \int_{\mathbb R} g(x) f_{X_n}(x) \, dx = \int_{\mathbb R^N} g(x_n) f_{\mathbf X}(\mathbf x) \, dx_1 \cdots dx_N.$$ If $h: \mathbb R^{n} \rightarrow \mathbb R$ is Borel measurable and $\mathbf Y = (X_{i_1}, \ldots, X_{i_n})$ is a subvector of $\mathbf X$, then $$\mathbb E[h(\mathbf Y)] = \int_{\mathbb R^N} h(x_{i_1}, \ldots, x_{i_n}) f_{\mathbf X}(\mathbf x) \, dx_1 \cdots dx_N = \int_{\mathbb R^n} h(y_1, \ldots, y_n) f_{\mathbf Y}(y_1, \ldots, y_n) \, dy_1 \cdots dy_n.$$
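The theorem can be illustrated numerically: computing $\mathbb E[g(X_1)]$ through the joint density and through the marginal density should give the same number. The sketch below assumes (for illustration) a bivariate normal joint with correlation $0.5$, whose marginal in $x_1$ is $N(0,1)$, and takes $g(x) = x^2$, so both integrals should return $\operatorname{Var}(X_1) = 1$.

```python
import numpy as np
from scipy import integrate

# Sketch: E[g(X1)] via the joint density equals E[g(X1)] via the
# marginal density.  Joint: bivariate normal with correlation 0.5;
# marginal of X1: N(0,1); g(x) = x^2, so both integrals give 1.
rho = 0.5

def f_joint(x1, x2):
    c = 1 / (2 * np.pi * np.sqrt(1 - rho**2))
    return c * np.exp(-(x1**2 - 2*rho*x1*x2 + x2**2) / (2 * (1 - rho**2)))

def f_marginal(x1):
    # Exact N(0,1) marginal density of X1.
    return np.exp(-x1**2 / 2) / np.sqrt(2 * np.pi)

via_joint, _ = integrate.dblquad(
    lambda x2, x1: x1**2 * f_joint(x1, x2), -8, 8, -8, 8)
via_marginal, _ = integrate.quad(lambda x: x**2 * f_marginal(x), -8, 8)
print(via_joint, via_marginal)  # both close to 1.0
```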