
Introduction to Random Vectors


When analyzing real-world data, whether financial time series, engineering measurements, public health metrics, or e-commerce behaviors, we rarely focus on a single variable. More often, we observe multiple attributes or features at once: daily returns of several stocks, temperature and humidity at a weather station, or user behaviors across different pages on a website. Random vectors are the probability-theory construct that lets us handle these variables jointly, ensuring we capture both individual behaviors and the collective interactions among them.



1. Introduction: The Need for Random Vectors

1.1 Beyond One-Dimensional Analysis

Traditionally, introductory probability courses start with a single random variable, describing one measurement at a time (e.g., an individual's height, or the outcome of a single die roll). However, real-world problems typically involve multiple measurements at once: the daily returns of several stocks in a portfolio, temperature and humidity readings from the same weather station, or a user's actions across different pages of a website.

In all these scenarios, we gather data across multiple variables, creating an inherently multidimensional space of possible outcomes. Random vectors formalize this concept by treating each "dimension" as a component of a single, larger probabilistic entity.

1.2 Random Vectors in Formal Terms

A random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)$ is a function from a sample space $\Omega$ into $\mathbb{R}^n$. Each outcome $\omega \in \Omega$ maps to a point $\mathbf{x} \in \mathbb{R}^n$. If we focus solely on $X_1$, that's just one random variable, but the vector $\mathbf{X}$ lets us track all $n$ variables simultaneously and preserve their interdependencies.
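
To make this concrete, here is a minimal NumPy sketch (the bivariate normal and its mean and covariance are illustrative choices, not something specified in this article): each draw is one outcome $\omega$ mapped to a point in $\mathbb{R}^2$, i.e., a realization of the two-component random vector $\mathbf{X} = (X_1, X_2)$.

```python
import numpy as np

# One realization of a two-component random vector X = (X_1, X_2).
# The bivariate normal with this mean/covariance is purely illustrative.
rng = np.random.default_rng(0)
mean = [0.0, 0.0]
cov = [[1.0, 0.6],
       [0.6, 1.0]]                        # X_1 and X_2 positively correlated

x = rng.multivariate_normal(mean, cov)    # a single point in R^2, shape (2,)
samples = rng.multivariate_normal(mean, cov, size=10_000)

print(x)                                  # one outcome mapped to R^2
print(samples.shape)                      # (10000, 2): rows = outcomes, columns = components
print(np.corrcoef(samples.T))             # empirical correlation close to 0.6
```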


2. Joint Distributions: The Full Multidimensional Story

2.1 Defining the Joint Distribution

To describe how a random vector behaves, we use its joint distribution. In one dimension, we have a single PMF (probability mass function) or PDF (probability density function). In multiple dimensions, we have a joint PMF $P_{X_1,\dots,X_n}(x_1,\dots,x_n)$ in the discrete case, or a joint PDF $f_{X_1,\dots,X_n}(x_1,\dots,x_n)$ in the continuous case, assigning probability (or density) to every combination of component values at once.
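
As a small, concrete sketch (the numbers below are made up purely for illustration), a discrete joint PMF over two variables can be stored as a 2-D array whose non-negative entries sum to 1:

```python
import numpy as np

# Hypothetical joint PMF of X1 in {0, 1} and X2 in {0, 1, 2}:
# joint[i, j] = P(X1 = i, X2 = j).  Any non-negative array that sums
# to 1 defines a valid joint PMF over this grid of outcomes.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])

assert np.all(joint >= 0)
assert np.isclose(joint.sum(), 1.0)       # total probability is 1
print(joint[1, 2])                        # P(X1 = 1, X2 = 2) = 0.30
```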

2.2 Marginal Distributions

From the joint distribution, we can derive marginal distributions by summing (in the discrete case) or integrating (in the continuous case) out the other variables. For example:

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{X_1, X_2}(x_1, x_2)\, dx_2$$

Marginals describe each variable's individual behavior, but they say nothing about how the variables move together.
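
In the discrete case, "integrating out" becomes summing over an axis of the joint table. A minimal sketch, reusing the same illustrative joint PMF as above:

```python
import numpy as np

# Marginalize by summing out the other variable -- the discrete analogue
# of integrating f_{X1,X2}(x1, x2) over x2.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])    # joint[i, j] = P(X1 = i, X2 = j)

p_x1 = joint.sum(axis=1)                  # marginal of X1: sum out X2 -> [0.40, 0.60]
p_x2 = joint.sum(axis=0)                  # marginal of X2: sum out X1 -> [0.35, 0.25, 0.40]

print(p_x1, p_x1.sum())                   # each marginal still sums to 1
print(p_x2, p_x2.sum())
```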

2.3 Why Joint Distributions Matter

  1. They capture correlations and dependencies (e.g., among stock returns or health indicators).
  2. They underlie multivariate methods such as multiple regression, PCA, and Bayesian networks.
  3. They are essential for simulation and generative modeling, which require sampling from a joint distribution.

3. Conditional Distributions

3.1 Formula

Discrete:

$$P_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{P_{X_1, X_2}(x_1, x_2)}{P_{X_2}(x_2)}$$

Continuous:

$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)}$$
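
With the same illustrative discrete joint PMF used earlier, conditioning on $X_2 = x_2$ amounts to taking the corresponding column of the joint table and renormalizing it by the marginal $P_{X_2}(x_2)$:

```python
import numpy as np

# Conditional PMF of X1 given X2 = x2, computed from the joint table.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])    # joint[i, j] = P(X1 = i, X2 = j)

x2 = 1                                    # condition on X2 = 1 (second column)
p_x2 = joint[:, x2].sum()                 # P(X2 = 1) = 0.25
cond = joint[:, x2] / p_x2                # P(X1 = i | X2 = 1)

print(cond)                               # [0.8, 0.2] -- renormalized column
print(cond.sum())                         # 1.0, as every conditional PMF must
```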

3.2 Applications

Conditional distributions are how we update beliefs once part of the data is observed, for example, predicting one component of a random vector after seeing the others.

3.3 Chain Rule

$$P(x_1,\dots,x_n) = P(x_1)\, P(x_2 \mid x_1) \cdots P(x_n \mid x_1,\dots,x_{n-1})$$
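
One practical payoff of the chain rule is sequential (ancestral) sampling: draw $x_1$ from its marginal, then $x_2$ from $P(x_2 \mid x_1)$, and so on. A sketch with the same illustrative two-variable PMF:

```python
import numpy as np

# Sequential sampling via the chain rule: P(x1, x2) = P(x1) * P(x2 | x1).
rng = np.random.default_rng(1)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])        # joint[i, j] = P(X1 = i, X2 = j)

p_x1 = joint.sum(axis=1)                      # P(X1 = i) = [0.40, 0.60]
cond_x2 = joint / p_x1[:, None]               # row i holds P(X2 = j | X1 = i)

def sample_pair():
    x1 = rng.choice(2, p=p_x1)                # step 1: draw x1 ~ P(x1)
    x2 = rng.choice(3, p=cond_x2[x1])         # step 2: draw x2 ~ P(x2 | x1)
    return x1, x2

draws = [sample_pair() for _ in range(100_000)]
emp = np.zeros_like(joint)
for i, j in draws:
    emp[i, j] += 1
emp /= len(draws)
print(np.round(emp, 3))                       # empirical frequencies approximate the joint table
```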

4. Independence

4.1 Definition

Two random variables $X_1$ and $X_2$ are independent if their joint distribution factors into the product of the marginals:

$$f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2)$$

4.2 Pairwise vs Mutual Independence

Mutual independence is stronger than pairwise independence: it requires every subset of the variables to factorize jointly, not just every pair. A classic counterexample is two fair coin flips together with their exclusive-or (XOR), which are pairwise independent but not mutually independent, as the sketch below illustrates.
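
A quick numerical illustration of the gap (a standard textbook construction, not specific to this article): two fair coin flips and their exclusive-or are pairwise independent, yet the three variables are not mutually independent.

```python
import numpy as np

# X1, X2: fair coin flips; X3 = X1 XOR X2 is also a fair coin flip.
rng = np.random.default_rng(2)
n = 200_000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
x3 = x1 ^ x2

# Pairwise independence holds: P(X1=1, X3=1) ~ P(X1=1) * P(X3=1) ~ 0.25
print(np.mean((x1 == 1) & (x3 == 1)), np.mean(x1 == 1) * np.mean(x3 == 1))

# Mutual independence fails: (X1, X2, X3) = (1, 1, 1) is impossible,
# so its probability is 0 rather than 0.5**3 = 0.125.
print(np.mean((x1 == 1) & (x2 == 1) & (x3 == 1)))
```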

4.3 Independence vs Uncorrelatedness

Zero correlation does not imply independence: correlation only measures linear association, while independence demands that the full joint distribution factorize. A standard counterexample is $X \sim \mathcal{N}(0,1)$ and $Y = X^2$, which are uncorrelated yet clearly dependent.
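
A short numerical sketch of the counterexample just mentioned: the sample correlation between $X$ and $Y = X^2$ is near zero, yet $Y$ is a deterministic function of $X$.

```python
import numpy as np

# X ~ N(0, 1) and Y = X^2: Cov(X, Y) = E[X^3] = 0, so they are uncorrelated,
# but Y is completely determined by X -- the opposite of independence.
rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])            # sample correlation ~ 0
# Dependence is visible through conditioning: Y is much larger on average
# when |X| is large than it is overall.
print(y.mean(), y[np.abs(x) > 2].mean())  # ~1.0 vs. well above 4
```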

4.4 Applications


5. Extended Topics


6. Case Studies


7. Pitfalls


8. Conclusion

Random vectors and their distributions are foundational for multivariate analysis:

  1. Random vectors handle multiple variables jointly in $\mathbb{R}^n$.
  2. Joint distributions capture the relationships among components.
  3. Conditional distributions update beliefs as new information arrives.
  4. Independence simplifies analysis but rarely holds perfectly in practice.

By mastering these, you gain a robust toolkit for analyzing real-world multivariate data.

Check out my full article on Medium


