Probability theory is a foundational element of statistics, data science, and numerous applied mathematics fields. At its core are mass functions, distribution functions, and density functions, which collectively describe the behavior of random variables. This article provides a comprehensive exploration of these functions, their properties, and their real-world applications.
1. Random Variables: An Introduction
A random variable is a numerical outcome of a random process or experiment, serving as a link between probabilistic concepts and real-world observations. Random variables are broadly classified into:
1.1 Discrete Random Variables
Discrete random variables take on a countable set of distinct values. For example, the number of heads obtained in a series of coin flips is a discrete random variable.
1.2 Continuous Random Variables
Continuous random variables can assume an infinite number of values within a given range. For instance, the height of individuals in a population is modeled as a continuous random variable.
To characterize random variables, three essential functions are used: the probability mass function (PMF), the cumulative distribution function (CDF), and the probability density function (PDF). These functions will be discussed in detail below.
2. Probability Mass Function (PMF)
The probability mass function (PMF) quantifies the probability of each possible value of a discrete random variable. It is defined as:
\[ P(X = x) = f_X(x),\quad \text{for } x \in \text{range}(X), \]
where:
\( X \): The discrete random variable.
\( f_X(x) \): The PMF of \( X \).
\( P(X = x) \): The probability that \( X \) equals \( x \).
2.1 Properties of the PMF
Non-Negativity: \( f_X(x) \geq 0 \) for all \( x \).
Normalization: \( \sum_{x \in \text{Range}(X)} f_X(x) = 1 \).
Support: The PMF is defined only for values within the range of \( X\).
2.2 Example: Rolling a Die
Consider a fair six-sided die. The random variable \( X \) represents the outcome of the roll. The PMF is:
\[ f_X(x) = \begin{cases} \frac{1}{6}, & x \in \{1, 2, 3, 4, 5, 6\}, \\0, & \text{otherwise}. \end{cases} \]
3. Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF) describes the probability that a random variable \( X \) is less than or equal to a certain value \( x \). It is defined as:
\[ F_X(x) = P(X \leq x). \]
3.1 CDF for Discrete Random Variables
For discrete random variables:
\[ F_X(x) = \sum_{t \leq x} f_X(t). \]
3.2 CDF for Continuous Random Variables
For continuous random variables:
\[ F_X(x) = \int_{-\infty}^x f_X(t)\, dt. \]
3.3 Properties of the CDF
Monotonicity: \( F_X(x) \) is a non-decreasing function.
Limits: \( \lim_{x \to -\infty} F_X(x) = 0 \) and \( \lim_{x \to \infty} F_X(x) = 1 \).
Right-Continuity: The CDF is continuous from the right.
3.4 Example: Rolling a Die
For the six-sided die example, the CDF is:
\[ F_X(x) = \begin{cases} 0, & x < 1, \\\frac{1}{6}, & 1 \leq x < 2, \\\frac{2}{6}, & 2 \leq x < 3, \\\frac{3}{6}, & 3 \leq x < 4, \\\frac{4}{6}, & 4 \leq x < 5, \\\frac{5}{6}, & 5 \leq x < 6, \\1, & x \geq 6. \end{cases} \]
4. Probability Density Function (PDF)
The probability density function (PDF) describes the likelihood of a continuous random variable taking on a particular value. Unlike the PMF, the PDF does not give probabilities directly but instead indicates the density of probabilities across a range of values.
The relationship between the PDF \( f_X(x) \) and the CDF \( F_X(x) \) is:
\[ F_X(x) = \int_{-\infty}^x f_X(t)\, dt,\quad f_X(x) = \frac{d}{dx} F_X(x). \]
4.1 Properties of the PDF
Non-Negativity: \( f_X(x) \geq 0 \) for all \( x \).
Normalization: \( \int_{-\infty}^\infty f_X(x)\, dx = 1 \).
Support: The PDF is defined over the entire range of \( X \).
4.2 Example: Uniform Distribution
Consider a random variable \( X \) uniformly distributed over \([a, b]\). The PDF is:
\[ f_X(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b, \\0, & \text{otherwise}. \end{cases} \]
5. Relationships Between PMF, CDF, and PDF
The PMF, CDF, and PDF provide complementary perspectives on random variables:
5.1 Discrete Case
The PMF \( f_X(x) \) gives probabilities for specific values.
The CDF \( F_X(x) \) is the cumulative sum of the PMF up to \( x \).
5.2 Continuous Case
The PDF \( f_X(x) \) describes the density of probabilities.
The CDF \( F_X(x) \) is the integral of the PDF.
5.3 Key Transitions
Discrete: \( F_X(x) = \sum_{t \leq x} f_X(t) \).
Continuous: \( F_X(x) = \int_{-\infty}^x f_X(t)\, dt \).
6. Moments and Expectation
Moments provide insights into the distribution of a random variable. Common moments include:
6.1 Expected Value (Mean)
Discrete: \( E[X] = \sum_{x \in \text{Range}(X)} x f_X(x) \).
Continuous: \( E[X] = \int_{-\infty}^\infty x f_X(x)\, dx \).
6.2 Variance
\[ \text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2. \]
6.3 Higher-Order Moments
Skewness: Measures the asymmetry of the distribution.
Kurtosis: Measures the “tailedness” of the distribution.
7. Real-World Applications
7.1 Discrete Random Variables
Modeling the number of customer arrivals at a store.
Describing outcomes in games of chance (e.g., dice rolls).
7.2 Continuous Random Variables
Analyzing sensor data in engineering.
Modeling time-to-failure in reliability studies.
7.3 Combined Applications
- Mixed models for scenarios involving both discrete and continuous elements.
Understanding mass, distribution, and density functions is essential for analyzing random variables. These functions form a framework for describing and interpreting random processes in both theoretical and practical contexts. By mastering these concepts, one gains a powerful toolset for exploring the stochastic nature of the world and applying probabilistic models across diverse fields.
Check out my full article on Medium