Modern probability and statistics often focus on scenarios where we aggregate multiple random variables, measure how these variables co-vary, and then draw conclusions about their long-run behavior. In this short post, we’ll introduce three key ideas:
- Summation of Random Variables
- Covariance and Correlation
- Limit Theorems
These concepts lie at the heart of statistical modeling and help explain why averages and aggregated statistics become so powerful in real-world applications.
1. Sums of Random Variables
1.1 Motivation
We frequently encounter sums of random variables in practical contexts—like the total number of defective parts in a factory batch (summing individual defect indicators), or the portfolio return from several stocks (summing daily returns). Understanding the distribution of these sums is crucial for risk analysis, planning, and decision-making.
1.2 Basic Ideas
- Discrete Convolution: If X and Y are discrete random variables (and, in particular, if they're independent), the distribution of X + Y is given by convolving their respective probability mass functions (PMFs): P(X + Y = z) = Σ_x P(X = x) · P(Y = z − x).
- Continuous Convolution: In the continuous case (with independent X and Y), we integrate their probability density functions (PDFs): f_{X+Y}(z) = ∫ f_X(x) · f_Y(z − x) dx.
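As a quick sketch of the discrete case, here is the classic example of summing two fair dice — the PMF of the total is just the convolution of the two individual PMFs (the die PMF and use of `np.convolve` are illustrative choices, not from the original post):

```python
import numpy as np

# PMF of a fair six-sided die: P(X = k) = 1/6 for k = 1..6.
# (Index 0 of the array corresponds to outcome 1.)
die = np.full(6, 1 / 6)

# The PMF of the sum of two independent dice is the convolution
# of their individual PMFs.
pmf_sum = np.convolve(die, die)

# pmf_sum[i] is P(X + Y = i + 2), so totals run from 2 to 12.
print(pmf_sum.sum())          # probabilities still sum to 1
print(pmf_sum.argmax() + 2)   # most likely total: 7
```

The triangular shape of `pmf_sum` (peaking at 7) is a first hint of the Central Limit Theorem discussed below.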
1.3 Illustrative Examples
- Binomial + Binomial: If you sum two independent binomially distributed random variables (same success probability p, different numbers of trials n₁ and n₂), you get another binomial with combined n = n₁ + n₂ trials.
- Normal + Normal: If you sum two independent normal variables, the result is also normal, with its mean and variance being the sum of the individual means and variances.
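The normal-plus-normal fact is easy to check by simulation. Below is a minimal sketch assuming two hypothetical distributions, X ~ N(2, 3²) and Y ~ N(−1, 4²), chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent normals: X ~ N(2, 3^2), Y ~ N(-1, 4^2).
x = rng.normal(2, 3, size=1_000_000)
y = rng.normal(-1, 4, size=1_000_000)
z = x + y

# Theory: Z ~ N(2 + (-1), 3^2 + 4^2) = N(1, 25), so std(Z) = 5.
print(z.mean())  # close to 1
print(z.std())   # close to 5
```

Note that the variances add (9 + 16 = 25), not the standard deviations — a common pitfall.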
2. Covariance and Correlation
2.1 Covariance
When working with multiple variables, understanding their co-movement is essential. Covariance between X and Y is defined as Cov(X, Y) = E[(X − E[X]) · (Y − E[Y])].
- If Cov(X, Y) > 0, X and Y tend to move together.
- If Cov(X, Y) < 0, they typically move in opposite directions.
- If Cov(X, Y) = 0, there's no linear association (though non-linear dependencies can still exist).
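The definition translates directly into code. Here is a small sketch on a hypothetical paired dataset (the numbers are made up for illustration), comparing the hand-rolled average of deviation products with NumPy's built-in:

```python
import numpy as np

# Hypothetical paired observations where y tends to rise with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Sample covariance: the average product of deviations from the means.
cov_manual = ((x - x.mean()) * (y - y.mean())).mean()

# np.cov uses the unbiased (n - 1) denominator by default;
# bias=True matches the plain average above.
cov_np = np.cov(x, y, bias=True)[0, 1]

print(cov_manual)  # positive: x and y move together
print(cov_np)      # same value
```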
2.2 Correlation
Covariance depends on the scale and units of the variables, which makes raw values hard to compare. Correlation standardizes the measure to the range [−1, 1]: ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y).
- ρ = +1: Perfect positive linear relation.
- ρ = −1: Perfect negative linear relation.
- ρ = 0: No linear relation.
Correlation is widely used in finance (portfolio risk management) and data science (feature analysis).
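A minimal sketch of computing correlation, again on made-up data where y is a noisy increasing function of x:

```python
import numpy as np

# Hypothetical data: y is roughly x plus a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Correlation is covariance rescaled by both standard deviations,
# so the result is unit-free and always lies in [-1, 1].
rho = np.corrcoef(x, y)[0, 1]
print(rho)  # close to +1: strong positive linear relation
```

Because correlation is unit-free, it stays the same if you, say, convert x from meters to feet — covariance would not.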
3. Limit Theorems
3.1 Law of Large Numbers (LLN)
Suppose X₁, X₂, … are i.i.d. with mean μ. The Law of Large Numbers says that as n → ∞, the sample average X̄_n = (1/n) Σᵢ Xᵢ converges to μ,
in probability (Weak LLN) or almost surely (Strong LLN). This underpins why sample averages converge to the true mean.
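The LLN is easy to watch in action. A short sketch using simulated fair-die rolls (true mean μ = 3.5; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Fair die rolls: true mean mu = 3.5.
rolls = rng.integers(1, 7, size=100_000)

# Running average after each roll; by the LLN it settles near mu.
running_mean = rolls.cumsum() / np.arange(1, rolls.size + 1)

print(running_mean[9])    # after 10 rolls: may still be far from 3.5
print(running_mean[-1])   # after 100,000 rolls: very close to 3.5
```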
3.2 Central Limit Theorem (CLT)
The Central Limit Theorem states that if X₁, X₂, …, X_n are i.i.d. with finite mean μ and variance σ², then the standardized sample mean (X̄_n − μ) / (σ/√n) converges in distribution to N(0, 1) as n grows large. This explains why sums or averages of many independent random variables often look normal.
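To see the CLT at work, we can start from a distribution that is clearly not normal — Exponential(1), which has μ = 1 and σ = 1 — and standardize many simulated sample means (the sample size, trial count, and seed below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)

# A decidedly non-normal starting point: Exponential(1),
# with mean mu = 1 and standard deviation sigma = 1.
n = 200            # observations per sample
trials = 50_000    # number of sample means to simulate
samples = rng.exponential(1.0, size=(trials, n))

# Standardize each sample mean: (x_bar - mu) / (sigma / sqrt(n)).
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

# By the CLT, z should look approximately N(0, 1).
print(z.mean())  # close to 0
print(z.std())   # close to 1
```

A histogram of `z` would show the familiar bell shape, despite the skewed exponential we started from.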
3.3 Practical Significance
- Confidence Intervals: CLT justifies many statistical inference techniques.
- Sampling: LLN assures convergence of sample means to population means.
- Robustness: Even with mild assumption violations, large n yields good approximations.
Sums of random variables let us study aggregate quantities, which is vital in risk analysis, manufacturing, and finance. Covariance and correlation quantify how variables move together. Limit theorems assure that sums and averages follow predictable patterns, justifying much of modern statistical inference.
Together, these topics illuminate why and how simple aggregate statistics can capture the essence of complex processes, and how co-variation and long-run behavior form the foundation of scientific, business, and engineering decisions.
Check out my full article on Medium