
Central Limit Theorem: Definition, Formula & Examples

Written by Paul Boyce. Posted in Education.
Last Updated May 23, 2023

What is the Central Limit Theorem?

Statistics play a pivotal role in our lives, often in ways we might not consciously recognize. From predicting weather patterns to understanding consumer behavior, the principles of statistics offer invaluable insights. Among these principles, the Central Limit Theorem (CLT) holds a special place due to its fundamental implications in the realm of probability and statistics.

The Central Limit Theorem, a cornerstone of probability theory, provides a bridge between sample statistics and population parameters, greatly simplifying many statistical problems. In essence, the CLT states that the distribution of the sum (or average) of a large number of independent, identically distributed variables with finite variance approaches a normal distribution, regardless of the shape of the original distribution.

This seemingly simple concept has profound implications, underpinning many statistical methods, including hypothesis testing, confidence intervals, and regression analysis. The CLT allows us to make meaningful inferences about unknown population parameters based on sample data, a process ubiquitous in fields ranging from social sciences to finance to physics.

Key Points

The central limit theorem is a fundamental concept in statistics that applies to the distribution of sample means or sums.
It states that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution, provided the population has a finite variance.
The central limit theorem relies on certain assumptions, including random sampling, independence between observations, a finite population variance, and a sample size large enough for the normal approximation to be accurate.

Understanding the Central Limit Theorem

The Central Limit Theorem (CLT) is a fundamental concept in statistics that establishes the foundation for inferential statistics. Its primary assertion is that the sum or the average of a large number of independent and identically distributed random variables with finite variance will be approximately normally distributed, regardless of the shape of their own distribution. Here is a more detailed look at the CLT.

1. The Key Components

The Central Limit Theorem involves several components:

Independent and Identically Distributed (i.i.d.) Random Variables: A collection of random variables that all have the same probability distribution and are mutually independent, so the value of one carries no information about the values of the others.
Sum or Average: The theorem can be applied to both the sum and the average of the variables, as the average is simply the sum of the variables divided by the number of variables.
Normal Distribution: Also known as the Gaussian distribution, the normal distribution is a type of continuous probability distribution for a real-valued random variable. It is symmetric and is described by its mean and standard deviation.

2. The Statement of the Theorem

More formally, suppose a population has mean μ and a finite standard deviation σ, and we draw random samples of size n from it with replacement (so the observations are independent). The sample mean then has mean (expected value) μ and standard deviation σ/√n for every n; the CLT adds that, as n grows large, the distribution of the sample mean approaches a normal distribution with exactly these parameters.
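In symbols, one standard way to write this (the classical Lindeberg-Lévy form) is sketched below, assuming i.i.d. observations X_1, …, X_n with mean μ and finite standard deviation σ > 0. The notation X̄_n for the sample mean, S_n for the sum, and Z_n for the standardized mean is introduced here for illustration.

```latex
\begin{aligned}
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{S_n}{n},
  \qquad \mathbb{E}[\bar{X}_n] = \mu,
  \qquad \operatorname{SD}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}, \\
Z_n &= \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
     = \frac{S_n - n\mu}{\sigma\sqrt{n}}
     \xrightarrow{\ d\ } \mathcal{N}(0, 1) \quad \text{as } n \to \infty .
\end{aligned}
```

The first line holds exactly for every sample size; the theorem itself is the second line. In practice it is read as "the sample mean is approximately N(μ, σ²/n) for large n", and the same standardized form shows that the sum is approximately N(nμ, nσ²).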

3. The Implications of the Theorem

The primary implication of the Central Limit Theorem is that it allows us to make inferences about the population from our sample. This is particularly useful in real-world scenarios where it is impractical or impossible to collect data from the entire population.

It also justifies the widespread use of the normal distribution in a wide range of statistical techniques including confidence intervals, hypothesis tests, and linear regression.
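As a hedged illustration of how the CLT feeds into a confidence interval, the sketch below builds an approximate 95% interval for a population mean as x̄ ± z·s/√n. The function name `clt_confidence_interval` and the simulated die-roll data are hypothetical, and the population standard deviation is estimated by the sample standard deviation, which is reasonable only for fairly large samples.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def clt_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Relies on the CLT: for a large i.i.d. sample, the sample mean is roughly
    normal with standard error sigma / sqrt(n). Sigma is estimated here by
    the sample standard deviation.
    """
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)                # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se


# Hypothetical data: 40 simulated rolls of a fair die.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(40)]
low, high = clt_confidence_interval(rolls)
print(f"Approximate 95% CI for the mean roll: ({low:.2f}, {high:.2f})")
```

The interval is only as good as the normal approximation behind it; with small or heavily skewed samples, a t-based or bootstrap interval is usually safer.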

In the following sections, we explain why the theorem matters, express it as a formula, and demonstrate it with an example.

Importance of the Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most fundamental and powerful concepts in statistics and probability theory, playing a crucial role in many aspects of economic analysis and data science. Its importance is derived from several factors:

Simplification of Analysis: The CLT provides the foundation for many statistical methods, including confidence intervals and hypothesis testing. Even if a population doesn’t follow a normal distribution, the CLT allows us to treat sample means as approximately normally distributed when the sample size is sufficiently large, simplifying data analysis.
Broad Applicability: It applies to any random variable with finite variance, regardless of the shape of its probability distribution. This universality allows researchers across various fields, including economics, to use techniques that rely on normal distributions.
Basis for Inferential Statistics: The CLT allows us to make inferences about population parameters based on sample data. This is particularly valuable in economics, where obtaining data for an entire population is often impractical or impossible.
Supports Large-Scale Analysis: In the era of big data, where datasets can be enormous, the normal approximation the CLT provides for sample means and sums is typically very accurate, which is why standard errors and confidence intervals are routinely computed on large datasets. The CLT does not, however, correct for biased or non-random samples, so a large dataset is only as reliable as the way it was collected.
Facilitates Predictive Modeling: The CLT is an integral component of statistical modeling in econometrics and machine learning. It implies that estimators such as ordinary least squares regression coefficients are approximately normally distributed in large samples, even when the error terms themselves are not normal, which justifies the standard errors, t-tests, and confidence intervals reported for these models.

In conclusion, the Central Limit Theorem is a cornerstone of statistical theory, playing a critical role in enabling rigorous, quantitative research in economics and many other fields. Its greatest strength is that it lets analysts use the well-understood normal distribution to reason about sample means and sums without knowing the shape of the population’s distribution, making it an essential tool in any data-driven field.

The Formula of the Central Limit Theorem

The Central Limit Theorem (CLT) is a limit theorem rather than a single formula to plug numbers into: it describes how the shape of a distribution behaves as the sample size grows. However, the theorem does provide a precise characterization of the distribution of the sum (or average) of independent and identically distributed random variables.

Here’s how we usually express the concept in terms of statistics:

If we have a population with a mean (μ) and standard deviation (σ), and we randomly sample n observations from this population, then when n is large enough, the distribution of the sample means will approximate a normal distribution. The mean (expected value) of this distribution will be equal to the population mean, and the standard deviation (often referred to as the standard error when dealing with sample means) will be equal to the population standard deviation divided by the square root of the sample size.

Expressed in a formulaic manner, it looks like this:

Mean of sample means (μx̄) = μ (population mean)

Standard deviation of sample means (σx̄ or the standard error, SE) = σ/√n

Where:

μx̄ is the mean of the sample means
μ is the population mean
σx̄ (or SE) is the standard deviation (or standard error) of the sample means
σ is the population standard deviation
n is the sample size
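The standard-error formula can be checked with a few lines of Python. This is a minimal sketch; the helper name `standard_error` and the value σ = 12 are hypothetical, chosen only so the square roots come out evenly.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean (the standard error): sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of 12.
for n in (9, 36, 144):
    print(n, standard_error(12, n))
# 9 4.0
# 36 2.0
# 144 1.0
```

Because n sits under a square root, each fourfold increase in the sample size only halves the standard error.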

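It’s important to note that the normal approximation becomes more accurate as n grows larger. While there’s no definitive rule for what constitutes a “large enough” sample, n > 30 is a commonly used rule of thumb in many applications of the theorem. Roughly symmetric populations may need fewer observations, while heavily skewed populations can need considerably more.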
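One way to see why the rule of thumb is only a rough guide is to measure how quickly the skewness of the sample mean fades for a strongly skewed population. The sketch below assumes an exponential population with rate 1 (an illustrative choice, not from the article), whose skewness is 2; for the mean of n independent draws the skewness is 2/√n, so some asymmetry remains even at n = 30. The helper names `skewness` and `skew_of_sample_means` are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed z-score of the values."""
    m = fmean(values)
    s = pstdev(values, m)
    return fmean(((v - m) / s) ** 3 for v in values)


def skew_of_sample_means(n, reps=10_000, seed=1):
    """Skewness of `reps` sample means, each the average of n Exponential(1) draws."""
    rng = random.Random(seed)
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)


for n in (5, 30, 100):
    print(f"n={n:>3}: simulated skewness {skew_of_sample_means(n):.2f}, "
          f"theory {2 / math.sqrt(n):.2f}")
```

A normal distribution has skewness 0, so the simulated values show the sampling distribution becoming more symmetric as n grows, but more slowly than a fixed cutoff like 30 suggests.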

Examples of the Central Limit Theorem

To best understand the Central Limit Theorem (CLT), let’s illustrate it with an example.

Example: Dice Rolls

Suppose we have a fair six-sided die, which we roll once. The possible outcomes are the numbers 1 through 6, each with an equal probability of 1/6. The population mean (μ) is 3.5, and the population standard deviation (σ) is approximately 1.71.
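The population figures quoted above can be reproduced directly. Here is a minimal sketch using Python’s standard `statistics` module; note that `pstdev` divides by the number of outcomes (a population standard deviation), which is what σ means here.

```python
from statistics import fmean, pstdev

faces = [1, 2, 3, 4, 5, 6]   # equally likely outcomes of a fair die
mu = fmean(faces)            # population mean: 21 / 6
sigma = pstdev(faces)        # population SD: sqrt(35 / 12)
print(mu, round(sigma, 2))   # 3.5 1.71
```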

If we roll the die n times, take the average (or sum) of the rolls, and repeat this many times, the CLT tells us that the distribution of these averages will tend towards a normal distribution as n grows. Let’s see this with simulations for different sample sizes:

If we roll the die 5 times (n=5), calculate the average result, and repeat this process a large number of times, we will find that the distribution of the averages is already roughly bell-shaped but not perfectly normal: it is discrete, bounded between 1 and 6, and has somewhat lighter tails than a normal distribution. Because the die is symmetric, the averages are not skewed.
If we increase the sample size to 30 (n=30) and repeat the process, the distribution of averages becomes very close to a normal distribution with a mean of 3.5 and a standard deviation of σ/√n = 1.71/√30 ≈ 0.31.
As we continue to increase the sample size (n=100, n=1000, and so on), the distribution of the averages becomes closer and closer to a normal distribution with mean μ and standard deviation σ/√n, which itself shrinks as n grows.
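The simulation described above can be sketched in a few lines. This is an illustrative version, not the article’s own code: the function name `simulate_means`, the 10,000 repetitions, and the random seed are arbitrary choices. It compares the simulated mean and standard deviation of the averages with μ = 3.5 and σ/√n.

```python
import math
import random
from statistics import fmean, pstdev

SIGMA = math.sqrt(35 / 12)  # population SD of one fair-die roll, about 1.71


def simulate_means(n, reps=10_000, seed=42):
    """Roll a fair die n times, average the rolls, and repeat `reps` times."""
    rng = random.Random(seed)
    return [fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]


for n in (5, 30, 100):
    means = simulate_means(n)
    print(f"n={n:>3}: mean of averages {fmean(means):.3f}, "
          f"SD {pstdev(means):.3f} vs sigma/sqrt(n) {SIGMA / math.sqrt(n):.3f}")
```

Plotting a histogram of `means` for each n (for example with matplotlib) shows the bell shape tightening around 3.5 as n grows.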

This example illustrates the essence of the Central Limit Theorem: even though the original distribution was uniform (all outcomes equally likely), the distribution of averages of a large number of independent and identically distributed variables (the dice rolls) tends towards a normal distribution.

This fundamental principle is what allows us to apply statistical techniques to real-world data, even when we don’t know the exact distribution of the underlying population. Provided the observations are independent with finite variance and the sample is large enough, the Central Limit Theorem lets us treat the distribution of sample means as approximately normal, allowing us to make inferences about the population.

FAQs

What is the central limit theorem?

The central limit theorem is a statistical concept that describes the behavior of the sampling distribution of the sample mean as the sample size increases.

What does the central limit theorem state?

The central limit theorem states that when many independent, identically distributed random variables with finite variance are summed or averaged, the distribution of the sum or average tends to become approximately normal, regardless of the shape of the original distribution.

Why is the central limit theorem important?

The central limit theorem is important because it allows us to make statistical inferences about a population based on sample data, even if the population distribution is not known.

What are the assumptions of the central limit theorem?

The central limit theorem assumes that the random variables are independent, have finite variances, and that the sample size is sufficiently large.

About Paul

Paul Boyce is an economics editor with over 10 years of experience in the industry. Currently working as a consultant within the financial services sector, Paul is the CEO and chief editor of BoyceWire. He has written for FEE, the Mises Institute, and many others.