Core idea: The world is uncertain.
The world is full of variation. Even when nothing “mystical” is happening, numbers don’t line up perfectly.
So how do we ever decide what is true?
Statistics is the tool we use to reason about that variation and draw conclusions from imperfect data.
It is not about proving things absolutely true. It is about saying:
“Given what I saw, how confident am I about what is happening?”
Example
Imagine you want to know if a coin is fair. You toss it 10 times, get 7 heads.
Heads = 1, Tails = 0
set.seed(2025)
# Toss a fair coin 10 times
sample1 <- rbinom(10, 1, 0.5)
mean(sample1) # proportion of heads
[1] 0.7
# Toss a fair coin 1000 times
sample2 <- rbinom(1000, 1, 0.5)
mean(sample2)
[1] 0.504
Does that mean the coin is unfair?
Not necessarily. Chance alone could easily produce 7 heads in 10 tosses of a fair coin.
But if you tossed it 1,000 times and got 700 heads, now the story is different.
Statistics gives us a way to make that judgment systematically.
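One way to make that judgment concrete (a sketch, not part of the original example) is to ask how often a truly fair coin would give a result at least this extreme. The binomial tail probability, via R's `pbinom`, answers exactly that:

```r
# If the coin is fair, how likely is at least 7 heads in 10 tosses?
pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)
# [1] 0.171875  -> about 17%: easily explained by chance

# And at least 700 heads in 1000 tosses?
pbinom(699, size = 1000, prob = 0.5, lower.tail = FALSE)
# vanishingly small: chance alone is no longer a plausible explanation
```

With 10 tosses, 7 heads happens about one time in six for a fair coin; with 1,000 tosses, 700 heads essentially never does. That contrast is the judgment statistics formalizes.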
Every experiment reflects both the quantity we care about and random variation around it. Mathematics solves for certainty. Statistics quantifies uncertainty so we can make decisions despite it.
Variation is everywhere. If we don’t use statistics, we risk mistaking random noise for a real effect.
That’s why statistics exists: not to give “exact truth,” but to help us make good decisions under uncertainty.
There is a bigger problem. We almost never have access to everything we care about. Instead, we only see a slice of the bigger world.
This brings us to the heart of statistical thinking: We want to learn about a population (the big world) using just a sample (the slice we observed).
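As a small sketch of that idea (the “population” here is simulated purely for illustration), we can compute the truth we would like to know from a whole population, then look at the small slice we would actually observe:

```r
set.seed(2025)
# A simulated population: one million coin flips (the "big world")
population <- rbinom(1e6, 1, 0.5)
mean(population)   # the population proportion: essentially 0.5

# The slice we actually observe: a random sample of 10 flips
observed <- sample(population, 10)
mean(observed)     # the sample proportion: free to stray well away from 0.5
```

In real problems we never get to compute `mean(population)`; statistical inference is about learning it from `mean(observed)` alone.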