Best guesses and uncertainty

Lecture 2

Dr Milan Valášek

05 February 2021


Today

Point estimates vs interval estimates
Confidence intervals
t-distribution


What stats is about (yet again)

We want to know about the world (population)
We can only get data from samples
We calculate statistics on samples and use them to estimate the values in the population
Statistics is all about making inferences about populations based on samples
If we could measure the entire population, we wouldn't need stats!


Point estimates

You've heard of the sample mean, median, and mode
These are all point estimates: single numbers that are our best guesses about the corresponding population parameters
Measures of spread (SD, variance $s^{2}$, etc.) are also point estimates
Even relationships between variables can be expressed using point estimates
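A minimal sketch of these point estimates in Python, using the standard-library `statistics` module on a small made-up sample (the data and the names `hours`, `scores` and `pearson_r` are hypothetical). Each result is a single number standing in for the corresponding population parameter; Pearson's r is computed from its definition so the sketch runs on any Python 3.

```python
import statistics

# Hypothetical data: hours studied and test scores for 10 people (made-up numbers)
hours = [1, 3, 2, 1, 2, 4, 3, 3, 1, 2]
scores = [4, 7, 7, 3, 5, 8, 6, 7, 2, 5]

def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Each of these is a single number: a point estimate of a population parameter
mean = statistics.mean(scores)          # 5.4
median = statistics.median(scores)      # 5.5
mode = statistics.mode(scores)          # 7
sd = statistics.stdev(scores)           # sample SD, s (divides by N - 1)
variance = statistics.variance(scores)  # sample variance, s^2
r = pearson_r(hours, scores)            # relationship between two variables

print(mean, median, mode, round(sd, 3), round(variance, 3), round(r, 3))
```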


Point estimates

r = −.07

r = .752


Accuracy and uncertainty

The sample mean $\bar{x}$ is the best estimate $\hat{μ}$ of the population mean, but the means of almost all samples differ from the population mean $μ$
The same is true for any point estimate
The standard error (SE) of the mean expresses the uncertainty about our estimate of the population mean
SEs can be calculated for other point estimates too, not just for the mean
We can quantify uncertainty around point estimates using interval estimates
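To see why almost every sample mean misses μ, and what the SE of the mean measures, here is a small simulation sketch. It assumes a hypothetical normal population with μ = 100 and σ = 15 (made-up values), draws 10,000 samples of N = 25, and compares the spread of the sample means with σ/√N = 3. It then estimates the SE from a single sample as s/√N, which is all we can do in practice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical population (an assumption for illustration): normal, mu = 100, sigma = 15
MU, SIGMA, N = 100, 15, 25

# Draw 10,000 samples of size N and keep each sample's mean
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(10_000)
]

# Hardly any sample mean hits MU exactly; their spread is the SE of the mean
exact_hits = sum(m == MU for m in sample_means)
true_se = SIGMA / N ** 0.5                     # 15 / 5 = 3.0
simulated_se = statistics.stdev(sample_means)  # should be close to 3.0

# In practice we only have one sample, so the SE is estimated as s / sqrt(N)
one_sample = [random.gauss(MU, SIGMA) for _ in range(N)]
estimated_se = statistics.stdev(one_sample) / N ** 0.5

print(exact_hits, true_se, round(simulated_se, 3), round(estimated_se, 3))
```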


Interval estimates

In addition to estimating a single value, we can also estimate an interval around it
e.g., mean = 4.13 with an interval from −0.2 to 8.46
Interval estimates communicate the uncertainty around point estimates
There are different kinds of interval estimates
- Important: confidence intervals


Confidence interval

We can use SE and the sampling distribution to calculate a confidence interval (CI) with a certain coverage, e.g., 90%, 95%, 99%...
For a 95% CI, 95% of the intervals constructed around estimates from repeated samples will contain the value of the population parameter
Let’s see an example


Confidence interval

Population of circles of different sizes


Confidence interval

Sample from population, estimate mean size


Confidence interval

Calculate the 95% CI around the mean


Confidence interval

Lather, rinse, repeat...


Confidence interval

~5% of the CIs don't contain the population mean = 95% coverage
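The circle demonstration can be mimicked with a short simulation. This Python sketch assumes a hypothetical normal population (μ = 50, σ = 10; made-up values, not the circle sizes from the slides), repeatedly draws samples of N = 30, builds a 95% CI around each sample mean as $\bar{x} \pm t_{crit} \times s/\sqrt{N}$ using the df = 29 critical value 2.04523 that the lecture derives later, and counts how often the interval contains μ.

```python
import random
import statistics

random.seed(2021)  # fixed seed so the run is reproducible

# Hypothetical population (an assumption for illustration): normal, mu = 50, sigma = 10
MU, SIGMA, N = 50, 10, 30
T_CRIT = 2.04523  # critical t for df = N - 1 = 29 (value quoted later in the lecture)

def ci_95(sample):
    """95% CI for the mean: x̄ ± t_crit × s / √N."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - T_CRIT * se, m + T_CRIT * se

n_reps = 5_000
hits = 0
for _ in range(n_reps):
    lo, hi = ci_95([random.gauss(MU, SIGMA) for _ in range(N)])
    hits += lo <= MU <= hi  # True (1) if this interval contains the population mean

coverage = hits / n_reps
print(coverage)  # should be close to 0.95
```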


How is it made?

Easy if we know the sampling distribution of the mean
95% of the sampling distribution lies within ±1.96 SE of the population mean
The 95% CI around the estimated population mean is mean ±1.96 SE
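The 1.96 here is simply the point that cuts off 2.5% in each tail of the standard normal. A short sketch using the standard library's `statistics.NormalDist` (Python 3.8+) recovers it and applies mean ±1.96 SE; the helper `ci_if_sigma_known` and the numbers passed to it (mean 100, σ = 15, N = 25) are hypothetical, and it assumes σ is known, which is exactly what we usually don't have.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Cut off 2.5% in each tail so that the middle 95% remains
z_crit = std_normal.inv_cdf(0.975)
print(round(z_crit, 2))  # 1.96

# Check: proportion of the standard normal lying within ±z_crit
print(round(std_normal.cdf(z_crit) - std_normal.cdf(-z_crit), 3))  # 0.95

def ci_if_sigma_known(mean, sigma, n):
    """Hypothetical helper: 95% CI as mean ± 1.96 × SE, assuming sigma (and hence
    the sampling distribution) is known exactly, with SE = sigma / sqrt(n)."""
    se = sigma / n ** 0.5
    return mean - z_crit * se, mean + z_crit * se

print(ci_if_sigma_known(mean=100, sigma=15, n=25))  # roughly (94.12, 105.88)
```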


How is it made?

The sampling distribution of the mean is (approximately) normal, as per the central limit theorem (CLT)
The middle 95% of sample means lie within ±1.96 SE of the population mean
We use the same ±1.96 SE to construct a 95% CI around the sample mean
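The CLT claim can be checked by simulation, even for a population that is far from normal. This sketch assumes a hypothetical exponential population (mean 1, SD 1, strongly right-skewed), draws 10,000 samples of N = 50, and counts how many sample means land within ±1.96 SE of the population mean, using the true SE σ/√N. The proportion should come out close to 95%.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

# Hypothetical skewed population (an assumption): exponential with mean 1 and SD 1
MU, SIGMA, N = 1.0, 1.0, 50
SE = SIGMA / N ** 0.5  # true SE of the mean

means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(10_000)
]

# By the CLT the sample means are approximately normal around MU,
# so roughly 95% of them should fall within ±1.96 SE of MU
within = sum(abs(m - MU) <= 1.96 * SE for m in means) / len(means)
print(within)
```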


How is it made?

The sampling distribution is, however, not known!
It can be approximated using the t-distribution, the sample SD (s), and the sample size (N)


t-distribution

Symmetrical, centred around 0
Its shape depends on the degrees of freedom (df)
"Fat-tailed" when df is small (most of all when df = 1); approaches the standard normal as $df \to \infty$


t-distribution

As the shape changes, so do the proportions of values within any given range (unlike with the normal, whose shape is fixed)
In the standard normal, the middle 95% of values lie within ±1.96
In the t-distribution, this critical value depends on df
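Both t-distribution slides can be illustrated with a stdlib-only Python sketch. In R this would be a one-liner such as `qt(0.975, 29)`, presumably where the lecture's 2.04523 comes from, and in Python one would normally reach for SciPy's `scipy.stats.t.ppf`. Here, to stay self-contained, the density is written out from its formula with `math.lgamma`, the CDF is integrated numerically with Simpson's rule, and the critical value is found by bisection. The helper names `t_pdf`, `t_cdf` and `t_crit` are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=1000):
    """P(T <= x): symmetry around 0 plus Simpson's rule on [0, |x|] (n must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / n
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area

def t_crit(coverage, df):
    """Value c such that the middle `coverage` of the t-distribution lies within ±c."""
    p = 0.5 + coverage / 2  # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains c
        lo, hi = hi, 2 * hi
    for _ in range(50):  # bisection: halve the bracket 50 times
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Fat tails: the density at x = 3 shrinks towards the normal value as df grows
for df in (1, 5, 30, 1000):
    print(f"df={df}: density at 3 = {t_pdf(3, df):.4f}")
print(f"normal: density at 3 = {NormalDist().pdf(3):.4f}")

# Critical values for a 95% interval shrink towards 1.96 as df grows
for df in (1, 5, 29, 100):
    print(f"df={df}: t_crit = {t_crit(0.95, df):.3f}")  # 12.706, 2.571, 2.045, 1.984
print(f"normal: z_crit = {NormalDist().inv_cdf(0.975):.3f}")  # 1.960
```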


t-distribution

The t-distribution pops up in many situations
It always has to do with estimating a sampling distribution from a finite sample
How we calculate the number of df depends on the context
- Often it has to do with N, the number of estimated parameters, or both
- In the case of the sampling distribution of the mean, df = N − 1
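One way to see why df = N − 1 for the mean: computing s first requires estimating the mean, and the deviations from that estimated mean always sum to zero, so once N − 1 of them are known the last one is fixed. A tiny Python sketch with a made-up sample of five values (hypothetical data):

```python
import statistics

# Hypothetical sample of N = 5 values
sample = [3.0, 7.0, 4.0, 6.0, 5.0]
N = len(sample)
m = statistics.mean(sample)  # the one estimated parameter

deviations = [x - m for x in sample]
print(sum(deviations))  # 0.0: the last deviation is determined by the other N - 1

# The sample variance divides the sum of squared deviations by N - 1 (the df)
ss = sum(d * d for d in deviations)
by_hand = ss / (N - 1)
print(by_hand, statistics.variance(sample))  # the two values agree
```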


Back to CI

The 95% CI around the estimated population mean is mean ±1.96 SE if we know the exact shape of the sampling distribution
- We don't know the shape, so we approximate it using the t-distribution
We need to replace the 1.96 with the appropriate critical value for a given number of df
For N = 30, $t_{crit}(df = 29) = 2.05$ (to five decimal places, 2.04523)


Back to CI

95% CI around the mean for a sample of 30 is $\bar{x} \pm 2.05 \times SE$
$\widehat{SE} = \frac{s}{\sqrt{N}}$
$95\%\ CI = \bar{x} \pm 2.05 \times \frac{s}{\sqrt{N}}$
To construct a 95% CI around our estimated mean, all we need is
- Estimated mean (i.e., the sample mean, because $\hat{μ} = \bar{x}$)
- Sample SD (s)
- N
- Critical value for a t-distribution with N − 1 df
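The recipe above translates directly into a small function. This is a sketch: `ci_mean` is a hypothetical helper, the 30 scores are made up, and the critical value is passed in rather than computed (2.04523 is the df = 29 value quoted on the previous slide).

```python
import statistics

def ci_mean(sample, t_crit):
    """CI around the sample mean: x̄ ± t_crit × s / √N.

    t_crit must be the critical value of the t-distribution with N - 1 df
    for the desired coverage (e.g. 2.04523 for a 95% CI with N = 30).
    """
    n = len(sample)
    m = statistics.mean(sample)               # estimated mean (μ̂ = x̄)
    se = statistics.stdev(sample) / n ** 0.5  # estimated SE = s / √N
    return m - t_crit * se, m + t_crit * se

# Hypothetical sample of N = 30 made-up scores
sample = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13,
          11, 17, 14, 13, 12, 15, 16, 14, 13, 12,
          14, 15, 13, 16, 12, 14, 13, 15, 11, 14]
low, high = ci_mean(sample, t_crit=2.04523)  # df = 29
print(f"mean = {statistics.mean(sample):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```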


CIs are useful

The width of the interval tells us how much we can expect the mean of a different sample of the same size to vary from the one we got
There's an x% chance that any given x% CI contains the true population mean
CAVEAT: That's not the same as saying that there's an x% chance that the population mean lies within our x% CI!
CIs can be calculated for any point estimate, not just the mean!
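The slides show CIs for r but don't say how they were computed. One common approach, assumed here and not necessarily the one behind the slide values, is Fisher's z-transformation: atanh(r) is approximately normal with SE = 1/√(N − 3), so the interval is built on that scale and its limits are transformed back with tanh. The helper `r_ci` and the sample size N = 100 are hypothetical (the slides do not report N).

```python
import math
from statistics import NormalDist

def r_ci(r, n, coverage=0.95):
    """Approximate CI for Pearson's r via Fisher's z-transformation (hypothetical helper)."""
    z = math.atanh(r)                               # transform r to the z scale
    se = 1 / math.sqrt(n - 3)                       # approximate SE on the z scale
    crit = NormalDist().inv_cdf(0.5 + coverage / 2)  # 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # back to the r scale

# Hypothetical sample size; the slides do not report N
print(r_ci(-0.07, 100))
print(r_ci(0.752, 100))
```

Because tanh squashes values near ±1, the interval is not symmetric around r when r is far from 0; the slide's [.652, .827] around r = .752 shows the same asymmetry.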


Remember this?

r = −.07

r = .752


Remember this?

r = −.07; 95% CI [−.263, .128]

r = .752; 95% CI [.652, .827]


Take-home message

Our aim is to estimate unknown population characteristics based on samples
A point estimate is the best guess about a given population characteristic (parameter)
Estimation is inherently uncertain
- We cannot say with 100% certainty that our estimate is truly equal to the population parameter
Confidence intervals express this uncertainty
- The wider they are, the more uncertainty there is
- Their coverage is chosen arbitrarily (often 50%, 90%, 95%, 99%)
CIs are constructed using the sampling distribution
- The true sampling distribution is unknown, but we can approximate it using the t-distribution with the appropriate degrees of freedom
CIs can be constructed for any point estimate
For a 95% CI, there is a 95% chance that any given CI contains the true population parameter


Have a lovely weekend :)
