From research questions to statistics

Lecture 3

Dr Milan Valášek

12 February 2021

Today

Conceptual, operational & statistical hypotheses
Null Hypothesis Significance Testing
p-values

Hypothesis

Statement about something in the world
- Often in terms of differences or relationships between things/people/groups
Must be testable: it must be possible for the data to either support or disconfirm a hypothesis
Should be about a single thing

Levels of hypothesis

Conceptual: Expressed in normal language on the level of concepts/constructs
Operational: Restates a conceptual hypothesis in terms of how constructs are measured in a given study
Statistical: Translates an operational hypothesis into the language of mathematics

Conceptual hypotheses

Expressed in normal language on the level of concepts/constructs
Good hypothesis: "The recent observed rising trend in global temperatures on Earth is primarily driven by human-produced greenhouse gas emissions."
Bad hypothesis: "Homœopathic products can cure people, but sometimes they make them worse before they make them better, and the effect is only apparent subjectively with respect to some vague 'holistic' notions rather than a specific well-defined and testable set of criteria."

From research question to conceptual hypothesis

Let's say we're interested in factors predicting sport climbing performance
Research question: Are there morphological characteristics that predispose some people to be better at climbing?
We have a hunch that having relatively long arms might be beneficial
Conceptual hypothesis: Climbers have relatively longer arms than non-climbers

Operationalisation

To be able to formulate a hypothesis in statistical terms, we first need to get from the conceptual level to the level of measurement
Operationalisation is the process of defining variables in terms of how they are measured
- The concept of intelligence can be operationalised as total score on Raven's Progressive Matrices
- The concept of cognitive inhibition can be operationalised as (some measure of) performance on the Stroop test.

Example: The Ape Index

The ape index (AI) compares a person's arm span to their height
- A positive AI means that your arm span is larger than your height
- A 165 cm (5′5″) tall person with an arm span of 167 cm has an ape index of +2 cm
- Found to correlate with performance in some sports (e.g., climbing, swimming, basketball)
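
As a minimal sketch of this operationalisation, the ape index can be computed as arm span minus height. The function name `ape_index` and its parameter names are assumptions made for illustration; they do not come from the lecture.

```python
def ape_index(arm_span_cm: float, height_cm: float) -> float:
    """Return the ape index as arm span minus height, in centimetres."""
    return arm_span_cm - height_cm


# The example from the slide: 165 cm tall, 167 cm arm span
print(ape_index(arm_span_cm=167, height_cm=165))
```

A positive result means the arm span exceeds the height; a negative result means the opposite.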

Ashima Shiraishi

155 cm tall

Ape index +10 cm

Operational hypotheses

Conceptual hypothesis: Climbers have relatively longer arms than non-climbers

Operational hypothesis: Elite climbers have, on average, a higher ape index than the general population

Statistical hypotheses

Translation of the operational hypothesis into the language of maths
Deals with specific values (or ranges of values) of population parameters
- The mean of a given population can be hypothesised to be of a given value
- We can hypothesise a difference in means between two populations

Statistical hypothesis

Conceptual hypothesis: Climbers have relatively longer arms than non-climbers

Operational hypothesis: Elite climbers have, on average, a higher ape index than the general population

Statistical hypothesis: $μ_{AI_{climb}} > μ_{AI_{gen}}$

Remember

We are interested in population parameters
However, we cannot measure them
We can estimate them based on sample statistics

Testing hypotheses

So we measure a climber and a non-climber and compare them to test our hypothesis
We find that the climber has a higher AI than the non-climber
Hypothesis confirmed; we happy

We happy?

No, the individuals might not be representative of the populations

Problem with samples

We need to collect a larger sample
However, the problem remains in principle: the sample mean might not capture $μ$ accurately

The bigger, the better!

There are statistical fluctuations; they get less important as N gets bigger
Sample means converge to the true value of μ as N increases
CIs get narrower as N increases (their width shrinks roughly in proportion to 1/√N); statistical power increases
False positives (and negatives!) happen
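
The effect of sample size can be checked with a small simulation. The sketch below assumes a normally distributed population with mean 0 and standard deviation 1 (an assumption made for illustration) and measures how much the sample mean varies from sample to sample for different N. The function name `sample_mean_spread` and the chosen sample sizes are hypothetical.

```python
import random
import statistics


def sample_mean_spread(n: int, reps: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the sample mean across `reps` simulated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # assumed population: N(0, 1)
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Quadrupling N should roughly halve the spread of the sample mean
for n in (10, 40, 160):
    print(n, round(sample_mean_spread(n), 3))
```

The spread of the sample mean should come out close to 1/√N, so larger samples give estimates that sit closer to μ, but the fluctuations never disappear entirely.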

Decisions, decisions

How do we decide that a difference/effect in our sample actually exists in the population?
One possible way is using Null Hypothesis Significance Testing (NHST)
- There is strong criticism of this approach
- It is, nonetheless, very widely used
- Alternatives exist!

NHST

Formulate a research hypothesis (from conceptual to statistical)
Formulate the null hypothesis
Choose an appropriate test statistic
Define the probability distribution of the test statistic under the null hypothesis
Gather and analyse (enough) data: calculate the sample test statistic
Get the probability of a value at least as extreme as the one you got under the null hypothesis
If the observed value is likely under the null, retain the null
If it is unlikely under the null, reject the null in favour of the research hypothesis, celebrate!
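
The steps above can be sketched end to end in code. This example swaps in a permutation test (shuffling group labels to build the null distribution), which is not necessarily the method used in the lecture. The ape index values are made up for illustration, and the names `mean_diff` and `permutation_p_value` as well as the 0.05 threshold are assumptions.

```python
import random
import statistics


def mean_diff(a, b):
    """Test statistic: difference between the two group means."""
    return statistics.fmean(a) - statistics.fmean(b)


def permutation_p_value(a, b, reps=5000, seed=1):
    """Two-sided p-value for the difference in means, via label shuffling."""
    rng = random.Random(seed)
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(reps):
        # Under the null, group labels are interchangeable
        rng.shuffle(pooled)
        d = mean_diff(pooled[:len(a)], pooled[len(a):])
        if abs(d) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Made-up ape index values (cm), for illustration only
climbers = [4, 2, 5, 1, 3, 6, 2, 4]
general = [1, -1, 2, 0, 3, -2, 1, 0]

alpha = 0.05  # assumed threshold for "unlikely", fixed before looking at the data
d, p = permutation_p_value(climbers, general)
print(f"D = {d:.3f}, p = {p:.4f}")
print("reject the null" if p < alpha else "retain the null")
```

The shuffling loop plays the role of "define the distribution of the test statistic under the null", and the proportion of shuffled differences at least as large as the observed one is the probability used in the final decision.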

Hypotheses

Back to climbers and ape index
Rather than a directional hypothesis (climbers have longer arms than non-climbers), it's more useful to formulate a hypothesis of some difference or effect
Statistical hypothesis: $μ_{AI_{climb}} \neq μ_{AI_{gen}}$

The null hypothesis

Negation of the statistical hypothesis
Very often about no difference/effect (but not necessarily)
Statistical (alternative) hypothesis: $H_{1} : μ_{AI_{climb}} \neq μ_{AI_{gen}}$
Null hypothesis: $H_{0} : μ_{AI_{climb}} = μ_{AI_{gen}}$
$H_{1}$ and $H_{0}$ represent alternative realities (like parallel universes!)
- One where there is a difference or effect
- One where there isn't one

NHST is about deciding which one of the two realities we live in

Test statistic

Mathematical expression of what we're measuring (difference, effect, relationship...)
There are many available test statistics, useful for different scenarios
For now, let's just take a simple difference in means: $D = \bar{AI}_{climb} - \bar{AI}_{gen}$
If the null hypothesis is true, we'd expect $D = 0$, i.e., no difference between climbers' and non-climbers' AI

Distribution of test statistic under H₀

$H_{0}$ represents the world where there is no difference in average ape index between elite climbers and the general population
Even if the true difference in the population (Δ; delta) is zero, D can be non-zero in a sample (here N = 30)
For simplicity, assume $AI_{gen}$ is normally distributed in the population with $μ = 0$ and $σ = 1$

Distribution of test statistic under H₀

Expected value of D under H₀ is 0
More often than not D will not be exactly equal to 0 in a sample
Small departures from 0 are common, large ones are rare
The distribution of the test statistic depends on N!
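
These properties can be illustrated by simulating the null world directly. The sketch below draws both groups from the same normal population with mean 0 and standard deviation 1; that climbers share the same standard deviation as the general population is an assumption added here. The function name `simulate_null_differences` and the sample sizes other than 30 are hypothetical.

```python
import random
import statistics


def simulate_null_differences(n, reps=3000, seed=1):
    """Simulate D = mean(climbers) - mean(general) when both groups come from N(0, 1)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        climb = [rng.gauss(0, 1) for _ in range(n)]
        gen = [rng.gauss(0, 1) for _ in range(n)]
        diffs.append(statistics.fmean(climb) - statistics.fmean(gen))
    return diffs


# Mean of D stays near 0; its spread shrinks as the group size n grows
for n in (10, 30, 100):
    diffs = simulate_null_differences(n)
    print(n, round(statistics.fmean(diffs), 3), round(statistics.stdev(diffs), 3))
```

Under these assumptions the spread of D should be close to √(2/N), so the same observed difference is more surprising under the null when the groups are larger.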

Distribution of test statistic under alternative hypothesis

$H_{1}$ represents the world where there is a difference in average ape index between elite climbers and the general population
If $H_{1}$ is true, the distribution of the test statistic is not centred around zero
Sometimes, a null result can still be observed (false negative; Type II error)
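
How often such false negatives happen can be estimated by simulation. The sketch below assumes a true difference `delta` between the populations, a known standard deviation of 1 in both groups, a z-test on the mean difference, and a two-sided threshold of 0.05. All of these, along with the function name `type_ii_error_rate`, are assumptions made for illustration rather than values from the lecture.

```python
import math
import random
import statistics


def type_ii_error_rate(delta, n=30, alpha=0.05, reps=3000, seed=1):
    """Share of simulated studies that fail to reject the null although the true difference is delta."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n)  # standard error of D with sigma = 1 in both groups
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(reps):
        climb = [rng.gauss(delta, 1) for _ in range(n)]
        gen = [rng.gauss(0, 1) for _ in range(n)]
        d = statistics.fmean(climb) - statistics.fmean(gen)
        if abs(d / se) < z_crit:
            misses += 1  # null retained even though it is false
    return misses / reps


for delta in (0.5, 1.0):
    print(delta, round(type_ii_error_rate(delta), 3))
```

Larger true differences (or larger samples) make false negatives rarer, which is another way of saying that statistical power increases.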

Probability of test statistic under H₀

Once we know what the distribution of our test statistic is, we can assess the probability of getting any given observed value or a more extreme value of D

Gather data and calculate the test statistic

Say we collected AI measurements from 30 climbers and 30 non-climbers
We calculated the mean difference, D = 0.47

Calculate probability of observed statistic under H₀

The p-value

The p-value is the probability of getting a test statistic at least as extreme as the one observed if the null hypothesis is really true

Tells us how likely data at least as extreme as ours are if there is no difference/effect in the population
Does not tell us the probability of H₀ or H₁ being true
Does not tell us the probability of our data happening "by chance alone"
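
To make the definition concrete, here is a sketch of a two-sided p-value for an observed mean difference, using D = 0.47 and 30 people per group from the example. It assumes a known population standard deviation of 1 in both groups and a normal null distribution (a z-test). It is not expected to reproduce the p-value reported for the real sample in the lecture, which would depend on the variability actually observed. The function name `two_sided_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p_value(d, n_per_group, sigma=1.0):
    """P(|D| at least as large as observed) if the true difference is 0 and sigma is known."""
    se = sigma * math.sqrt(2 / n_per_group)  # standard error of the mean difference
    z = d / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(d=0.47, n_per_group=30)
print(round(p, 3))
```

The result is a statement about the data given the null hypothesis, not about the probability of either hypothesis being true.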

Decision

So we have
- Data
- Test statistic
- Distribution of test statistic
- p(test_stat) under H₀
What now?
We reject H₀ and accept H₁ if we judge our result to be unlikely under H₀
We retain H₀ if we judge the result to be likely under it

How likely is likely enough?

This is an arbitrary choice!
Commonly used significance levels are
- 5% (.05; most common)
- 1% (.01)
- 0.1% (.001)
If the p-value is less than our chosen significance level, we call the result statistically significant (sufficiently unlikely under H₀)

Significance level must be chosen before results are analysed!
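
The decision rule itself is a one-line comparison. In the sketch below, the function name `nhst_decision` and the returned labels are assumptions made for illustration.

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with a significance level fixed in advance."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "retain H0"


# A p-value of .093 evaluated against the most common significance level
print(nhst_decision(0.093, alpha=0.05))
```

Because the outcome depends on `alpha`, the same p-value can lead to different decisions under different thresholds, which is why the level has to be fixed before the results are analysed.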

What about the ape index?

We found a mean difference in AI between climbers and non-climbers of 0.47
This statistic has an associated p-value = .093
Under the most common significance level in psychology (.05), this is not a statistically significant difference
We thus retain the null hypothesis and report not having found a difference: our hypothesis was not supported by the data
The difference we observed is not big enough for us to dismiss the assumption that we live in the world of $H_{0}$

Take-home message

Hypotheses should be clearly formulated, testable, and operationalised
Statistical hypotheses are statements about values of some parameters
The null hypothesis (usually that a parameter is equal to 0) is the one we test (in the NHST framework)
We can only observe samples, but we are interested in populations
Due to sampling error, we can find a relationship in sample even if one doesn't exist in population
NHST is one way of deciding if sample result holds in population: understanding it is crucial!

The p-value is the probability of getting a test statistic at least as extreme as the one observed if the null hypothesis is really true

Have a lovely weekend :)