
The Linear Model 2: Evaluating the Model

Lecture 08

Dr Jennifer Mankin

19 March 2021

1 / 30

Looking Ahead (and Behind)

  • Week 4: Correlation

  • Week 5: Chi-Square (χ2)

  • Week 6: t-test

  • Last week: The Linear Model - Equation of a Line

  • This week: The Linear Model - Evaluating the Model

2 / 30

Announcements

3 / 30

Objectives

After this lecture you will understand:

  • The equation for a linear model with one predictor

    • b0 (the intercept)

    • b1 (the slope)

  • The logic of NHST for b-values

    • Interpreting p and CIs
  • How to assess model fit with R2

4 / 30

General Model Equation



outcome = model + error



  • We can use models to predict the outcome for a particular case

  • This is always subject to some degree of error

5 / 30

The Linear Model

  • The linear model predicts the outcome y based on a predictor x

    • General form: yi = b0 + b1x1i + ei

    • b0: the intercept, or value of y when x is 0

    • b1: the slope, or change in y for every unit change in x

  • The slope b1 represents the relationship between the predictor and the outcome

6 / 30
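The equation above can be sketched directly in code. A minimal illustration in Python (the lecture's own code is in R; the coefficients here are placeholders in the spirit of the hugs example, not estimates from the data):

```python
# Illustrative coefficients only: b0 = intercept, b1 = slope (not fitted values)
b0, b1 = 0.60, -0.11

def predict(x):
    """Model prediction for a predictor value x: b0 + b1 * x."""
    return b0 + b1 * x

# The error e_i is the gap between an observed outcome and the prediction
y_obs = 0.52
e = y_obs - predict(0.5)  # residual for a case with x = 0.5
```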

Today's Example

  • "Does Hugging Provide Stress-Buffering Social Support? A Study of Susceptibility to Upper Respiratory Infection and Illness" (Cohen et al., 2015)

  • Participants completed questionnaires and phone interviews over 14 days

    • Including whether they had been hugged each day
  • Then exposed to a cold virus! 🤒

    • Measures of infection: amount of mucus, nasal clearing time
  • Does receipt of hugs have a relationship with infection?

    • What kind of relationship might we predict? 🤔

GIF of Olaf from Frozen saying, "And I like warm hugs."

7 / 30

Operationalisation

  • Predictor: Percentage of days in which participants were hugged

    • Higher percentage = more hugs
  • Outcome: Nasal clearing time

    • A measure of congestion

    • Longer time = more congestion (= worse cold)

  • Model: Congestioni = b0 + b1 × Hugsi + ei

    • Use the lm() function to estimate b0 and b1
8 / 30

Having a Look

cold_hugs %>%
  mutate(pct_hugs = pct_hugs * 100) %>%
  ggplot(aes(x = pct_hugs, y = post_nasal_clear_log)) +
  geom_point(position = "jitter", alpha = .4) +
  scale_x_continuous(name = "Percentage of Days with Hugs") +
  scale_y_continuous("Congestion (Log)") +
  cowplot::theme_cowplot()
9 / 30

Having a Look

cold_hugs %>%
  mutate(pct_hugs = pct_hugs * 100) %>%
  ggplot(aes(x = pct_hugs, y = post_nasal_clear_log)) +
  geom_point(position = "jitter", alpha = .4) +
  scale_x_continuous(name = "Percentage of Days with Hugs") +
  scale_y_continuous("Congestion (Log)") +
  geom_smooth(method = "lm") +
  cowplot::theme_cowplot()
  • Very slight negative relationship

    • How can we interpret this relationship?
10 / 30

Creating the Model

##
## Call:
## lm(formula = post_nasal_clear_log ~ pct_hugs, data = cold_hugs)
##
## Coefficients:
## (Intercept)     pct_hugs
##      0.5952      -0.1077
  • For every unit increase in hugs, congestion changes by -0.11

    • Here, "unit increase" = 1%

    • So, congestion goes down by 0.11 for every 1% increase in hugs

Model: Congestioni = 0.60 - 0.11 × Hugsi

11 / 30
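lm() estimates b0 and b1 by least squares. For a single predictor there is a closed form: b1 = cov(x, y) / var(x) and b0 = mean(y) − b1 × mean(x). A sketch in Python on synthetic data (the real cold_hugs data are not reproduced here; the generating values loosely echo the lecture's estimates):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 200)                     # e.g. proportion of days with hugs
y = 0.6 - 0.11 * x + rng.normal(0, 0.35, 200)  # outcome plus noise

# Closed-form least-squares estimates for one predictor
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
```

These match what a generic least-squares fit returns for the same data.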

The Story So Far

  • Investigating whether hugs protect against colds

  • Linear model shows that more hugs are associated with less congestion (infection)

    • Congestioni = 0.60 - 0.11 × Hugsi
  • Is this model any good? What do we mean by "good"?

    • Captures a relationship that may in fact exist: significance and CIs of b1

    • Explains the variance in the outcome: R2 for the model

12 / 30

NHST for LM

  • b1 quantifies the relationship between the predictor and the outcome

    • The effect of interest and the key part of the linear model!

    • Our estimate of the true relationship in the population (the model parameter)

  • So...is this value significant?

13 / 30

NHST for LM

  • Our recipe for significance testing is:

    • Data

    • A test statistic

    • The distribution of that test statistic under the null hypothesis

    • The probability p of finding a test statistic as large as the one we have (or larger) if the null hypothesis is true

  • First, we need to sort out the null hypothesis of b1

14 / 30

Null Hypothesis of b1

  • b1 captures the relationship between variables

    • How much the outcome y changes for each unit change in x

    • Null hypothesis: the outcome y does not change when x changes

  • What would this look like in terms of the linear model? 🤔

15 / 30

Null Hypothesis of b1

  • Congestioni = b0 + 0 × Hugsi + ei

    • This is the null or intercept-only model

cold_hugs %>%
  mutate(pct_hugs = pct_hugs * 100) %>%
  ggplot(aes(x = pct_hugs, y = post_nasal_clear_log)) +
  geom_point(position = "jitter", alpha = .4) +
  scale_x_continuous(name = "Percentage of Days with Hugs") +
  scale_y_continuous("Congestion (Log)") +
  geom_smooth(method = "lm") +
  geom_hline(yintercept = mean(cold_hugs$post_nasal_clear_log, na.rm = TRUE),
             linetype = "dashed", colour = "purple3") +
  cowplot::theme_cowplot()
16 / 30

Significance of b1

  • b1 = 0 represents the null hypothesis

    • So, the alternative hypothesis is b1 ≠ 0
  • For our model, does b1 = 0?

    • No! Here b1 = -0.11
  • Is our estimate of b1 different enough from 0 to believe that it may actually not be 0 in the population?

    • Compare the estimate of b1 to the variation in estimates of b1
17 / 30

Significance of b1

  • Signal-to-noise ratio

    • Signal: the estimate of b1

    • Noise: the standard error of b1

  • Scale b1 by its standard error: b1 / SEb1

  • What do you get when you divide a normally distributed value by its standard error...???? 🤔

18 / 30
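This signal-to-noise ratio can be checked by hand with the estimates from the lm() output (b1 = -0.10773, SE = 0.05243). A quick sketch in Python (the lecture itself works in R):

```python
b1 = -0.10773    # slope estimate from the lm() output
se_b1 = 0.05243  # its standard error

# Signal-to-noise ratio: scale the estimate by its standard error
t = b1 / se_b1   # ≈ -2.055, the t value R reports
```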

That's the t!

GIF meme of Kermit the Frog drinking tea

19 / 30

Significance of b1

  • b1 / SEb1 = t

    • Compare our value of t to the t-distribution to get p, just as we've seen before

    • If p is smaller than our chosen alpha level, our predictor is considered to be significant

Term                            b      SEb    t       p
Intercept                       0.60   0.04   16.04   < .001
Percentage of Days with Hugs   -0.11   0.05   -2.05   .041
20 / 30
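The comparison to the t-distribution can be reproduced with scipy, using the residual degrees of freedom (403) from the model output shown later in the lecture. A sketch in Python:

```python
from scipy import stats

t_value = -2.055  # t for the hugs slope
df = 403          # residual degrees of freedom from the fitted model

# Two-tailed p: probability of a |t| at least this large if the null is true
p = 2 * stats.t.sf(abs(t_value), df)  # ≈ .041, matching the table
```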

Confidence Intervals for b1

  • Give us the range of likely sample estimates of β1 from other samples

    • Only if our interval is one of the 95% of intervals that does in fact contain the population value!

    • Review Lecture 2 for more on CIs

  • Key info: does the confidence interval cross or include 0?

    • If yes, it's likely that we could have gathered a sample where b1 was 0
21 / 30

Confidence Intervals for b1

  • What can we conclude from these confidence intervals? 🤔
Term                            b      SEb    t       p        CIlower   CIupper
Intercept                       0.60   0.04   16.04   < .001    0.522     0.668
Percentage of Days with Hugs   -0.11   0.05   -2.05   .041     -0.211    -0.005
22 / 30
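The interval for the hugs slope can be reconstructed from b1 and its standard error using the critical t for 403 residual degrees of freedom. A sketch in Python:

```python
from scipy import stats

b1, se_b1, df = -0.10773, 0.05243, 403
t_crit = stats.t.ppf(0.975, df)  # two-tailed 95% critical value

# 95% CI: estimate ± critical t × standard error
lower = b1 - t_crit * se_b1      # ≈ -0.211
upper = b1 + t_crit * se_b1      # ≈ -0.005; the interval excludes 0
```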

Interim Summary

  • The key element of the linear model is b1

    • Quantifies the relationship between the predictor and the outcome

    • Null hypothesis: b1 = 0

    • Alternative hypothesis: b1 ≠ 0

  • Is b1 different from 0?

    • Significance via t

    • Confidence intervals

23 / 30

A Good Model

  • Captures a relationship that does in fact exist

    • Isn't just noise (random variation)

    • Quantified with significance/CIs

  • Is useful for understanding the outcome variable

    • Explains variance in the outcome

    • Quantified with R2

24 / 30

Explaining Variance

  • We want to explain variance, particularly in the outcome

  • Goodness of Fit: How well does the model fit the data?

    • Better fit = model is better able to explain the outcome
  • So, how do we quantify model fit?

25 / 30

Goodness of Fit with R2

  • R2 = (variance explained by the model) / (total variance)

    • Interpret as a percentage of variance explained

    • Applies to our sample only

    • Larger value means better fit

  • Adjusted R2: estimate of R2 in the population

  • How does this value look? 🤔



R2     Adjusted R2   F      p
0.01   0.01          4.22   .041
27 / 30
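With one predictor, R2 and the model F are tied directly to the t for b1: F = t² and R2 = t² / (t² + df). Checking this against the values in the table, as an arithmetic sketch in Python:

```python
t_value = -2.055  # t for the hugs slope
df = 403          # residual degrees of freedom

F = t_value ** 2           # with a single predictor, F is just t squared
r_squared = F / (F + df)   # proportion of variance explained, ≈ .01
```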

Putting It All Together

hugs_lm %>% summary()
##
## Call:
## lm(formula = post_nasal_clear_log ~ pct_hugs, data = cold_hugs)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.59522 -0.27880  0.01766  0.25219  0.79262
##
## Coefficients:
##              Estimate Std. Error t value            Pr(>|t|)
## (Intercept)   0.59522    0.03710  16.043 <0.0000000000000002 ***
## pct_hugs     -0.10773    0.05243  -2.055              0.0405 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3572 on 403 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01037, Adjusted R-squared: 0.007912
## F-statistic: 4.222 on 1 and 403 DF, p-value: 0.04055
28 / 30
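Adjusted R2 penalises R2 for the number of predictors: adj R2 = 1 − (1 − R2)(n − 1) / (n − k − 1), with k = 1 predictor and n = 405 complete cases here (403 residual df + 2 estimated parameters). A quick check against the output above, in Python:

```python
r_squared = 0.01037  # multiple R-squared from the summary
n, k = 405, 1        # 405 complete cases, 1 predictor

# Adjustment shrinks R-squared toward its expected value in the population
adj = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # ≈ 0.0079
```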

Summary

  • The linear model (LM) expresses the relationship between at least one predictor, x, and an outcome, y

    • Linear model equation: yi = b0 + b1x1i + ei

    • Most important result is the parameter b1, which expresses the change in y for each unit change in x

  • Evaluating the model

    • Is b1 different from 0? Significance tests and CIs

    • How well does the model fit the data? R2 and adjusted R2

29 / 30






Have a lovely weekend!

30 / 30
