
End of the Line: Comparing models

Lecture 10

Dr Milan Valášek

23 April 2021

1 / 26

Stats wars

  • LM1:   A New Equation

  • LM2:   R2 strikes back

  • LM3:   Return of the yi

  • LM4:   F awakens

2 / 26

Today

  • Categorical predictors again

  • Multiple R2 and adjusted R2

  • Comparing linear models with the F-test

3 / 26

What you already know about the linear model

$y_i = b_0 + b_1 \times x_{1i} + 0 \times x_{2i} + 0 \times x_{3i} + \dots + 0 \times x_{ni} + e_i$

  • Every predictor adds a dimension (axis)

  • Every b coefficient (except for b0) is a slope of the plane of best fit with respect to the dimension of the predictor

  • Predictors can be continuous or categorical

  • Outcome must be continuous

4 / 26

Last time: birth weight model

# centre predictors
bweight <- bweight %>%
  mutate(gest_cntrd = Gestation - mean(Gestation, na.rm = TRUE),
         age_cntrd = mage - mean(mage, na.rm = TRUE))
m_age_gest <- lm(Birthweight ~ age_cntrd + gest_cntrd, bweight)
summary(m_age_gest)
##
## Call:
## lm(formula = Birthweight ~ age_cntrd + gest_cntrd, data = bweight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77485 -0.35861 -0.00236 0.26948 0.96943
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.3128571 0.0674405 49.123 < 2e-16 ***
## age_cntrd -0.0007953 0.0120469 -0.066 0.948
## gest_cntrd 0.1618369 0.0258242 6.267 2.21e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4371 on 39 degrees of freedom
## Multiple R-squared: 0.5017, Adjusted R-squared: 0.4762
## F-statistic: 19.64 on 2 and 39 DF, p-value: 1.26e-06
5 / 26


Categorical predictor

  • Let's see whether or not the mother being a smoker has an influence on the baby's birth weight

  • What do you think?

  • Smoking is a binary categorical predictor, but we can add it to the model just like any other variable!

bweight$smoker
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1
6 / 26


Effect of mother's smoking

m_smoke <- lm(Birthweight ~ smoker, bweight)
summary(m_smoke)
##
## Call:
## lm(formula = Birthweight ~ smoker, data = bweight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21409 -0.39159 0.02591 0.41935 1.43591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5095 0.1298 27.040 <2e-16 ***
## smoker -0.3754 0.1793 -2.093 0.0427 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5804 on 40 degrees of freedom
## Multiple R-squared: 0.09874, Adjusted R-squared: 0.07621
## F-statistic: 4.382 on 1 and 40 DF, p-value: 0.0427

$\text{birthweight}_i = b_0 - 0.38 \times \text{smoker}_i + e_i$

  • Babies whose mothers smoke are, on average, 0.38 lbs lighter at birth than those whose mothers do not smoke.
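
This interpretation can be checked directly: with a single binary predictor, the b for that predictor is exactly the difference between the two group means. A minimal sketch on simulated data (the variable names and numbers here are made up for illustration, not the bweight data):

```r
set.seed(1)

# simulated data: 0 = non-smoker, 1 = smoker
smoker <- rep(0:1, each = 20)
bw <- 3.5 - 0.38 * smoker + rnorm(40, sd = 0.5)

m <- lm(bw ~ smoker)

# slope for the binary predictor...
b_smoker <- unname(coef(m)["smoker"])
# ...equals the difference between the two group means
diff_means <- mean(bw[smoker == 1]) - mean(bw[smoker == 0])

all.equal(b_smoker, diff_means)  # TRUE
```

This identity holds exactly for any 0/1 predictor, balanced groups or not, which is why the b can be read as "the average difference between the categories".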
7 / 26

Continuous AND categorical predictors

  • Of course, you can include categorical predictors even in a larger model
m_age_gest_smoke <- update(m_age_gest, ~ . + smoker)
# that's the same as
m_age_gest_smoke <- lm(Birthweight ~ age_cntrd + gest_cntrd + smoker, bweight)
summary(m_age_gest_smoke)
##
## Call:
## lm(formula = Birthweight ~ age_cntrd + gest_cntrd + smoker, data = bweight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60724 -0.31521 0.00952 0.24838 1.08946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.475375 0.093843 37.034 < 2e-16 ***
## age_cntrd 0.005115 0.011668 0.438 0.6636
## gest_cntrd 0.156079 0.024552 6.357 1.84e-07 ***
## smoker -0.310261 0.131381 -2.362 0.0234 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4135 on 38 degrees of freedom
## Multiple R-squared: 0.5655, Adjusted R-squared: 0.5312
## F-statistic: 16.49 on 3 and 38 DF, p-value: 5.083e-07
8 / 26

Multiple R2

  • We talked about what R2 means in terms of explained and unexplained (residual) variance

 

9 / 26

Multiple R2

  • It's the same in multiple-predictor models:
# multiple R-squared from the model
m_age_gest_smoke %>% broom::glance() %>% pull(r.squared)
## [1] 0.5655139
# same as manually-calculated using formula
1 - var(resid(m_age_gest_smoke))/var(bweight$Birthweight)
## [1] 0.5655139
  • It can also be interpreted as the square of the correlation between the predicted and observed values of the outcome
predicted <- fitted(m_age_gest_smoke)
correlation <- cor(predicted, bweight$Birthweight)
correlation^2
## [1] 0.5655139
10 / 26

The thing with multiple R2

  • We can also add more predictors to our models
  • Sometimes, they are good, sometimes, they are bad...
bweight <- bweight %>%
  mutate(x4 = rnorm(nrow(bweight)),
         x5 = rnorm(nrow(bweight)),
         x6 = rnorm(nrow(bweight)))
# add 3 random predictor variables
m_big <- update(m_age_gest_smoke, ~ . + x4 + x5 + x6)
m_big %>% broom::tidy()
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.48 0.0936 37.2 1.02e-29
## 2 age_cntrd 0.00284 0.0116 0.244 8.09e- 1
## 3 gest_cntrd 0.145 0.0260 5.57 2.83e- 6
## 4 smoker -0.280 0.135 -2.08 4.53e- 2
## 5 x4 0.100 0.0745 1.35 1.86e- 1
## 6 x5 -0.0368 0.0754 -0.488 6.28e- 1
## 7 x6 -0.0784 0.0560 -1.40 1.71e- 1
11 / 26

The thing with multiple R2

  • Multiple R2 gets higher as we add more and more predictors, even if there is no relationship between them and the outcome
# original 3-predictor model
m_age_gest_smoke %>% broom::glance() %>% pull(r.squared)
## [1] 0.5655139
# model that includes rubbish predictors
m_big %>% broom::glance() %>% pull(r.squared)
## [1] 0.6135306
  • The new model explains an additional ~5% of the variance in birth weight even though the predictors are just randomly generated

  • We want a metric of explained variance that doesn't get inflated by adding noise to the model

12 / 26

Adjusted R2

  • Adjusted R2 ($R^2_{adj}$, or sometimes $\bar{R}^2$) introduces a penalty for each predictor added to the model

$R^2_{adj} = 1 - \frac{SS_R / (n - p - 1)}{SS_T / (n - 1)}$

  • How does the penalty work?

    • The larger the number of predictors ( p ), the larger the numerator $SS_R / (n - p - 1)$ gets

    • The larger the numerator, the larger the entire fraction (just like 3/4 is more than 2/4)

    • The larger the fraction, the smaller the adjusted R2

    • Penalty per predictor is larger in small samples
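
The formula can be checked by computing adjusted R2 by hand and comparing it with what summary() reports. A sketch on simulated data (n is the sample size, p the number of predictors; x2 is deliberately pure noise):

```r
set.seed(1)
n  <- 42
x1 <- rnorm(n)
x2 <- rnorm(n)                  # pure noise predictor
y  <- 2 + 0.5 * x1 + rnorm(n)

m <- lm(y ~ x1 + x2)
p <- 2                          # number of predictors

ss_r <- sum(resid(m)^2)         # residual sum of squares
ss_t <- sum((y - mean(y))^2)    # total sum of squares

r2     <- 1 - ss_r / ss_t
r2_adj <- 1 - (ss_r / (n - p - 1)) / (ss_t / (n - 1))

all.equal(r2,     summary(m)$r.squared)      # TRUE
all.equal(r2_adj, summary(m)$adj.r.squared)  # TRUE
```

Because p appears in the denominator of the residual term, r2_adj is always below r2 whenever the model has at least one predictor.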

13 / 26

Adjusted R2

  • Compared to multiple R2, adjusted R2 gets far less inflated by including pointless predictors in the model
# original 3-predictor model
m_age_gest_smoke %>% broom::glance() %>%
  dplyr::select(r.squared, adj.r.squared)
## # A tibble: 1 x 2
## r.squared adj.r.squared
## <dbl> <dbl>
## 1 0.566 0.531
# model that includes rubbish predictors
m_big %>% broom::glance() %>%
  dplyr::select(r.squared, adj.r.squared)
## # A tibble: 1 x 2
## r.squared adj.r.squared
## <dbl> <dbl>
## 1 0.614 0.547
14 / 26


Adjusted R2

  • Adjusted R2 is an attempt at estimating proportion of variance explained by the model in the population

  • If adjusted R2 is always better, why even look at multiple R2?

  • A large difference between the values of multiple R2 and adjusted R2 indicates that there are some unnecessary predictors in the model and that it should be simplified

  • The larger the difference between the two, the less likely it is that the model will generalise beyond the current sample

15 / 26

Model fit

  • We can take a model as a whole and assess whether or not it is a good fit to the data

  • To do this we calculate the ratio of variance explained by the model (AKA model variance) and the variance not explained by the model (AKA residual variance)

  • This is the F-ratio

$F = \frac{\text{explained variance}}{\text{unexplained variance}} = \frac{\text{model variance}}{\text{residual variance}} = \frac{SS_M / df_M}{SS_R / df_R} = \frac{(SS_T - SS_R) / df_M}{SS_R / df_R}$
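
The same arithmetic can be done by hand and checked against summary(). A sketch on simulated data (df_M is the number of predictors; df_R = n − df_M − 1):

```r
set.seed(1)
n <- 42
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

m <- lm(y ~ x)

ss_r <- sum(resid(m)^2)        # unexplained (residual) sum of squares
ss_t <- sum((y - mean(y))^2)   # total sum of squares
df_m <- 1                      # model df: number of predictors
df_r <- n - df_m - 1           # residual df

f_ratio <- ((ss_t - ss_r) / df_m) / (ss_r / df_r)
p_value <- pf(f_ratio, df_m, df_r, lower.tail = FALSE)

all.equal(f_ratio, unname(summary(m)$fstatistic["value"]))  # TRUE
```

pf() with lower.tail = FALSE gives the probability of an F at least this large under the null, i.e. the p-value that summary() prints on its last line.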

16 / 26

F-distribution

  • The F-ratio is a ratio of two sums of squares (model and residual, respectively), each scaled by its respective degrees of freedom

  • We know that the sampling distribution of this kind of a metric is the F-distribution (hence the name)

17 / 26

F-test

  • Because we know the sampling distribution of our F-ratio, we can calculate the p-value associated with the given value of F under the null hypothesis that our model is no better than the null model
##
## Call:
## lm(formula = Birthweight ~ age_cntrd + gest_cntrd + smoker, data = bweight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60724 -0.31521 0.00952 0.24838 1.08946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.475375 0.093843 37.034 < 2e-16 ***
## age_cntrd 0.005115 0.011668 0.438 0.6636
## gest_cntrd 0.156079 0.024552 6.357 1.84e-07 ***
## smoker -0.310261 0.131381 -2.362 0.0234 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4135 on 38 degrees of freedom
## Multiple R-squared: 0.5655, Adjusted R-squared: 0.5312
## F-statistic: 16.49 on 3 and 38 DF, p-value: 5.083e-07
18 / 26


Comparing models

  • "Good" fit is a relative term - a model is a good fit in comparison to some other model

  • wHAt noW??

  • Yes, we're always comparing two models, even in the example above!

  • The total variance of a model* is just the residual variance of the model we're comparing it to!

  • F-test from summary() compares the model to the null (intercept-only) model

 

* The denominator of the ratio $F = \frac{\text{model variance}}{\text{total variance}}$

19 / 26

Comparing models

My model brings all the x to the y
And F like "it's better than yours!"
Damn right, it's better than yours!
I can teach you, but you must come to the lectures, do the practicals and tutorials, and revise

~ G W Snedecor

  • The null hypothesis of the F-test is that the model is no better than the comparison model

  • A significant F-test tells us that our model is a statistically significantly better fit to the data than the comparison model

  • In other words, our model is an improvement on the previous model

  • It explains significantly* more variance in the outcome variable than the other model

 

 

*statistically significantly!!!

20 / 26

Comparing models

  • The anova() function in R allows us to compare the models explicitly
m_null <- lm(Birthweight ~ 1, bweight)
anova(m_null, m_age_gest_smoke)
## Analysis of Variance Table
##
## Model 1: Birthweight ~ 1
## Model 2: Birthweight ~ age_cntrd + gest_cntrd + smoker
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 41 14.9523
## 2 38 6.4965 3 8.4557 16.486 5.083e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • These are the same numbers as we got from summary() (or broom::glance())
##  statistic    df df.residual      p.value
##   16.48655     3          38 5.083237e-07
21 / 26

Comparing models

  • You can compare as many models as you like

  • They all have to be modelling the same outcome variable

  • They must all be nested

    • You must be able to sort them from the simplest to the most complex

    • A simpler model mustn't contain any predictors that a more complex model doesn't also contain

    • They must all be fitted to the same data (their respective Ns must be the same)

birthweight ~ gestation

vs

birthweight ~ gestation + mage

(nested: the second model contains every predictor of the first)

birthweight ~ gestation

vs

birthweight ~ smoker + mage

(not nested: neither model's predictors are a subset of the other's)
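
A sketch of the difference on simulated data (the variable names mirror the example but the numbers are made up): anova() gives a valid F-test for the nested pair, while the second pair cannot be meaningfully compared because neither model contains the other's predictors.

```r
set.seed(1)
n <- 42
gestation <- rnorm(n, 39, 2)
mage      <- rnorm(n, 28, 5)
smoker    <- rep(0:1, each = n / 2)
birthweight <- 0.16 * gestation + rnorm(n, sd = 0.4)

m1 <- lm(birthweight ~ gestation)
m2 <- lm(birthweight ~ gestation + mage)  # nested: m2 contains all of m1
m3 <- lm(birthweight ~ smoker + mage)     # NOT nested with m1

anova(m1, m2)  # valid F-test of the one extra predictor (Df = 1)
# anova(m1, m3) would still run, but the F-test is not meaningful:
# m1's predictor (gestation) does not appear in m3
```

The Df column of the anova table is the number of predictors the larger model adds, which is the df_M of the comparison F-test.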

22 / 26

Take-home message

  • Categorical predictors can be entered in the linear model just like continuous predictors

    • For binary variables, the b represents the difference between the two categories

    • For more than 2 categories, it's a little more complicated (next term)

  • Multiple R2 can be interpreted as the proportion of variance in the outcome explained by the model, or as the square of the correlation between what the model predicts and what we observe

  • Multiple R2 gets inflated by adding more and more predictors

  • Adjusted R2 penalises for each added predictor to counteract this

    • Difference between the two is indicative of useless predictors in the model and a low generalisability of the model
23 / 26

Take-home message

  • We can compare nested models using the F-test

  • F-ratio is the ratio of variance explained by the model vs variance not explained by the model

  • We know that the sampling distribution of the F-ratio (AKA F-value, F-statistic) is the F-distribution with dfM and dfR degrees of freedom

  • The null hypothesis of the F-test is that the model is no better than the comparison model

  • F-test is always a comparison of two models, albeit sometimes implicitly

    • Total variance for a given model is just the residual variance of the comparison model!
24 / 26


Next time

  • Next week is our final Analysing Data lecture 😢

  • Revision session to recap all we talked about in all of this term's lectures

  • Don't forget to do the last tutorial, which will let you practise comparing models

  • For the final practical, Jennifer and I are planning something pretty special, so do come!

    • It's still a practical, and no, it won't be delivered through the medium of interpretative dance

    • We won't be singing either!

25 / 26

Have a lovely weekend :)

26 / 26
