class: center, middle, inverse, title-slide

# The Linear Model 3: Multiple predictors
## Lecture 09
### Dr Milan Valášek
### 26 March 2021

---
exclude: ![:live]
class: center starwars

.fade[ ]

.crawl[
# Stats wars

.cntr-li[
- **LM1:** A New Line
- **LM2:** <i>R</i><sup>2</sup> Strikes Back
- **LM3:** .secondary[Return of the <i>y<sub>i</sub></i>]
- **LM4:** <i>F</i> Awakens
]
]

---
exclude: ![:notLive]

# Stats wars

- **LM1:** A New Equation
- **LM2:** <i>R</i><sup>2</sup> strikes back
- **LM3:** .secondary[Return of the <i>y<sub>i</sub></i>]
- **LM4:** <i>F</i> awakens

---

<script type="text/javascript">
<!-- message to display at the bottom of slides when ?live=true -->
slideMessage = "<span>Ask questions at </span><a href = 'pollev.com/milanvalasek890'>pollev.com/milanvalasek890</a>"
plotsToPanels()
live()
</script>

## Today

- Extending the linear model
- Multiple predictors
- Transforming variables in a model
- Mean-centring
- Scaling
- *z*-transforming

---

## Basic linear model

`$$\text{outcome}_{\text{obs}} = \text{intercept} + \text{slope} \times \text{predictor}_{\text{obs}} + \text{residual}_{\text{obs}}$$`

.large[
`$$y_i = b_0 + b_1\times x_{1_i} + e_i$$`
]

- The model is a line through the scatter of data
- The line shows what the value of outcome for a given value of predictor *should* be according to the model
- Residual is the difference between prediction and observation

---

## Mean as linear model

- The simplest linear model is **the mean**
- `\(y_i = b_0 + e_i\)`
- `\(b_0 = Mean(y)\)`
- That's *literally* the same as `\(y_i = b_0 + 0\times x_{1_i} + e_i\)`
- Mean is the *intercept-only* model: a linear model where all `\(b\)` coefficients other than `\(b_0\)` have been set (fixed) to zero

<br>

<iframe id="r-sq-viz" class="viz app" src="https://and.netlify.app/viz/ols_reg?squares=false" data-external="1" width="100%" height="350"></iframe>

---

## Other coefficients?
- Just like we can fix `\(b_1\)` to zero in `\(y_i = b_0 + 0\times x_{1_i} + e_i\)`, we can fix any other `\(b\)` coefficient as well
- We can think of the basic single-predictor linear model as

`$$y_i = b_0 + b_1 \times x_{1_i} + 0 \times x_{2_i} + 0 \times x_{3_i} + \dots + 0 \times x_{n_i} + e_i$$`

- We're just ignoring all but one of the infinitely many possible predictors we could put in the model
- Not including a predictor in a model is **the same as saying that there is no relationship between that variable and the outcome**
- It's just said *implicitly* rather than aloud
- We can include more predictors in the model if we wish, so that their associated `\(b\)` coefficients get estimated rather than set to 0

---

## Variables are dimensions

- We've been representing the mean as a line on a plot of 2 variables
- It can also be represented as a point on the number line
- Every predictor *adds a dimension*

<br>

<iframe id="mult-pred-viz" class="viz app" src="https://and.netlify.app/viz/mult_pred" data-external="1" width="100%" height="500"></iframe>

---

## More complex models

- Including more than one predictor allows us to model the outcome variable in a more sophisticated way
- Every slope ( `\(b_n\)` coefficient, for `\(n>0\)`) expresses the relationship between a given predictor and the outcome *after the relationships between all other predictors and the outcome have been accounted for*
- A relationship – causal or not – between two variables can change drastically when another variable is taken into account
- It's important to consider all variables with a known effect when modelling a relationship (especially in observational research)
- Say we find a relationship between home environment and mental health
- However, mental health has a strong genetic component
- Parental predisposition to worse mental health is also linked to home environment
- Can we *really* claim a relationship between environment and mental health if we don't consider genetics?

---

## Breast is best but is it smartest?
- A lot of ink has been spilled over the claim that breastfeeding leads to an increase in child IQ ([BBC](https://www.bbc.co.uk/news/health-31925449), [The Guardian](https://www.theguardian.com/science/brain-flapping/2015/mar/18/breastfeeding-raises-iq-worrying-questions), [The New York Times](https://www.nytimes.com/2018/05/09/well/family/breast-feeding-has-no-impact-on-iq-by-age-16.html), [FiveThirtyEight](https://fivethirtyeight.com/features/everybody-calm-down-about-breastfeeding/))
- Taken at face value, breastfed children have higher IQ
- Whether or not a person breastfeeds their child is also linked to things like socio‑economic status or the person's IQ
- When these effects are adjusted for, the effect shrinks substantially – a difference of 3 IQ points is a [generous estimate](https://onlinelibrary.wiley.com/doi/full/10.1111/apa.13139) and even that has been [contested](https://www.pure.ed.ac.uk/ws/portalfiles/portal/30394557/Ritchie_Breastfeeding_letter_21_07_2016.pdf)

<br>

**The linear model allows us to build these more nuanced models and get closer to the Truth about the Universe<sup>TM</sup>**

---
exclude: ![:live]

.pollEv[
<iframe src="https://embed.polleverywhere.com/discourses/IXL29dEI4tgTYG0zH1Sef?controls=none&short_poll=true" width="800px" height="600px"></iframe>
]

---

## Multiple predictors in practice

- Today's example focuses on data about babies' birth weights and parental characteristics ([source](https://www.sheffield.ac.uk/mash/statistics/datasets))

<div data-pagedtable="false">
<script data-pagedtable-source type="application/json">
{"columns":[{"label":["ID"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Length"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Birthweight"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Headcirc"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Gestation"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["smoker"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["mage"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["mnocig"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["mheight"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["mppwt"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["fage"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["fedyrs"],"name":[12],"type":["dbl"],"align":["right"]},{"label":["fnocig"],"name":[13],"type":["dbl"],"align":["right"]},{"label":["fheight"],"name":[14],"type":["dbl"],"align":["right"]},{"label":["lowbwt"],"name":[15],"type":["dbl"],"align":["right"]},{"label":["mage35"],"name":[16],"type":["dbl"],"align":["right"]}],"data":[{"1":"1360","2":"56","3":"4.55","4":"34","5":"44","6":"0","7":"20","8":"0","9":"162","10":"57","11":"23","12":"10","13":"35","14":"179","15":"0","16":"0"},{"1":"1016","2":"53","3":"4.32","4":"36","5":"40","6":"0","7":"19","8":"0","9":"171","10":"62","11":"19","12":"12","13":"0","14":"183","15":"0","16":"0"},{"1":"462","2":"58","3":"4.10","4":"39","5":"41","6":"0","7":"35","8":"0","9":"172","10":"58","11":"31","12":"16","13":"25","14":"185","15":"0","16":"1"},{"1":"1187","2":"53","3":"4.07","4":"38","5":"44","6":"0","7":"20","8":"0","9":"174","10":"68","11":"26","12":"14","13":"25","14":"189","15":"0","16":"0"},{"1":"553","2":"54","3":"3.94","4":"37","5":"42","6":"0","7":"24","8":"0","9":"175","10":"66","11":"30","12":"12","13":"0","14":"184","15":"0","16":"0"},{"1":"1636","2":"51","3":"3.93","4":"38","5":"38","6":"0","7":"29","8":"0","9":"165","10":"61","11":"31","12":"16","13":"0","14":"180","15":"0","16":"0"},{"1"
:"820","2":"52","3":"3.77","4":"34","5":"40","6":"0","7":"24","8":"0","9":"157","10":"50","11":"31","12":"16","13":"0","14":"173","15":"0","16":"0"},{"1":"1191","2":"53","3":"3.65","4":"33","5":"42","6":"0","7":"21","8":"0","9":"165","10":"61","11":"21","12":"10","13":"25","14":"185","15":"0","16":"0"},{"1":"1081","2":"54","3":"3.63","4":"38","5":"38","6":"0","7":"18","8":"0","9":"172","10":"50","11":"20","12":"12","13":"7","14":"172","15":"0","16":"0"},{"1":"822","2":"50","3":"3.42","4":"35","5":"38","6":"0","7":"20","8":"0","9":"157","10":"48","11":"22","12":"14","13":"0","14":"179","15":"0","16":"0"},{"1":"1683","2":"53","3":"3.35","4":"33","5":"41","6":"0","7":"27","8":"0","9":"164","10":"62","11":"37","12":"14","13":"0","14":"170","15":"0","16":"0"},{"1":"1088","2":"51","3":"3.27","4":"36","5":"40","6":"0","7":"24","8":"0","9":"168","10":"53","11":"29","12":"16","13":"0","14":"181","15":"0","16":"0"},{"1":"1107","2":"52","3":"3.23","4":"36","5":"38","6":"0","7":"31","8":"0","9":"164","10":"57","11":"35","12":"16","13":"0","14":"183","15":"0","16":"0"},{"1":"755","2":"53","3":"3.20","4":"33","5":"41","6":"0","7":"21","8":"0","9":"155","10":"55","11":"25","12":"14","13":"25","14":"183","15":"0","16":"0"},{"1":"1058","2":"53","3":"3.15","4":"34","5":"40","6":"0","7":"29","8":"0","9":"167","10":"60","11":"30","12":"16","13":"25","14":"182","15":"0","16":"0"},{"1":"321","2":"48","3":"3.11","4":"33","5":"37","6":"0","7":"28","8":"0","9":"158","10":"54","11":"39","12":"10","13":"0","14":"171","15":"0","16":"0"},{"1":"697","2":"48","3":"3.03","4":"35","5":"39","6":"0","7":"27","8":"0","9":"162","10":"62","11":"27","12":"14","13":"0","14":"178","15":"0","16":"0"},{"1":"808","2":"48","3":"2.92","4":"33","5":"34","6":"0","7":"26","8":"0","9":"167","10":"64","11":"25","12":"12","13":"25","14":"175","15":"0","16":"0"},{"1":"1600","2":"53","3":"2.90","4":"34","5":"39","6":"0","7":"19","8":"0","9":"165","10":"57","11":"23","12":"14","13":"2","14":"193","15":"0","16":"0"},{"1"
:"1313","2":"43","3":"2.65","4":"32","5":"33","6":"0","7":"24","8":"0","9":"149","10":"45","11":"26","12":"16","13":"0","14":"169","15":"1","16":"0"},{"1":"792","2":"53","3":"3.64","4":"38","5":"40","6":"1","7":"20","8":"2","9":"170","10":"59","11":"24","12":"12","13":"12","14":"185","15":"0","16":"0"},{"1":"1388","2":"51","3":"3.14","4":"33","5":"41","6":"1","7":"22","8":"7","9":"160","10":"53","11":"24","12":"16","13":"12","14":"176","15":"0","16":"0"},{"1":"575","2":"50","3":"2.78","4":"30","5":"37","6":"1","7":"19","8":"7","9":"165","10":"60","11":"20","12":"14","13":"0","14":"183","15":"0","16":"0"},{"1":"569","2":"50","3":"2.51","4":"35","5":"39","6":"1","7":"22","8":"7","9":"159","10":"52","11":"23","12":"14","13":"25","14":"200","15":"1","16":"0"},{"1":"1363","2":"48","3":"2.37","4":"30","5":"37","6":"1","7":"20","8":"7","9":"163","10":"47","11":"20","12":"10","13":"35","14":"185","15":"1","16":"0"},{"1":"300","2":"46","3":"2.05","4":"32","5":"35","6":"1","7":"41","8":"7","9":"166","10":"57","11":"37","12":"14","13":"25","14":"173","15":"1","16":"1"},{"1":"431","2":"48","3":"1.92","4":"30","5":"33","6":"1","7":"20","8":"7","9":"161","10":"50","11":"20","12":"10","13":"35","14":"180","15":"1","16":"0"},{"1":"1764","2":"58","3":"4.57","4":"39","5":"41","6":"1","7":"32","8":"12","9":"173","10":"70","11":"38","12":"14","13":"25","14":"180","15":"0","16":"0"},{"1":"532","2":"53","3":"3.59","4":"34","5":"40","6":"1","7":"31","8":"12","9":"163","10":"49","11":"41","12":"12","13":"50","14":"191","15":"0","16":"0"},{"1":"752","2":"49","3":"3.32","4":"36","5":"40","6":"1","7":"27","8":"12","9":"152","10":"48","11":"37","12":"12","13":"25","14":"170","15":"0","16":"0"},{"1":"1023","2":"52","3":"3.00","4":"35","5":"38","6":"1","7":"30","8":"12","9":"165","10":"64","11":"38","12":"14","13":"50","14":"180","15":"0","16":"0"},{"1":"57","2":"51","3":"3.32","4":"38","5":"39","6":"1","7":"23","8":"17","9":"157","10":"48","11":"32","12":"12","13":"25","14":"169","15":"0","16":
"0"},{"1":"1522","2":"50","3":"2.74","4":"33","5":"39","6":"1","7":"21","8":"17","9":"156","10":"53","11":"24","12":"12","13":"7","14":"179","15":"0","16":"0"},{"1":"223","2":"50","3":"3.87","4":"33","5":"45","6":"1","7":"28","8":"25","9":"163","10":"54","11":"30","12":"16","13":"0","14":"183","15":"0","16":"0"},{"1":"272","2":"52","3":"3.86","4":"36","5":"39","6":"1","7":"30","8":"25","9":"170","10":"78","11":"40","12":"16","13":"50","14":"178","15":"0","16":"0"},{"1":"27","2":"53","3":"3.55","4":"37","5":"41","6":"1","7":"37","8":"25","9":"161","10":"66","11":"46","12":"16","13":"0","14":"175","15":"0","16":"1"},{"1":"365","2":"52","3":"3.53","4":"37","5":"40","6":"1","7":"26","8":"25","9":"170","10":"62","11":"30","12":"10","13":"25","14":"181","15":"0","16":"0"},{"1":"619","2":"52","3":"3.41","4":"33","5":"39","6":"1","7":"23","8":"25","9":"181","10":"69","11":"23","12":"16","13":"2","14":"181","15":"0","16":"0"},{"1":"1369","2":"49","3":"3.18","4":"34","5":"38","6":"1","7":"31","8":"25","9":"162","10":"57","11":"32","12":"16","13":"50","14":"194","15":"0","16":"0"},{"1":"1262","2":"53","3":"3.19","4":"34","5":"41","6":"1","7":"27","8":"35","9":"163","10":"51","11":"31","12":"16","13":"25","14":"185","15":"0","16":"0"},{"1":"516","2":"47","3":"2.66","4":"33","5":"35","6":"1","7":"20","8":"35","9":"170","10":"57","11":"23","12":"12","13":"50","14":"186","15":"1","16":"0"},{"1":"1272","2":"53","3":"2.75","4":"32","5":"40","6":"1","7":"37","8":"50","9":"168","10":"61","11":"31","12":"16","13":"0","14":"173","15":"0","16":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Birth weight, mother's age, and gestation time .codePanel[ ```r p1 <- bweight %>% ggplot(aes(mage, Birthweight)) + geom_point(size = 3, alpha = .4) + geom_smooth(method = "lm", color = theme_col, fill = second_col) + labs(x = "Mother's age at birth", y = "Birth weight (lbs)") p2 <- bweight %>% ggplot(aes(Gestation, Birthweight)) + 
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", color = second_col, fill = theme_col) +
  labs(x = "Gestation duration (weeks)", y = "")

cowplot::plot_grid(p1, p2)
```

![](index_files/figure-html/unnamed-chunk-3-1.png)<!-- -->]

---

## Fit model using `lm()`

```r
## Intercept-only model
m_null <- lm(Birthweight ~ 1, bweight)
## Add mother's age as predictor
m_age <- lm(Birthweight ~ mage, bweight)
# alternatively update(m_null, ~ . + mage)
## Add gestation duration as predictor
m_gest <- lm(Birthweight ~ mage + Gestation, bweight)
# same as update(m_age, ~ . + Gestation)
```

---

## Results - null model

```r
summary(m_null)
```

```
## 
## Call:
## lm(formula = Birthweight ~ 1, data = bweight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39286 -0.37286 -0.01786  0.33464  1.25714 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.31286    0.09318   35.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6039 on 41 degrees of freedom
```

---

## Results - Mother's age as predictor

```r
summary(m_age)
```

```
## 
## Call:
## lm(formula = Birthweight ~ mage, data = bweight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39275 -0.37288 -0.01786  0.33473  1.25702 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.312e+00  4.407e-01   7.516 3.62e-09 ***
## mage        1.845e-05  1.685e-02   0.001    0.999    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6114 on 40 degrees of freedom
## Multiple R-squared:  2.996e-08, Adjusted R-squared:  -0.025 
## F-statistic: 1.199e-06 on 1 and 40 DF,  p-value: 0.9991
```

---

## Results - M's age and gestation time

```r
summary(m_gest)
```

```
## 
## Call:
## lm(formula = Birthweight ~ mage + Gestation, data = bweight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77485 -0.35861 -0.00236  0.26948  0.96943 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.0092887  1.0567990  -2.848  0.00699 ** 
## mage        -0.0007953  0.0120469  -0.066  0.94770    
## Gestation    0.1618369  0.0258242   6.267 2.21e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4371 on 39 degrees of freedom
## Multiple R-squared:  0.5017, Adjusted R-squared:  0.4762 
## F-statistic: 19.64 on 2 and 39 DF,  p-value: 1.26e-06
```

---

## Model prediction

- A linear model can tell us the expected value of the outcome for any combination of predictor values
- According to our model, the expected birth weight for a baby whose mother is 29 years old and whose gestation period was 38 weeks is:

`$$\begin{aligned}\hat{y}&=-3.01 + 0 \times \text{age} + 0.16 \times \text{gestation}\\&=-3.01 + 0 \times 29 + 0.16 \times 38\\&=-3.01 + 0 + 6.08\\&=3.07\end{aligned}$$`

- Let's compare this with the observations in the sample

```r
bweight %>%
  filter(mage == 29 & Gestation == 38) %>%
  rmarkdown::paged_table()
```

<div data-pagedtable="false">
<script data-pagedtable-source type="application/json">
{"columns":[{"label":["ID"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Length"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Birthweight"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Headcirc"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Gestation"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["smoker"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["mage"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["mnocig"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["mheight"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["mppwt"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["fage"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["fedyrs"],"name":[12],"type":["dbl"],"align":["right"]},{"label":["fnocig"],"name":[13],"type":["dbl"],"align":["right"]},{"label":["fheight"],"name":[14],"type":["dbl"],"align":["right"]},{"label":["lowbwt"],"name":[15],"type":["dbl"],"align":["right"]},{"label":["mage35"],"name":[16],"type":["dbl"],"align":["right"]}],"data":[{"1":"1636","2":"51","3":"3.93","4":"38","5":"38","6":"0","7":"29","8":"0","9":"165","10":"61","11":"31","12":"16","13":"0","14":"180","15":"0","16":"0"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Negative intercept?! 
- The intercept always tells us the expected value of the outcome when all predictors are 0
- This is not always sensible (instantaneous childbirth in women aged 0 is not a common occurrence)

.codePanel[
```r
# coefs is not defined on the slides; assumed here to be the coefficients of m_gest
coefs <- coef(m_gest)

bweight %>%
  ggplot(aes(Gestation, Birthweight)) +
  geom_point(size = 3, alpha = .4) +
  geom_vline(xintercept = 0, lty=2) +
  geom_hline(yintercept = coefs[1], lty=2) +
  geom_point(data = tibble(x = 0, y = coefs[1]), mapping = aes(x, y), colour = "#fdfdfd", size = 4) +
  geom_abline(intercept = coefs[1], slope = coefs[3], color = second_col, size = 1) +
  geom_point(data = tibble(x = 0, y = coefs[1]), mapping = aes(x, y), pch=21, size = 4) +
  xlim(c(0, 45)) + ylim(c(-4, 5)) +
  labs(x = "Gestation duration (weeks)", y = "Birth weight (lbs)")
```

![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- -->]

---
exclude: ![:live]

.pollEv[
<iframe src="https://embed.polleverywhere.com/discourses/IXL29dEI4tgTYG0zH1Sef?controls=none&short_poll=true" width="800px" height="600px"></iframe>
]

---

## Transforming variables in the model

- We can apply various transformations to variables in the model
- Centring, scaling, standardising
- Non-linear transformations are also possible (<i>e.g.,</i> log-transform)
- Transforming variables **changes the interpretation of the coefficients**

---

## Centring

- Centring *predictors* changes the interpretation of the intercept

.smol[
```r
# untransformed predictor
lm(Birthweight ~ Gestation, bweight)
```

```
## 
## Call:
## lm(formula = Birthweight ~ Gestation, data = bweight)
## 
## Coefficients:
## (Intercept)    Gestation  
##     -3.0289       0.1618
```

```r
# centred predictor
bweight <- bweight %>%
  mutate(gest_cntrd = Gestation - mean(Gestation, na.rm=TRUE))
lm(Birthweight ~ gest_cntrd, bweight)
```

```
## 
## Call:
## lm(formula = Birthweight ~ gest_cntrd, data = bweight)
## 
## Coefficients:
## (Intercept)   gest_cntrd  
##      3.3129       0.1618
```
]

---

## Centring

- What's the weight of a baby born to a "typical" mother in terms of age and pregnancy duration?

.smol[
```r
# centre mother's age
bweight <- bweight %>%
  mutate(age_cntrd = mage - mean(mage, na.rm=TRUE))
lm(Birthweight ~ age_cntrd + gest_cntrd, bweight) %>% summary()
```

```
## 
## Call:
## lm(formula = Birthweight ~ age_cntrd + gest_cntrd, data = bweight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77485 -0.35861 -0.00236  0.26948  0.96943 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.3128571  0.0674405  49.123  < 2e-16 ***
## age_cntrd   -0.0007953  0.0120469  -0.066    0.948    
## gest_cntrd   0.1618369  0.0258242   6.267 2.21e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4371 on 39 degrees of freedom
## Multiple R-squared:  0.5017, Adjusted R-squared:  0.4762 
## F-statistic: 19.64 on 2 and 39 DF,  p-value: 1.26e-06
```
]

---

## Scaling

- Scaling *predictors* or *outcome* changes the interpretation of the slopes

.smol[
```r
# untransformed outcome
lm(Birthweight ~ gest_cntrd, bweight)
```

```
## 
## Call:
## lm(formula = Birthweight ~ gest_cntrd, data = bweight)
## 
## Coefficients:
## (Intercept)   gest_cntrd  
##      3.3129       0.1618
```

```r
# scaled outcome
bweight <- bweight %>%
  mutate(bweight_g = Birthweight / 2.205 * 1000) # 2.205 lbs in kg
lm(bweight_g ~ gest_cntrd, bweight)
```

```
## 
## Call:
## lm(formula = bweight_g ~ gest_cntrd, data = bweight)
## 
## Coefficients:
## (Intercept)   gest_cntrd  
##     1502.43        73.39
```
]

---

## Standardising

- Sometimes it's useful to talk about the change in outcome associated with a 1 *SD* change in a predictor

.smol[
```r
# untransformed predictor
lm(Birthweight ~ Gestation, bweight)
```

```
## 
## Call:
## lm(formula = Birthweight ~ Gestation, data = bweight)
## 
## Coefficients:
## (Intercept)    Gestation  
##     -3.0289       0.1618
```

```r
# standardised predictor (outcome in grams)
bweight <- bweight %>%
  mutate(gest_z = scale(Gestation))
lm(bweight_g ~ gest_z, bweight)
```

```
## 
## Call:
## lm(formula = bweight_g ~ gest_z, data = bweight)
## 
## Coefficients:
## (Intercept)       gest_z  
##        1502          194
```
]

---

## It's all the same model!

.codePanel[
```r
p1 <- bweight %>%
  ggplot(aes(Gestation, Birthweight)) +
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", color = second_col, fill = theme_col) +
  labs(x = "Gestation time (weeks)", y = "Birth weight (lbs)")

p2 <- bweight %>%
  ggplot(aes(gest_cntrd, Birthweight)) +
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", color = second_col, fill = theme_col) +
  labs(x = "Gestation time (weeks from mean)", y = "")

p3 <- bweight %>%
  ggplot(aes(Gestation, bweight_g)) +
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", color = second_col, fill = theme_col) +
  labs(x = "Gestation time (weeks)", y = "Birth weight (g)")

p4 <- bweight %>%
  ggplot(aes(gest_z, bweight_g)) +
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", color = second_col, fill = theme_col) +
  labs(x = "Gestation time (z-score)", y = "")

cowplot::plot_grid(p1, p2, p3, p4, nrow = 2)
```

![](index_files/figure-html/unnamed-chunk-15-1.png)<!-- -->]

---

## Standardised coefficients

- Standardised coefficients are equivalent to `\(b\)` coefficients in a model **where _both_ the predictors and the outcome have been *z*-transformed**
- We'll call them `\(B\)` to distinguish them from "raw" coefficients `\(b\)` but there is a lot of [confusion in the literature about the notation](https://www.theanalysisfactor.com/confusing-statistical-terms-1-alpha-and-beta/) (you may see `\(b\)`, `\(B\)`, `\(\beta\)`, or `\(Beta\)` used to mean either of the two)
- `\(B\)` expresses the change in the outcome, in *SD* units, associated with a 1 *SD* change in the predictor

---

## Standardised coefficients

- Handy function – `QuantPsyc::lm.beta()`
- Only gives `\(B\)` for slopes, not intercept!
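
The link between raw and standardised slopes can also be checked by hand: each `\(B\)` should equal the raw `\(b\)` multiplied by the *SD* of its predictor and divided by the *SD* of the outcome. The sketch below is only an illustration under stated assumptions: it uses simulated data rather than the birth-weight data, and the names `dat`, `x1`, `x2`, and `y` are made up.

```r
# Simulated data; names and values are hypothetical, not the birth-weight data
set.seed(1)
n  <- 100
x1 <- rnorm(n, mean = 26, sd = 5)
x2 <- rnorm(n, mean = 39, sd = 2.5)
y  <- -3 + 0.16 * x2 + rnorm(n, sd = 0.45)
dat <- data.frame(y, x1, x2)

# raw slopes (b), with the intercept dropped
m <- lm(y ~ x1 + x2, dat)
b <- coef(m)[-1]

# rescale each slope: b * SD(predictor) / SD(outcome)
B_manual <- b * sapply(dat[c("x1", "x2")], sd) / sd(dat$y)

# slopes from the model where everything is z-transformed
m_z <- lm(scale(y) ~ scale(x1) + scale(x2), dat)
B_z <- coef(m_z)[-1]

# TRUE if the two routes give the same standardised slopes
all.equal(unname(B_manual), unname(B_z))

# intercept of the fully z-transformed model, rounded
round(coef(m_z)[1], 9)
```

The intercept of the fully *z*-transformed model is zero up to rounding error, which is presumably why only the slopes are reported.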
.smol[
```r
m_gest <- lm(Birthweight ~ mage + Gestation, bweight)
# raw coefficients (b)
m_gest %>% coef()
```

```
##   (Intercept)          mage     Gestation 
## -3.0092887340 -0.0007952874  0.1618368592
```

```r
# standardised coefficients (B)
m_gest %>% QuantPsyc::lm.beta()
```

```
##         mage    Gestation 
## -0.007462176  0.708383324
```

```r
# same as if we z-transform everything ourselves
lm(scale(Birthweight) ~ scale(mage) + scale(Gestation), bweight) %>%
  coef() %>%
  round(9)
```

```
##      (Intercept)      scale(mage) scale(Gestation) 
##      0.000000000     -0.007462176      0.708383324
```
]

---
exclude: ![:live]

.pollEv[
<iframe src="https://embed.polleverywhere.com/discourses/IXL29dEI4tgTYG0zH1Sef?controls=none&short_poll=true" width="800px" height="600px"></iframe>
]

---

## Take-home message

- The linear model can easily be extended to more than one predictor
- Each predictor entered into the model *adds an extra dimension* to the space in which the model exists
- Each `\(b\)` coefficient (except for `\(b_0\)`) is the slope of the regression plane along its dimension
- Both *including* and *omitting* a variable are claims about its relationship with the outcome
- A `\(b\)` coefficient for a predictor tells us about the relationship between the predictor and the outcome **after accounting for** the relationship between all other predictors and the outcome
- The intercept may not be a sensible value if the predictors are not transformed (<i>e.g.,</i> centred)
- Transforming variables *changes the interpretation* of the coefficients
- Standardised coefficients, `\(B\)`, express the change in the outcome, in *SD* units, associated with a 1 *SD* change in the predictor

---

## Next time

(In a month, after teaching break!)
- Evaluating multiple-predictor models with *multiple* <i>R</i><sup>2</sup> and *adjusted* <i>R</i><sup>2</sup>
- Comparing models using the *F*-test
- Categorical predictors redux

---

## Tutorial

- Elaborates on the topics covered in the lecture
- You'll practise fitting and interpreting linear models with multiple predictors
- Will come out sometime in the next week or two

---
exclude: ![:live]
class: center starwars last-slide

.fade[ ]

.crawl[
## To be continued...

.large[Have a lovely spring break! 🙂]
]

<!-- class: last-slide weekend -->
<!-- background-image: url("assets/end.jpg") -->
<!-- background-size: cover -->
<!-- # Have a lovely weekend :) -->