class: center, middle, inverse, title-slide

# Correlation
## Lecture 04
### Dr Jennifer Mankin
### 19 February 2021

---

<script type="text/javascript">
  <!-- message to display at the bottom of slides when ?live=true -->
  const slideMessage = "<span>Ask questions at </span><a href = 'pollev.com/milanvalasek890'>pollev.com/milanvalasek890</a>"
  live()
</script>

## Welcome to the Fun Part!

- Very well done for all your hard work so far!

--

- The concepts we have covered are complex and difficult
  - Mastery takes **time** and **practice**
  - You have all made an excellent start!!!!

--

- We will now begin putting these ideas into practice
  - Much less new information!
  - Applying the same ideas to different research questions/scenarios

--

- Finally at the confluence of your stats knowledge and R skill
  - Let's get started!

---

## Looking Ahead

- This week: Correlation

--

- Week 5: Chi-Square

--

- Week 6: *t*-test

--

- Week 7: The Linear Model
- Week 8: The Linear Model

---

## What About the Lab Report?

- We will not start the lab report for a couple more weeks
  - Don't think about starting now - you can't!
- We will talk about the lab report in the lectures **and** work on it in the practicals
  - Make sure you come to your registered sessions

---

## Objectives

After this lecture you will understand:

- The concepts behind statistical correlation
- How to interpret the values of the correlation coefficient *r*
- How to read a correlation matrix
- How to interpret and report significance tests of *r*
- The relationship between correlation and causation

---

## Distributions, Test Statistics, and NHST

* Everything from the past few weeks we will now put into action!
* For each statistical analysis, we will have the same ingredients:
  + **Data**, from which we calculate...
  + A **test statistic** that represents the relationship of interest, which we compare to...
  + The **distribution** of that test statistic under the null hypothesis to get...
  + The **probability** *p* of getting a test statistic as large as the one we have (or larger) if the null hypothesis is true

---

## Overall Reminder

- We want to believe true things about the world, and disbelieve false things
  - More accurately: we should believe things that are well-founded in reliable evidence, and disbelieve things that are not
- Statistics is a system to help us make decisions about whether, and to what degree, we believe something is supported by evidence

---

## Correlation

- Essential question: how do two variables change in relation to each other?
- When one variable changes, does the other...
  - Change in a similar way?
  - Change in the opposite way?
  - Not change very much at all?

--

- In other words: to what degree do two variables **behave the same way**?

---

## Correlation

- Quantifies the **degree** and **direction** of a relationship
- Typically used with two (or more) continuous variables
  - Can be used when one is categorical!
- Today's example: Gender and Sexuality data from the questionnaire

---

## Correlation: Visualisation

![Plot of ratings of femininity vs masculinity](index_files/figure-html/fem_masc_plot-1.png)

---

## Correlation

- People who gave high ratings for femininity tended to give low ratings for masculinity, and vice versa
- We might like to know:
  - How strong is this relationship?
  - Should we believe that it's real (i.e. representative of people/first-year psychology students in general)?
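
---

## Correlation: Putting a Number on It

One standard answer to "how strong?" is Pearson's correlation coefficient *r*. As a rough sketch in our own notation (an assumption for illustration, not taken from the original slides): `\(x_i\)` and `\(y_i\)` are one respondent's femininity and masculinity ratings, `\(\bar{x}\)` and `\(\bar{y}\)` are the sample means, and `\(n\)` is the number of respondents.

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

The numerator is positive when the two ratings tend to fall on the same side of their means and negative when they tend to fall on opposite sides. Dividing by the two spreads standardises the result so that *r* always lies between -1 and 1.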
---

## Correlation: Interpretation

- We can quantify the **strength** and **direction** of the relationship between femininity and masculinity with Pearson's correlation coefficient *r*
  - Values range from -1 (perfect negative) through 0 (no linear relationship) to 1 (perfect positive)

--

### Strength

- Absolute value of *r* between 0 and 1
  - 0: no linear relationship at all
  - 1: perfect relationship

--

### Direction

- Whether the value of *r* is positive or negative
  - Positive: as one variable increases, the other tends to *increase*
  - Negative: as one variable increases, the other tends to *decrease*

---

## Correlation: Let's Try It!

```r
# Pearson correlation matrix for the two ratings
# (assumes dplyr, e.g. via the tidyverse, is already loaded)
gensex %>%
  select(Gender_fem_1, Gender_masc_1) %>%
  cor(method = "pearson")
```

```
##               Gender_fem_1 Gender_masc_1
## Gender_fem_1     1.0000000    -0.7563823
## Gender_masc_1   -0.7563823     1.0000000
```

- So, our correlation coefficient *r* is -.76
- POP QUIZ: How can we interpret this?

---

## Correlation: Interpretation

- The negative sign (-) means as femininity increases, masculinity tends to **decrease** (and vice versa)
- The absolute value of .76 is **very** strong - quite close to 1!

![Plot of ratings of femininity vs masculinity](index_files/figure-html/fem-masc-plot-again-1.png)

---

## Correlation: Significance

- We now have our **data**, from which we calculated...
- Our **test statistic** *r* (-.76)
- We also know the **distribution** of *r* with different degrees of freedom
  - Or, rather...of *t*, for Reasons (TM)

--

- We can now ask how likely we are to get an *r* as far from 0 as -.76 (or further) if in fact femininity and masculinity have a true *r* of 0
  - i.e.
the null hypothesis is in fact true
- We will use the standard significance level of .05 in this case

---

## Correlation: Significance

```
## 
## 	Pearson's product-moment correlation
## 
## data:  Gender_fem_1 and Gender_masc_1
## t = -20.098, df = 304, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7997536 -0.7027607
## sample estimates:
##        cor 
## -0.7553645
```

We can report this as: "There was a significant negative correlation between femininity and masculinity, *r*(304) = -.76, *p* < .001."

---

exclude: ![:live]

<iframe src="https://pollev-embeds.com/discourses/TBFC7yRhs3BnkC36q7g5R/respond" width="800px" height="600px" style="margin-top:20px;"></iframe>

---

## Correlation Matrices

- Correlations are often presented in *matrices*
- Each cell contains the correlation coefficient *r* for the variables in the corresponding row and column
- POP QUIZ: Why is there a diagonal line of 1s?

```
##             comfortable  masc   fem stability
## comfortable        1.00 -0.31  0.17      0.61
## masc              -0.31  1.00 -0.76     -0.28
## fem                0.17 -0.76  1.00      0.18
## stability          0.61 -0.28  0.18      1.00
```

---

## Correlation Matrices

- More useful version with `GGally::ggscatmat()`
  - Scatterplots, distributions, and *r* values

![](index_files/figure-html/unnamed-chunk-3-1.png)

---

## Correlation = Causation?

- Our analysis showed that higher ratings of femininity tended to correspond to lower ratings of masculinity, and vice versa
- Can we conclude from this that being more feminine **causes** you to be less masculine?

--

<br><br><br>
<center><strong>No, definitely not!!!</strong></center>

---

## Correlation ≠ Causation!

- Why not? :(

--

- No distinction between cause and effect
  - Which is the chicken and which is the egg?
  - Which came first: femininity or masculinity?
--

- No experimental manipulation (randomisation)

--

- The problem of *tertium quid* - a third variable that influences both the variables you're actually measuring

---

## Consider This...

- Consider the number of hours per day you and a friend on this course spend studying
  - Both tend to study less on similar days (e.g. the weekend)
  - Both tend to study more on similar days (e.g. right before an assessment is due)

--

- So, your hours studying and your friend's are likely to be highly correlated
- Does this mean that you studying more (or less) **causes** your friend to study more (or less)?

---

## Consider This...

- **Of course not**! Which of you "causes" the other to study more/less?
- Tertium quid: An unmeasured *third factor* that influences both of you
  - In this case: being on the same course

--

- Some sources of variation:
  - Differences in experience or interest
  - Which electives you're each taking
  - Friends and family obligations
  - Part time work

---

## More Examples

<iframe src="https://www.tylervigen.com/spurious-correlations" width="800px" height="400px" style="margin-top:20px;"></iframe>

---

## Say It With Me

<br><br><br><br>

.center[.large[.large[CORRELATION DOES NOT<br>IMPLY CAUSATION]]]

---

exclude: ![:live]

<iframe src="https://pollev-embeds.com/discourses/TBFC7yRhs3BnkC36q7g5R/respond" width="800px" height="600px" style="margin-top:20px;"></iframe>

---

## Correlation: **VOCAB ALERT!**

- In common language, "correlated" means "related to in some way, usually causally"
- In statistics-ese, it means "the (standardised) degree to which two or more variables covary", i.e. change in relation to each other

--

- "Correlation" is a technical term!
  - In your reports, do not say two things are "correlated" unless you report *r* as evidence!
  - Instead: variables "have a relationship"/"are related to each other"

---

## Consider This

<center>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">A new study shows a rise in depression and stress among young people parallels the growth in smartphone and social media use.<a href="https://t.co/AxyseUyBxn">https://t.co/AxyseUyBxn</a></p>&mdash; NPR (@NPR) <a href="https://twitter.com/NPR/status/1106264229210775552?ref_src=twsrc%5Etfw">March 14, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</center>

---

## Correlation: Summary

* The correlation coefficient *r* quantifies the strength and direction of relationships between variables
* The *p*-value associated with *r* is the probability of encountering a value of *r* as large as the one we have, or larger, **if in fact the true value of *r* in the population is 0**
* Correlation **DOES NOT IMPLY CAUSATION!!!!!!!**
* More practice with interpreting *r* with [this fun little game](http://guessthecorrelation.com/)

---

## A Few Reminders

- Recognise people who have helped you or others by [nominating them for a SavioR award](https://canvas.sussex.ac.uk/courses/12727/quizzes/19897)
- Give us feedback, ideas, or suggestions in [the Suggestion Box](https://canvas.sussex.ac.uk/courses/12727/quizzes/20246)

--

- Don't try to go it alone!
  - Ask to study with practical teams, friends on the course
  - Set up Zoom calls to work on the tutorials together
  - Be the change you wish to see in the world 😄

---

## Looking Ahead

### For the Quiz

- Revise all new definitions/concepts (see the summary slide)
- Revise how to read the output of `GGally::ggscatmat()` and `cor.test()`
  - More practice in the tutorial!
- Do **NOT** need to memorise function names or syntax

--

### Next Time

* Comparing frequencies with Chi-Square (*χ*<sup>2</sup>)

---

class: last-slide

<br><br><br><br><br>

# Have a lovely weekend!