Correlation
...with solutions

Practical 05

Published

April 12, 2021

Teamwork!

Today’s practical will be a collaborative effort. Before you jump in, decide on roles with your team.

Task 1 (punk!)

Decide who within your team will do the following roles. You should have only one scribe, but you can have more than one of the other roles.

Keep in mind that if someone in your team is usually the scribe, you should switch roles so that everyone gets practice working in RStudio.

At this point, the scribe should share their screen, and your team should work through the following tasks together.

Setup

Task 2

Just like every week, we want to set up our workspace: project, folder, and document to work in.

Task 2.1 (punk!)

If you haven’t done it yet, create a week_05 R project inside your module folder and, within it, create the standard folder structure, just like last week.

Task 2.2 (punk!)

Create your own new R Markdown file and save it in the week_05/r_docs folder you’ve just created. You can give it a title, put your name as the author, and delete any default text or code chunks that you won’t need. Keep the setup code chunk, though!

Use this R Markdown file to complete the following tasks, adding code chunks and headings as you go.

Task 2.3 (punk!)

In the setup code chunk of the .Rmd file, write the code to load the tidyverse package.

Task 3

Download the Millennium Cohort dataset from the link below, and save it in a new object called data.

Link: https://and.netlify.app/docs/mc_data.csv

You should see a new object, data, appear in your environment.

Task 3.1 (punk!)

Using R, find out how many participants there are in this dataset.
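One possible way to do Tasks 3 and 3.1 is sketched below. It assumes that the tidyverse is installed, that you have an internet connection, and that each row of the file is one participant (the worksheet doesn’t state this explicitly, so check it against what you see in the data).

```r
library(tidyverse)

# Read the CSV straight from the link given in Task 3 (needs an internet connection)
data <- read_csv("https://and.netlify.app/docs/mc_data.csv")

# Assuming one row per participant, the number of rows is the sample size
nrow(data)
```

If you’d rather see the column names and types as well as the row count, glimpse(data) prints all of that in one go.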

Wow, this is a huge dataset! We’ve said that a bigger sample is better, so this must be excellent.

Part 1: Correlation Analysis

Our data today comes from the Millennium Cohort, a large group of young people who have been invited to participate in a longitudinal study since birth. The data we are using is from a sweep that was conducted when the cohort was about 10 to 11 years old, and includes questions about both internet/social media usage and wellbeing/happiness.

Task 4 (Prog-rocK)

Have a look at the Codebook below and choose two variables to correlate. Specifically, you should choose one variable about social media or Internet use, and one variable about happiness or wellbeing.

In the solutions, we’ll use often_messages and recently_laugh. If you want to, you can choose these as well; in that case, your code and output will match the solutions exactly.

However, I’d recommend you choose variables you find interesting. If you do choose different variables, though, keep in mind that your answers - including interpretation - will be different from the solutions.

Variable Name | Description | Scale or Values
sex | Sex | 1 = Male, 2 = Female
age | Age as of last birthday | Years (numeric)

Internet/Social Media Use
often_use_internet | How often do you use the Internet, not at school? | 1 = Most days, 2 = At least once a week, 3 = At least once a month, 4 = Less often than once a month, 5 = Never
often_messages | How often do you exchange messages with friends on the Internet? | Same scale as often_use_internet
often_social_media | How often do you visit a social networking website on the Internet? | Same scale as often_use_internet

Wellbeing/Happiness
happiness_looks | How happy are you with the way you look? | Likert scale, 1 = Completely happy, 7 = Not at all happy
happiness_life | How happy are you with your life as a whole? | Same scale as happiness_looks
recently_happy | In the last four weeks, how often did you feel happy? | 1 = Never, 2 = Almost never, 3 = Sometimes, 4 = Often, 5 = Almost always
recently_worried | In the last four weeks, how often did you worry about what would happen? | Same scale as recently_happy
recently_sad | In the last four weeks, how often did you feel sad? | Same scale as recently_happy
recently_afraid | In the last four weeks, how often did you feel afraid or scared? | Same scale as recently_happy
recently_laugh | In the last four weeks, how often did you laugh? | Same scale as recently_happy
recently_angry | In the last four weeks, how often did you get angry? | Same scale as recently_happy

Task 5 (Prog-rocK)

In your Markdown document, write down your prediction about how these two variables will be correlated. You should mention:

Hint

Ideally, your prediction should be based on an understanding of what previous papers have found - for instance, if you had a look at some of the recommended reading from the handout. If you didn’t, just use your own logic and reasoning to make a prediction.

When you’re predicting the direction of the correlation, make sure you read the “Scale or Values” column carefully to interpret the meaning of the numbers you get.

Don’t proceed until you have written down your predictions! It’s important to do this before you know the result of the analysis.

Task 6

Run and report the correlation analysis with the following steps.

Task 6.1 (Prog-rocK)

Use the cor.test() function to output a correlation analysis on the two variables you chose.

Hint

If you need help, the Week 5 tutorial explains how to do this. Or, look up the help documentation by running ?cor.test in the console.
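As a rough illustration of what a cor.test() call looks like, here is a minimal sketch. So that it runs on its own, it uses a small made-up data frame called toy (a hypothetical stand-in, filled with random responses, not the real cohort data). In the practical you would use your own data object and the two variables you chose.

```r
# Hypothetical stand-in data: random 1-5 responses, NOT the real cohort data
set.seed(5)
toy <- data.frame(
  often_messages = sample(1:5, 200, replace = TRUE),
  recently_laugh = sample(1:5, 200, replace = TRUE)
)

# Pearson correlation test (the default method for cor.test())
result <- cor.test(toy$often_messages, toy$recently_laugh)
result
```

With the real dataset, the equivalent call would be cor.test(data$often_messages, data$recently_laugh), swapping in whichever variables you picked.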

Task 6.2 (Prog-rocK)

Report the results of your analysis. Your reporting should mention which variables you correlated, what their relationship was like (i.e. degree and direction), and give the following information about the analysis:

Hint

If you’re not sure exactly how to report this analysis, look back at the Week 4 tutorial and give it your best shot. You can also have a look at a paper reporting a correlation analysis!
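If you’d like to pull the numbers out of the analysis rather than copy them from the printed output, something like the sketch below may help. It again uses a hypothetical toy data frame of random responses, and the choice of which values to extract (r, degrees of freedom, p, and the confidence interval) is an assumption about what a typical write-up needs, so check it against the reporting requirements for this task.

```r
# Hypothetical stand-in data: random 1-5 responses, NOT the real cohort data
set.seed(5)
toy <- data.frame(
  often_messages = sample(1:5, 200, replace = TRUE),
  recently_laugh = sample(1:5, 200, replace = TRUE)
)
result <- cor.test(toy$often_messages, toy$recently_laugh)

# cor.test() returns a list, so each piece can be extracted by name
r_value  <- unname(result$estimate)   # correlation coefficient r
df_value <- unname(result$parameter)  # degrees of freedom (number of complete pairs minus 2)
p_value  <- result$p.value            # p-value
ci       <- result$conf.int           # 95% confidence interval for r

# Collect the rounded values in one place for the write-up
round(c(r = r_value, df = df_value, p = p_value,
        ci_lower = ci[1], ci_upper = ci[2]), 3)
```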

Task 6.3 (jazz...)

In addition to your statistical reporting, interpret the results. This means to explain what the correlation tells you in plain language.

Hint

This one’s tricky because of the way the scales are coded, i.e., what a higher score means for each variable. Make sure you check the Codebook carefully!

Task 7 (Prog-rocK)

Compare the results you’ve reported to your original prediction. Was this what you expected? Was the correlation in the direction that you predicted? How about strength? Write down your thoughts in your Markdown document.

Task 7.1 (punk!)

Once you’ve finished your write-up, go to the Padlet below for your practical session and paste your completed writeup there. Remember, you need to mention:

Practical 01 (Thursday 9am) | Practical 02 (Thursday 11am)

Practical 03 (Thursday 4pm) | Practical 04 (Thursday 6pm)

Practical 05 (Friday 9am)

Task 8 (Prog-rocK)

Have a look through other responses on the Padlet, especially from groups that used different variables than you did. Overall, did we find evidence that social media/internet use makes children more unhappy? Write down your thoughts in your Markdown.

Part 2: Let’s Think About This

So, job done, right? We found a significant correlation and proved definitively that sending online messages makes you laugh more (or whatever variables you used). Tell that to your great-aunt Marge the next time she tells you that you stare at your phone too much!

By now your statistics spidey senses are hopefully tingling. There are a few things here that aren’t quite right. Let’s look into this correlation a little further.

Task 9 (punk!)

Can we conclude from this analysis that sending online messages makes you laugh more (or whatever variables you used)? Write your thoughts in your Markdown.

Hint

Say it with me…

Task 9.1 (Prog-rocK)

Can we conclude that we have proven a relationship between sending online messages and laughing more often (or whatever variables you used)? Write your thoughts in your Markdown.

Even with these caveats in place, it’s still tempting to conclude that because our value of p is so small, there must be something cool or interesting happening here. But…is there? Let’s have a closer look at what this value of r actually means.

Interpreting the Correlation

Remember that the value of r is an interpretable number: it tells us about the strength of the relationship between our two variables. This makes r a very useful example of an effect size: a number whose magnitude corresponds to the size of the effect we are interested in. The bigger the (absolute) value of r, the stronger the relationship between the variables.

Task 10 (Prog-rocK)

What do you understand the absolute value of your correlation coefficient r to mean? Write down your thoughts in your Markdown.

At this point it would be very useful to actually graph our data. As we did in the tutorial, this might help us understand what the correlation is actually telling us.

In a real analysis, it is essential to do lots of thorough data exploration and graphing before you go plunging into your analysis! We’re just coming round to it now for dramatic effect 😄

Task 11 (jazz...)

Create a scatterplot of the two variables you chose for your analysis. Does this change your interpretation at all? Write down your thoughts in your Markdown.
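Here is one way a scatterplot like this could be sketched with ggplot2. The toy data frame is again a hypothetical stand-in of random responses, and using geom_jitter() instead of geom_point() is a design choice on our part, not something the task requires: because the responses are whole numbers from 1 to 5, many points would otherwise sit exactly on top of each other.

```r
library(ggplot2)

# Hypothetical stand-in data: random 1-5 responses, NOT the real cohort data
set.seed(5)
toy <- data.frame(
  often_messages = sample(1:5, 200, replace = TRUE),
  recently_laugh = sample(1:5, 200, replace = TRUE)
)

# geom_jitter() nudges each point slightly so overlapping responses become visible
scatter <- ggplot(toy, aes(x = often_messages, y = recently_laugh)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4) +
  labs(
    x = "How often do you exchange messages with friends on the Internet?",
    y = "In the last four weeks, how often did you laugh?"
  )
scatter
```

With the real dataset, replace toy with data and use the two variables from your own analysis.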

Overall, it’s clear that the relationship between these variables is not as simple as the value of r would lead us to believe. This is why it’s always important to think about the actual value of the test statistic you have calculated, and to inspect your data thoroughly (for example, by making graphs) instead of only relying on the significance value.

 

Well done!

This is the end of the required portion of the practical, so you’re welcome to jump down to the end if time is up. If you have whizzed through the previous tasks and still have time left, or if you’re just curious, carry on to the next section to dig deeper into what’s going on.

This final section is optional, but recommended. If you chose different variables above, keep in mind that this section will focus on the often_messages and recently_laugh variables.

Optional: But It’s Significant!

So, how do we reconcile the fact that this correlation is significant with the fact that the actual value of r is quite small and the scatterplot of the data shows no obvious trend?

The key comes from divorcing the real-world meaning of the word “significant” (i.e. “meaningful”, “important”) from the statistical sense.

Task 12

Let’s revise what we know about distributions, standard error, and sample size.

Task 12.1 (punk!)

Recall that we checked the number of participants at the beginning. How many were there again?

Task 12.2 (Prog-rocK)

In your own words, explain what the relationship is between sample size and standard error. Write your thoughts in your Markdown document.

Hint

See Lecture 1 for help!
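To see the relationship in numbers, here is a small sketch using the standard error of the mean, SE = SD / sqrt(N). The standard deviation of 1 and the particular sample sizes are made-up values chosen only for illustration.

```r
# Standard error of the mean: SE = SD / sqrt(N)
standard_error <- function(sd, n) sd / sqrt(n)

# Illustrative sample sizes; the SD of 1 is an arbitrary choice
sample_sizes <- c(10, 100, 1000, 10000)
se_values <- standard_error(sd = 1, n = sample_sizes)

# Compare how the standard error changes as the sample gets bigger
data.frame(n = sample_sizes, se = se_values)
```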

Task 12.3 (Prog-rocK)

Using this online calculator, set the value of r to our value (-.11). Then, try changing the value in “Sample Size”, leaving the value of r the same. What happens to the value of t as sample size increases? Write down your thoughts in your Markdown.
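If you’d like to try the same thing in R, the sketch below uses the usual formula for the t-statistic of a Pearson correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2). The function name t_from_r and the sample sizes are our own illustrative choices; the value of r is the one from the task.

```r
# t-statistic for a Pearson correlation: t = r * sqrt(N - 2) / sqrt(1 - r^2)
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Keep r fixed at -.11 and vary only the sample size (illustrative values)
sample_sizes <- c(20, 100, 1000, 10000)
t_values <- t_from_r(r = -0.11, n = sample_sizes)

data.frame(n = sample_sizes, t = round(t_values, 2))
```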

Task 12.4 (death metal)

Putting it all together - as sample size increases, what happens to the value of t - and therefore of p? Why should this be the case? Write down your thoughts in your Markdown.

Hint

How does the t-distribution work, and what are its critical values?
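As a sketch of how these pieces connect, the code below converts the same r into a two-tailed p-value at several sample sizes and prints the critical value of t alongside. The function name p_from_r, the sample sizes, and the alpha level of .05 are assumptions made for illustration.

```r
# Two-tailed p-value for a Pearson correlation r with sample size n
p_from_r <- function(r, n) {
  t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(abs(t_value), df = n - 2, lower.tail = FALSE)
}

# Keep r fixed at -.11 and vary only the sample size (illustrative values)
sample_sizes <- c(20, 100, 1000, 10000)
p_values <- p_from_r(r = -0.11, n = sample_sizes)

# Critical |t| for a two-tailed test at alpha = .05 (assumed alpha level)
critical_t <- qt(0.975, df = sample_sizes - 2)

data.frame(n = sample_sizes, p = signif(p_values, 3), critical_t = round(critical_t, 3))
```

Compare how much the p column changes across rows with how little the critical_t column does.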

If you got stuck here or are finding this quite difficult, don’t worry - these are complex ideas. As you think about them and use them more, they will become more familiar and more intuitive.

Recap

Well done today! We got some practice running a correlation analysis, reporting it in APA style, and examining what those results actually mean. If you did the optional extra tasks, you also looked at why very small effects are still (statistically) significant for very large samples.

You’re welcome - and encouraged! - to keep working with the Millennium Cohort data to practice what we’ve learned today.

Remember, if you get stuck or you have questions, post them on Piazza, or bring them to StatsChats or to drop-ins.

 

Good job!

 

Footnotes

  1. This is a type of response bias called extreme responding, in which respondents tend to use only the extreme ends of the scale, and is quite typical of child participants (here’s a paper if you’re interested).