Practical 02
This worksheet builds on the guided part of the practical that preceded it. You can download the R script from this live-coding session.
This worksheet gives you the opportunity to practise what you learnt in the first part of the practical. To make things interesting, let’s use data from an actual paper by Swiderska and Küster (2018) exploring ways in which people’s capacity to empathise with robots can be increased. You can find the data from the study at https://and.netlify.app/docs/harm_data.csv (you don’t need to download this file).
First of all, if you haven’t done it yet, create a week_02 R project inside of your module folder and, within it, create the standard folder structure, just like last week.
Download this R Markdown file and save/move it into your r_docs folder.
Use the R Markdown file you downloaded in task 2 to complete the following tasks.
In the setup code chunk of the .Rmd file, write the code to load the packages you will need to complete this practical: tidyverse, kableExtra, and cowplot should be enough. You might need to install cowplot if you haven’t used it yet.
In the read-in code chunk, write the code to read the data into RStudio.
All you need to do is copy the URL (address) of the file as a "character string" into the readr::read_csv() function and assign its output to an object.
# write a line of code to read in the data
data <- read_csv("https://and.netlify.app/docs/harm_data.csv")
In the inspect chunk, write code to complete the following tasks:
Ask R to give you the number of columns, the number of rows, and the column names of the dataset.
The names(), ncol(), and nrow() functions are useful here.
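As a minimal sketch of how these three functions behave, here they are applied to a small made-up data frame. The object toy and its values are assumptions for illustration only; in the practical you would use the object you created in the read-in chunk.

```r
# toy data frame standing in for the real data (values are made up)
toy <- data.frame(
  ID = 1:4,
  Condition = c("a", "b", "a", "b"),
  Age = c(19, 23, 31, 20)
)

n_cols <- ncol(toy)      # number of columns
n_rows <- nrow(toy)      # number of rows
col_names <- names(toy)  # character vector of column names

n_cols
n_rows
col_names
```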
OK, that’s quite a lot of columns. Let’s only keep a few to make things easier.
Discard all columns except for "ID", "Condition", "Humanness", "Harm", "Gender", and "Age".
You want to be selecting columns.
# only keep the ID, Condition, Humanness, Harm, Gender, and Age variables in the dataset
data <- data %>%
  dplyr::select(ID, Condition, Humanness, Harm, Gender, Age)
Add code that gives you the age range (minimum and maximum) of participants in the data set.
There are oh-so-many ways to do this but one you’re already familiar with involves summarising the data.
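One possible way to do this by summarising, sketched on a made-up tibble. The names toy, min_age, and max_age are hypothetical, and the na.rm = TRUE argument is an assumption in case some ages are missing.

```r
library(dplyr)

# toy tibble standing in for the real data (values are made up)
toy <- tibble(Age = c(19, 23, 31, 20))

# collapse the Age column into its minimum and maximum
age_range <- toy %>%
  summarise(min_age = min(Age, na.rm = TRUE),
            max_age = max(Age, na.rm = TRUE))

age_range
```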
Your code should make sure that you’re not analysing data from minors. But before we remove any participant from our data, it is crucially important to keep a record of how many we excluded.
In the clean chunk, create an object remove_age and in it, store the number of participants who are younger than 16 years old.
Did someone say filter?
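A rough sketch of the idea on made-up data: filter for the rows you are about to exclude and count them. The tibble toy is hypothetical. Note that the count has to be taken before the data are overwritten with the cleaned version, otherwise there is nothing left to count.

```r
library(dplyr)

# toy tibble standing in for the real data (values are made up)
toy <- tibble(ID = 1:5, Age = c(15, 19, 23, 12, 31))

# keep only the rows that WILL be excluded and count them
remove_age <- toy %>%
  filter(Age < 16) %>%
  nrow()

remove_age
```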
Add a line in the clean code chunk that only keeps data from participants who are aged 16 or over.
# only keep participants aged 16 or over in your data
data <- data %>%
  dplyr::filter(Age >= 16)
In the descriptives code chunk, write code that creates:
A tibble of descriptive statistics (mean, standard deviation, minimum, maximum) for the Age variable. It should look like this:
# A tibble: 1 x 4
   mean    sd   min   max
  <dbl> <dbl> <dbl> <dbl>
1  22.3  6.20    18    44
We learnt how to summarise data in practical 9 last term.
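As a reminder of what that looks like, here is a small sketch on made-up ages. The objects toy and age_desc are hypothetical names; the column names mirror the ones in the expected output above.

```r
library(dplyr)

# toy tibble standing in for the real data (values are made up)
toy <- tibble(Age = c(18, 20, 22, 24))

# one row, one column per descriptive statistic
age_desc <- toy %>%
  summarise(mean = mean(Age),
            sd = sd(Age),
            min = min(Age),
            max = max(Age))

age_desc
```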
A tibble with Ns, %s, and age (mean and SD) breakdown by categories of the Gender variable. Something like this:
# A tibble: 2 x 5
  Gender     n  perc age_mean age_sd
* <chr>  <int> <dbl>    <dbl>  <dbl>
1 Female   125  61.9     21.8   5.43
2 Male      77  38.1     23.1   7.24
PAAS practical 10 will come in handy here too.
The perc column requires a little bit of thinking. Think about how you can use the n column and the number of rows in the data to derive the percentage. If you get stuck, check out the solution to last term’s practical 10.
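One way this could be put together, sketched on made-up data: group by the categorical variable, count the rows per group, then divide each count by the total number of rows. The tibble toy is hypothetical, and this is only one of several approaches that would give the same result.

```r
library(dplyr)

# toy tibble standing in for the real data (values are made up)
toy <- tibble(
  Gender = c("Female", "Female", "Female", "Male", "Male"),
  Age = c(18, 20, 22, 24, 30)
)

gender_desc <- toy %>%
  group_by(Gender) %>%
  summarise(n = n(),                     # rows in each group
            perc = n / nrow(toy) * 100,  # group count as % of all rows
            age_mean = mean(Age),
            age_sd = sd(Age))

gender_desc
```

Because perc is computed from n and nrow(toy), the percentages always add up to 100 across the groups.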
Now that we’ve inspected, cleaned, and summarised our data, let’s present it to the revered reader!
Edit the code in the table_1 code chunk at the bottom of the document, giving it your tibble with age breakdown by gender, to create a nicely formatted table in your document. Make sure the table shows values to 2 decimal places.
| Gender | N | % | M_age | SD_age |
|---|---|---|---|---|
| Female | 125 | 61.88 | 21.82 | 5.43 |
| Male | 77 | 38.12 | 23.13 | 7.24 |
# provide tibble to push to kable() and fill in missing column names
gender_desc %>%
  kableExtra::kbl(col.names = c("Gender", "*N*", "%", "*M*~age~", "*SD*~Age~"),
                  caption = "*Descriptive statistics by categories of gender*",
                  digits = 2) %>%
  kableExtra::kable_styling()
Complete the code for the histogram of age and bar chart of gender in the prepare-plots chunk. They should look like this (feel free to choose the colours you like):
# you can store plots inside objects too!
age_hist <- data %>%
  ggplot2::ggplot(aes(x = Age)) +
  geom_histogram(fill = "white", colour = "black") +
  labs(x = "Participants' age in years", y = "N") +
  cowplot::theme_cowplot()

gender_bar <- data %>%
  ggplot2::ggplot(aes(x = Gender)) +
  geom_bar(fill = "seagreen") +
  labs(x = "Participants' gender", y = "N") +
  cowplot::theme_cowplot()
Since the plots are assigned to objects using the <- operator, they will not be shown when you run the code in the chunk. To see what they look like, you need to either run only the part of the code to the right of the <- or run the chunk and then type the name of the object into the console.
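To illustrate the difference, here is a small sketch with a made-up data frame. The names toy and toy_hist and the choice of bins = 5 are assumptions for the example only.

```r
library(ggplot2)

# toy data frame standing in for the real data (values are made up)
toy <- data.frame(Age = c(18, 20, 22, 24, 30))

# assigning the plot to an object stores it but draws nothing
toy_hist <- ggplot(toy, aes(x = Age)) +
  geom_histogram(bins = 5)

# printing the object (or typing its name in the console) draws it
print(toy_hist)
```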
Complete the code for the age_by_condition_gender plot so that it looks like this:
Notice how the facet_wrap() function is used to create two plots faceted by the Humanness variable.
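A minimal sketch of faceting on made-up data, in case facet_wrap() is new to you. Everything here (the toy data frame, the Group and Panel variables, and the choice of a boxplot) is an assumption for illustration and is not the code for the plot you are asked to complete.

```r
library(ggplot2)

# toy data frame with a two-level variable to facet by (values are made up)
toy <- data.frame(
  Age = c(18, 20, 22, 24, 30, 35),
  Group = c("a", "a", "a", "b", "b", "b"),
  Panel = c("x", "y", "x", "y", "x", "y")
)

# facet_wrap(~ Panel) draws one panel per level of Panel
toy_plot <- ggplot(toy, aes(x = Group, y = Age)) +
  geom_boxplot() +
  facet_wrap(~ Panel)

print(toy_plot)
```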
Complete the Write-up section of the .Rmd file.
The study was conducted on a sample of 202 volunteers (M_age = 22.3168317, SD_age = 6.1973043). The data were collected anonymously online. Data from 15 participants were excluded due to unlikely values of age. Table 1 shows the distribution of gender as well as an age breakdown by individual gender categories.
[…]
Participants were presented with pictures of the avatars. Their task was to evaluate the degree to which mental capacities (experience, agency, consciousness, and pain) could be attributed to the faces and the extent to which the presented avatars elicited empathic reactions. Every page of the survey consisted of the respective face displayed above a 7-point, Likert-type response scale (1 = “Strongly disagree” to 7 = “Strongly agree”). The survey was delivered via EFS Survey (Version 9.0, QuestBack AG, Germany). The experiment followed a 2 (Harm: harmed vs. control) × 2 (Robotization: human vs. robotic) between-subjects factorial design.
We lifted the above from the original paper. You should not!
In the print-plots chunk, put the name of the object that contains the age_by_condition_gender plot so that it gets printed out in the section of the document corresponding to where the chunk is. Don’t forget the figure caption!
age_by_condition_gender
Knit (generate) the document from your R Markdown file and rejoice in its beauty.
That’s all for this week. You’ve done quite a lot today. You learnt why it’s important to audit your data. You practised creating pipelines, grouping, and summarising data. You looked at the breakdown of data by levels of a variable, created tables of basic descriptive statistics, and visualised the relationships between variables with some very pretty figures. Finally, you learnt how to write up the Participants and Procedure sections of a paper.
Well done!