Describing data

Practical 02

Published: April 12, 2021

Guided part

This worksheet builds on the guided part of the practical that preceded it. You can download the R script from this live-coding session.

This worksheet gives you the opportunity to practise what you learnt in the first part of the practical. To make things interesting, let’s use data from an actual paper by Swiderska and Küster (2018) exploring how people’s capacity to empathise with robots can be increased. You can find the data from the study at https://and.netlify.app/docs/harm_data.csv (you don’t need to download this file).

Setup

Task 1 (punk!)

First of all, if you haven’t done so yet, create a week_02 R project inside your module folder and, within it, create the standard folder structure, just like last week.

Task 2 (punk!)

Download this R Markdown file and save/move it into your r_docs folder.

Use the R Markdown file you downloaded in task 2 to complete the following tasks.

Part 1: Inspecting and wrangling data

Task 3 (punk!)

In the setup code chunk of the .Rmd file, write the code to load the packages you will need to complete this practical: tidyverse, kableExtra, and cowplot should be enough. You might need to install cowplot if you haven’t used it yet.
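A minimal sketch of what the setup chunk could contain, assuming these three packages are all you need. The install line is commented out because it only needs to be run once, ideally in the Console rather than in the .Rmd file.

```r
# Run this once in the Console if cowplot is not installed yet:
# install.packages("cowplot")

library(tidyverse)   # readr, dplyr, ggplot2, and friends
library(kableExtra)  # nicely formatted tables
library(cowplot)     # arranging and theming ggplot2 figures
```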

Task 4 (punk!)

In the read-in code chunk, write the code to read the data into R.

Hint

All you need to do is copy the URL (address) of the file as a "character string" into the readr::read_csv() function and assign its output to an object.

Task 5

In the inspect chunk, write code to complete the following tasks:

Task 5.1 (punk!)

Ask R to give you the number of columns, the number of rows, and the column names of the dataset.

Hint

The names(), ncol(), and nrow() functions are useful here.
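As a quick illustration, here is how the three functions behave on a small tibble. `toy_data` is a hypothetical data set made up for this example, not the study data.

```r
library(dplyr)

# Hypothetical toy data standing in for the study data (NOT the real values)
toy_data <- tibble(
  ID = 1:4,
  Condition = c("A", "B", "A", "B"),
  Score = c(3.5, 4.0, 2.5, 5.0)
)

ncol(toy_data)   # number of columns: 3
nrow(toy_data)   # number of rows: 4
names(toy_data)  # column names: "ID" "Condition" "Score"
```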

 

OK, that’s quite a lot of columns. Let’s only keep a few to make things easier.

Task 5.2 (punk!)

Discard all columns except for "ID", "Condition", "Humanness", "Harm", "Gender", and "Age".

Hint

You want to be selecting columns.
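One way to do this is with dplyr’s `select()`, sketched here on a hypothetical `toy_data` tibble that includes an extra column to drop. With the real data, you would pipe your own data object instead.

```r
library(dplyr)

# Hypothetical toy data with one extra column we want to drop
toy_data <- tibble(
  ID = 1:3,
  Condition = c("A", "B", "A"),
  Humanness = c("High", "Low", "Low"),
  Harm = c(1, 0, 1),
  Gender = c("Female", "Male", "Female"),
  Age = c(19, 23, 15),
  Extra = c("x", "y", "z")
)

# Keep only the named columns (in this order) and overwrite the object
toy_data <- toy_data %>%
  select(ID, Condition, Humanness, Harm, Gender, Age)
```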

Task 5.3 (Prog-rocK)

Add code that gives you the age range (minimum and maximum) of participants in the data set.

Hint

There are oh-so-many ways to do this but one you’re already familiar with involves summarising the data.
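For example, a sketch using `summarise()` on hypothetical ages (`toy_data` and `age_range` are illustrative names, not ones from the .Rmd file):

```r
library(dplyr)

toy_data <- tibble(Age = c(19, 23, 15, 31))  # hypothetical ages

# One-row tibble with the youngest and oldest age; na.rm = TRUE ignores missing values
age_range <- toy_data %>%
  summarise(min = min(Age, na.rm = TRUE),
            max = max(Age, na.rm = TRUE))
age_range
```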

Task 6

Your code should make sure that you’re not analysing data from minors. But before we remove any participants from our data, it is crucially important to keep a record of how many we excluded.

Task 6.1 (Prog-rocK)

In the clean chunk, create an object remove_age and in it, store the number of participants who are younger than 16 years old.

Hint

Did someone say filter?

Task 6.2 (punk!)

Add a line in the clean code chunk that only keeps data from participants who are 16+.
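The two steps could look something like this, sketched on hypothetical data (`toy_data` stands in for your data object; `remove_age` is the object named in task 6.1):

```r
library(dplyr)

toy_data <- tibble(ID = 1:5, Age = c(19, 15, 23, 14, 31))  # hypothetical

# 1. Record how many participants will be excluded (younger than 16)
remove_age <- toy_data %>%
  filter(Age < 16) %>%
  nrow()

# 2. Keep only participants aged 16 or older
toy_data <- toy_data %>%
  filter(Age >= 16)
```

Note that `filter()` drops rows where `Age` is `NA` in both steps, so participants with a missing age would be removed without being counted in `remove_age`. Checking `sum(is.na(Age))` before filtering tells you whether this matters for your data.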

Task 7

In the descriptives code chunk, write code that creates:

Task 7.1 (Prog-rocK)

A tibble of descriptive statistics (mean, standard deviation, minimum, maximum) for the Age variable. It should look like this:

# A tibble: 1 x 4
   mean    sd   min   max
  <dbl> <dbl> <dbl> <dbl>
1  22.3  6.20    18    44

Hint

We learnt how to summarise data in practical 9 last term.
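A minimal sketch with `summarise()`, assuming a hypothetical `toy_data` tibble with an `Age` column (the output object name `age_desc` is also made up):

```r
library(dplyr)

toy_data <- tibble(Age = c(18, 20, 22, 24))  # hypothetical ages

# One row, four columns: mean, standard deviation, minimum, maximum
age_desc <- toy_data %>%
  summarise(mean = mean(Age, na.rm = TRUE),
            sd = sd(Age, na.rm = TRUE),
            min = min(Age, na.rm = TRUE),
            max = max(Age, na.rm = TRUE))
age_desc
```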

Task 7.2 (jazz...)

A tibble with the number (N) and percentage of participants, plus their mean and SD age, for each category of the Gender variable. Something like this:

# A tibble: 2 x 5
  Gender     n  perc age_mean age_sd
* <chr>  <int> <dbl>    <dbl>  <dbl>
1 Female   125  61.9     21.8   5.43
2 Male      77  38.1     23.1   7.24

Hint

PAAS practical 10 will come in handy here too.

The perc column requires a little more thought: think about how you can use the n column and the total number of rows in the data to derive the percentage. If you get stuck, check out the solution to last term’s practical 10.
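One way to get the breakdown, sketched on hypothetical data: group by gender, count with `n()`, and divide each group’s count by the total number of rows. `toy_data` and `gender_desc` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data (not the study data)
toy_data <- tibble(
  Gender = c("Female", "Female", "Female", "Male", "Male"),
  Age = c(18, 20, 22, 28, 32)
)

gender_desc <- toy_data %>%
  group_by(Gender) %>%
  summarise(n = n(),                              # participants per group
            perc = n / nrow(toy_data) * 100,      # share of the whole sample
            age_mean = mean(Age, na.rm = TRUE),
            age_sd = sd(Age, na.rm = TRUE))
gender_desc
```

Because `nrow(toy_data)` refers to the whole data set rather than the current group, `perc` gives each group’s share of all participants, and the percentages add up to 100.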

Part 2: Tables and Visualisations

Now that we’ve inspected, cleaned, and summarised our data, let’s present it to the revered reader!

Task 8 (Prog-rocK)

Edit the code in the table_1 code chunk at the bottom of the document, giving it your tibble with the age breakdown by gender, to create a nicely formatted table in your document. Make sure the table shows values to 2 decimal places.

Table 1: Descriptive statistics by categories of gender

Gender   N     %       M_Age   SD_Age
Female   125   61.88   21.82   5.43
Male     77    38.12   23.13   7.24
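A sketch of the table code, assuming kableExtra’s `kbl()` and `kable_styling()` and a hypothetical summary tibble `gender_desc` (its values are illustrative only). The `digits = 2` argument rounds numeric columns to 2 decimal places, and `col.names` replaces the raw column names with readable headers.

```r
library(dplyr)
library(kableExtra)

# Hypothetical summary tibble (values are illustrative only)
gender_desc <- tibble(
  Gender = c("Female", "Male"),
  n = c(3L, 2L),
  perc = c(60, 40),
  age_mean = c(20.123, 30.4567),
  age_sd = c(2, 2.828427)
)

table_1 <- gender_desc %>%
  kbl(digits = 2,
      col.names = c("Gender", "N", "%", "M_Age", "SD_Age"),
      caption = "Descriptive statistics by categories of gender") %>%
  kable_styling()
```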

Task 9 (Prog-rocK)

Complete the code for the histogram of age and the bar chart of gender in the prepare-plots chunk (feel free to choose whichever colours you like).
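A possible sketch of the two plots with ggplot2, on hypothetical data. The object names `age_hist` and `gender_bar`, the bin width, and the colours are all assumptions; the names in your prepare-plots chunk may differ.

```r
library(ggplot2)

# Hypothetical toy data (not the study data)
toy_data <- data.frame(
  Gender = c("Female", "Female", "Male", "Female", "Male"),
  Age = c(18, 20, 22, 25, 31)
)

# Histogram of age: each bar covers a 2-year bin
age_hist <- ggplot(toy_data, aes(x = Age)) +
  geom_histogram(binwidth = 2, fill = "steelblue", colour = "white") +
  labs(x = "Age (years)", y = "Count")

# Bar chart of gender: geom_bar() counts the rows in each category
gender_bar <- ggplot(toy_data, aes(x = Gender)) +
  geom_bar(fill = "darkorange") +
  labs(x = "Gender", y = "Count")
```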

Task 10 (jazz...)

Complete the code for the age_by_condition_gender plot.

Notice how the facet_wrap() function is used to create two plots faceted by the Humanness variable.
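A sketch of such a plot, assuming it shows age by condition, filled by gender, as box plots. The plot type, the `Condition` and `Humanness` level names, and `toy_data` are all hypothetical. `facet_wrap(~ Humanness)` splits the figure into one panel per level of `Humanness`.

```r
library(ggplot2)

# Hypothetical toy data; level names are made up for illustration
toy_data <- data.frame(
  Condition = rep(c("A", "B"), times = 4),
  Humanness = rep(c("High", "Low"), each = 4),
  Gender = rep(c("Female", "Female", "Male", "Male"), times = 2),
  Age = c(19, 22, 24, 20, 27, 21, 18, 30)
)

age_by_condition_gender <- ggplot(toy_data,
                                  aes(x = Condition, y = Age, fill = Gender)) +
  geom_boxplot() +
  facet_wrap(~ Humanness) +   # one panel per level of Humanness
  labs(x = "Condition", y = "Age (years)")
```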

Task 11 (Prog-rocK)

Complete the Write-up section of the .Rmd file.

Task 12 (punk!)

In the print-plots chunk, put the name of the object that contains the age_by_condition_gender plot so that it gets printed in the section of the document where the chunk is. Don’t forget the figure caption!

Task 13 (punk!)

Knit (generate) the document from your R markdown file and rejoice in its beauty.

 

That’s all for this week. You’ve done quite a lot today. You learnt why it’s important to audit your data. You practised creating pipelines, grouping, and summarising data. You looked at the breakdown of data by levels of a variable, created tables of basic descriptive statistics, and visualised the relationships between variables with some very pretty figures. Finally, you learnt how to write up the Participants and Procedure sections of a paper.

Well done!