Chi-Square

Tutorial 6

Published

March 1, 2021

DOI

This tutorial covers how to prepare for, complete, and report chi-square tests in R. Before you start this tutorial, you should make sure you review the relevant lecture, as this tutorial assumes you already know what chi-square is and what it’s for.

Writing Lab Reports

If you are writing your lab report on the Green study (Griskevicious et al., 2010), follow the steps of this tutorial to analyse and report your results.

Setting Up

All you need is this tutorial and RStudio. Remember that you can easily switch between windows with the Alt + ↹ Tab (Windows) and ⌘ Command + ⇥ Tab (Mac OS) shortcuts.

Task 1

Create a new week_06 project and open it in RStudio. Then, create two new folders in the new week_6 folder: r_docs and data. Finally, open a new Markdown file and save it in the r_docs folder. Since we will be practicing reporting (writing up results), we will need Markdown later on. For the tasks, get into the habit of creating new code chunks as you go.

Remember, you can add new code chunks by:

  1. Using the RStudio toolbar: Click Code > Insert Chunk
  2. Using a keyboard shortcut: the default is Ctrl + Alt + I, but you can change this under Tools > Modify Keyboard Shortcuts…
  3. Typing it out: ```{r}, press Enter, then ``` again.
  4. Copy and pasting a code chunk you already have (but be careful of duplicated chunk names!)

 

Task 2punk!

Add and run the command to load the tidyverse package in the setup code chunk.

 

In Hot Water

A long, long time ago

1 , you may remember Milan talking about algorithmic thinking in terms of the steps of making tea. Sadly, in the intervening months R has not yet evolved the ability to make a cuppa. However, we’re going to stir up some controversy today by coming back round to the topic of tea.

You may remember this horrific abomination interesting interpretation of the tea-making process that blew up last summer. Many people, including my English partner

In L’s words, just now watching it again: “How did she get every single bit of it wrong?” And L doesn’t even drink tea!
2 , were thoroughly scandalised, but it raises an interesting question: does changing the order of the steps actually change the taste?

Here’s the video in question, if you would like a nice steaming cup of absolute outrage.

Americans making hot tea 🍵 #americanintheuk

♬ original sound - jchelle28

You may know someone - or be that someone yourself! - who is convinced that you can tell the difference when someone makes tea “wrong”. Well, whether there is a right or wrong way to make a cuppa is a matter of opinion

3 , but whether someone can accurately identify a difference in taste is a matter of science. So - let’s design an experiment and then analyse the data to find out!

A Lady Tasting Tea

This particular research question is very appropriate for our statistical test today. In 1935, botanist Dr Blanche Muriel Bristol claimed that she could tell whether the milk had been added before or after the tea. Her claim was apparently substantiated by her subsequent performance on eight randomly presented cups of tea, four with the milk first and four with the tea first. The person testing her on this claim was a dapper young Ronald Fisher, who subsequently described a version of this experiment in the 1935 paper “A Lady Tasting Tea

Arguably, this paper should have been called “A Doctor Tasting Tea”, but seeing as how people still struggle with giving women their proper titles almost a hundred years later, it’s not entirely surprising.
4 . This paper set out Fisher’s idea of the null hypothesis for the first time.

Data and Design

Our experiment today will be based on this excellent investigation by students at the University of Sheffield, which includes full descriptions of the results and instructions for doing your own tea taste test with your family, friends, or flatmates. In case you’re not up for making a few hundred cups of tea in the name of Science, we’ll use that study’s data instead, graciously provided by Dr Tom Stafford.

Task 3punk!

Read in the data hosted at the link below and save it in a new object. I’ve called mine chi_tea, but you do you.

Link: https://and.netlify.app/docs/tea_position.csv

The Tea Taste Test Team (henceforth: T3 Team) took quite a few measures for their experiment. For now, we’ll look at the main result of interest: whether people were able to correctly identify which was the milk-first and which was the tea-first cup.

The T3 Team’s paradigm consisted of a double-blind experiment. First, the experimenter asked their participants a few initial questions about how they typically liked their tea, and whether the participant believed they could tell the difference. The tea-maker prepared two cups of tea out of view of both the experimenter and the participant, one with the milk added first and the other with the tea added first. Then, the participant chose a ticket that randomly determined two things: whether the milk-first cup would be placed on their left or right, and whether they should first taste the cup on their left or right. Once they tasted both cups, the experimenter asked the participant which cup they preferred, and which cup they believed was made with the milk first. Importantly, the experimenter was also not aware of which was which; only once the participant made their judgement did the tea-maker reveal the truth.

Task 4

Have a poke around this dataset to get familiar with it.

Task 4.1punk!

How many participants were there in this experiment?

Task 4.2punk!

How many variables are there?

Task 4.3punk!

What are the variables called?

Test of Association

As we discussed in the lecture, a test of association (also called a test of independence) investigates whether two variables are associated with each other. In this case, we want to know whether there is an association between the location of the milk-first cup of tea (left or right), and the participant’s belief about the location of the milk-first cup of tea (left or right).

Task 5

Stop and think about this design before we go on.

Task 5.1Prog-rocK

What is the null hypothesis for this experiment? What pattern would we expect to see in the data if the null hypothesis were true? Write down your thoughts in your Markdown.

Task 5.2Prog-rocK

What is the alternative hypothesis for this experiment? What pattern would we expect to see in the data if the alternative hypothesis were true? Write down your thoughts in your Markdown.

Task 5.3Prog-rocK

What do you think we will find? Do you think that people will be able to correctly identify the milk-first tea? Do you think you could? Write down your thoughts in your Markdown.

Visualising the Data

Now that we have our predictions nailed down, let’s have a look at the data! As usual, it’s always important to visualise your data first thing.

Task 6

Let’s build a beautiful bar chart to represent the count data we have. We’ll start with a basic chart and build it up from there to create a publication-worthy, lab-report-quality graph. In the end, it will look like this:

Task 6.1punk!

Start by creating a basic bar chart of the actual_position variable like the one below, and adding a theme to spruce it up.

Hint

We haven’t done bar charts much before, but you do have some practice using ggplot2. Remember that we usually tell ggplot what kind of figure to make using geoms - see if you can guess what function you need!

If you get stuck, I recommend the R Graph Gallery for help and examples, or just Googling “How to make a bar chart in R”!

For a theme, there are a lot pre-loaded in R. Check out theme_minimal() and theme_bw for clean, simple themes. Naturally, you can also install extra themes. I like theme_cowplot(), but you’ll have to install the cowplot package to use it. Any of these are fine - choose one you like!

Task 6.2punk!

Let’s now add in our second believe_position variable to the bar chart.

Hint

We already told R what to put on the x-axis, and we already have what we want on the y-axis (i.e., counts). So, to add our second variable, we can use the fill = argument in aes() to split up our two actual_position bars by their value on believe_position.

Hm…that doesn’t look quite right. Here, the fill argument has filled the bars with different colours. This doesn’t make comparison very easy, so instead I want to split the two bars into four, based on their value of the second variable, believe_position. In other words, I want geom_bar to draw the bars in different positions.

Task 6.3punk!

Add position = "dodge" to the geom_bar function.

Hey, that’s looking better already! That’s basically the information that we want, and this would be enough if we were just using this graph to visualise the data for our own purposes. However, we have reports to write, so we’ll need to add some formatting to make it look profesh.

Task 6.4Prog-rocK

Change the labels using the following functions:

Hint

There are lots of ways to change the names of scales, including labs(). I like doing them separately with these scale_ functions because you can also use them to change other things, like what numbers appear on the y-axis, and what the labels are for categories. Ultimately you should build your plots in a way that makes sense to you, and that ends up looking the way you want it.

For example, in the solution I’ve also added two more arguments to scale_y_continuous: limits and breaks. The first changes the minimum and maximum values on the y axis, and the second changes how the numbers appear. This is optional, but you can try messing about with these two arguments to see what happens if you want to get a feel for how they work.

Use the help documentation if you’re not sure how to use these functions!

Now that we’ve got the text nailed down, let’s do something about those colours.

Task 6.5punk!

Add a type = c(... argument to scale_fill_discrete to change the colours. Remember you will need to give two colours, one for “left” and one for “right”.

Hint

To add colours, you can use either named colours that R knows, or hexidecimal colour codes.

R recognises a lot of colour names. Just put the name you want in “quotes” to use it. If you wanted the Analysing Data theme colours, you could use darkcyan and purple4.

If you want a very precise shade that isn’t on the list of coloured names, you can also use any colour that your heart desires using the colour’s hexidecimal code. The hex code is a series of 6 letters and numbers that uniquely specify any colour in the RGB colour gamut. You can try typing random ones, or use this handy colour selector to get the exact shade you want.

If you wanted the Analysing Data theme colours, you could use darkcyan or #52006F for teal, and #009Fa7 or purple4 for the purple.

Now, as stylish and attractive as that Analysing Data colour palette is, it isn’t really suitable for formal reporting. Let’s tone it down a bit.

Task 6.6punk!

Choose some more formal colours for the bars if you didn’t already.

Now that’s looking good!

Lab Report Plots

For the Green lab reports, you must include a bar chart of your results. You don’t have to only use grey, or any particular theme, but you should choose colour and styling that look like something you might see in a journal article.

Task 7

How can you interpret this plot? Write down your thoughts in your Markdown.

Now that we’ve had a look at our data, we can move on to our statistical testing!

 

Chai-Square 5

Running the Analysis

So, we’ve made our predictions, and we’ve had a look at the data. Now we’re ready to conduct our χ2 test.

Task 8

Conduct and report the χ2 test of association.

Task 8.1Prog-rocK

Use the chisq.test() function to perform a χ2 test of association on the chi_tea data. Your output should look like the one below.

Hint

All you need to put into the chisq.test() function are the two variables we are using. You should use subsetting with $ here; this function doesn’t play well with pipes!


    Pearson's Chi-squared test with Yates' continuity correction

data:  chi_tea$actual_position and chi_tea$believe_position
X-squared = 0.57655, df = 1, p-value = 0.4477

Pretty painless, eh?

Reporting the Results

Now that we’ve got our test result, let’s report it in APA style. This takes the general form:

(name_of_estimate(degrees_of_freedom) = value_of_estimate, p = exact_p)

We don’t have confidence intervals for this test, so we don’t need to report them.

Task 9Prog-rocK

Use the general format above and the values to type out the result of the χ2 test.

We should also describe in words what this result means. We essentially include the statistical result as a citation to give evidence for our claim.

Task 9.1Prog-rocK

Have a go writing out a report of this statistical analysis.

That’s looking pretty slick! This is exactly the sort of thing you would expect to see in a journal article - or a lab report 😉

You should never have any raw (unformatted) code or output in a report! Always report results in the text, in a figure, or in a formatted table.

Task 9.2punk!

Save your test output to a new object, chi_tea_test, to refer to it later.

 

Observed and Expected Frequencies

To finish off our reporting, it’s good practice to report the actual numbers or frequencies that went into the chi-square test. Luckily, we can get this very easily out of the chi_tea_test object we’ve just created.

Task 10

Report the numbers in each combination of conditions.

Task 10.1

Use str() to look at the structure of the chi_tea_test object.

We can see that there’s quite a bit stored in this object - more than appears when we just call its name. Among other useful info, we can see that there are $ observed and $ expected objects stored here.

Task 10.2punk!

Print out the table of observed frequencies.

Hey, that’s handy - here are the values we were looking for!

You might notice that these are the same counts that appeared in our graph up above. Even though this information is presented visually there, it’s still a good idea to include these numbers in your report.

Task 10.3Prog-rocK

Write up a short report of the observed frequencies.

Task 11

Put all of this together into one report of your findings.

Together with the graph we already made, this would make a great report!

Before we finish up, there’s one more thing we should check. Remember from the lecture that the χ2 test requires all expected frequencies to be greater than five.

Task 12Prog-rocK

Check the expected frequencies for your test. Is there a problem?

Recap

Well done conducting your χ2 analysis. I highly recommend reading all of the T3 Team’s findings - they asked a lot of interesting questions and have done a great job presenting their results.

In sum, we’ve covered:

Remember, if you get stuck or you have questions, post them on Piazza, or bring them to StatsChats or to drop-ins.

 

Good job!

That’s all for today. See you soon!

 

 

 

Footnotes

  1. In a galaxy far, far away…[↩]
  2. In L’s words, just now watching it again: “How did she get every single bit of it wrong?” And L doesn’t even drink tea![↩]
  3. Or is it…![↩]
  4. Arguably, this paper should have been called “A Doctor Tasting Tea”, but seeing as how people still struggle with giving women their proper titles almost a hundred years later, it’s not entirely surprising.[↩]
  5. Get it??[↩]