Analysing Data: LM 4: Bringing It All Together

The End Is In Sight

To finish off the module in style, we have a very special practical for you today.

The snazzy version of today’s practical is an escape room. You will need to use your R skills and stats savvy to beat all the tasks - but the clock is ticking!

Go to the escape room

If you’d prefer just to work on the tasks directly, you can do this worksheet instead. It has essentially the same tasks, just without the atmosphere and the chance for eternal Analysing Data glory.

Setup

Task 1

Set up as usual:

Create a project for this week
Open a new RMarkdown document to work in and save it in your project folder
Load tidyverse

Task 2

Read in the data at the following link and have a look at it.

LINK: https://and.netlify.app/docs/prac_11_data.csv

data <- readr::read_csv("https://and.netlify.app/docs/prac_11_data.csv")
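If you're not sure what "having a look" involves, here is a minimal sketch using base R. The `toy` data frame and its values are made up for illustration so that the example runs without a download; in the practical you would call the same functions on `data`.

```r
# Toy stand-in for the practical's data (all values are invented)
toy <- data.frame(
  condition  = c("a", "a", "b", "b", "c"),
  solve_time = c(-20.1, -18.4, NA, -25.3, -22.7),
  courage    = c(11.2, 12.5, 10.1, NA, 13.0)
)

str(toy)             # column names, types and the first few values
head(toy, 3)         # first three rows
summary(toy)         # per-column summaries, including NA counts
colSums(is.na(toy))  # number of missing values in each column
```

The last line is a handy check before cleaning, because it shows which columns contain NAs and how many.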

Data Cleaning

Before you begin, you’ll need to clean up your data.

Task 3

Complete the following tasks in order.

Task 3.1

Remove all incomplete rows (i.e. any containing NAs)

Task 3.2

Calculate the number of cases in each group of condition.

Task 3.3

Remove all cases from the group with the smallest number of cases.

Task 3.4

Get R to print out the following values:

The minimum value of stats_anx.
The mean of trait_anx.
The SD of fear_tech.
The maximum value of solve_time.
The total number of cases in the clean dataset.

We’ve done all these tasks in previous practicals near the start of term. Have a look there if you’re stuck!

## Task 3.1: the three options below are alternatives, so you only need one
## base R solution
data <- data[complete.cases(data),]

## short tidyverse solution (from Google)
data <- data %>% 
  tidyr::drop_na()

## longer tidyverse solution using lots of filters
data <- data %>% 
  dplyr::filter(!is.na(solve_time) & !is.na(courage) 
                & !is.na(fear_tech) & !is.na(stats_anx) 
                & !is.na(trait_anx))

## Find the name of the smallest group
smallest_group <- data %>% 
    dplyr::group_by(condition) %>% 
    dplyr::summarise(n = n()) %>% 
    dplyr::filter(n == min(n)) %>% 
    dplyr::pull(condition)

## Remove all cases from that group
## (%in% also copes if two groups tie for the smallest)
data <- data %>% 
  dplyr::filter(!condition %in% smallest_group)

## Quick summary tibble
data %>% 
  dplyr::summarise(
    ans1 = min(stats_anx),
    ans2 = mean(trait_anx),
    ans3 = sd(fear_tech),
    ans4 = max(solve_time),
    ans5 = nrow(data)
  )

# A tibble: 1 x 5
   ans1  ans2  ans3  ans4  ans5
  <dbl> <dbl> <dbl> <dbl> <int>
1 -6.17  3.71  14.7 -8.49   166

Directionality

Task 4

Use any method you like to find out the direction of the relationship between each of the five pairs of variables listed below. Make a note of each in your document.

solve_time and fear_tech
stats_anx and trait_anx
courage and stats_anx
fear_tech and trait_anx
solve_time and courage

Any method you like will do - either plots or numbers.

[1] "Positive" "Positive" "Positive" "Negative" "Positive"
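One possible way to get the directions from numbers is to look at the sign of each correlation. The sketch below uses simulated data with made-up variable names (`x`, `y_pos`, `y_neg`), not the practical's dataset; with the real data you would pass the relevant columns of `data` to `cor()` instead.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
toy <- data.frame(
  x     = x,
  y_pos = 2 * x + rnorm(n),   # built to rise as x rises
  y_neg = -2 * x + rnorm(n)   # built to fall as x rises
)

r <- cor(toy)   # Pearson correlation matrix for every pair of columns

# The sign of r gives the direction of each relationship with x
directions <- ifelse(r["x", c("y_pos", "y_neg")] > 0, "Positive", "Negative")
directions
```

A scatterplot of each pair would tell you the same thing: points rising from left to right mean a positive relationship, points falling mean a negative one.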

Model Comparison

Task 5

Compare models to each other to find out which best explains the variance in the outcome.

Task 5.1

Construct three models containing the following variables:

Null model: the outcome, solve_time
Model 1: the outcome, solve_time, and predictors courage, fear_tech
Model 2: the outcome, solve_time, and predictors courage, fear_tech, stats_anx, trait_anx

Task 5.2

Compare each model to the one before it and make a note of the following information:

Does model 1 explain significantly more variance than the null model?
What is the F-statistic for this comparison (null vs model 1)?
Does model 2 explain significantly more variance than model 1?
What is the F-statistic for this comparison (model 1 vs model 2)?
Which model best explains the variance in the outcome?

[1] TRUE

[1] 89.39

[1] TRUE

[1] 7.73

[1] "Model 2"
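As a rough sketch of how a comparison like this can be run, the example below fits three nested models with `lm()` and compares neighbouring pairs with `anova()`. The data are simulated and the names `x1`, `x2`, `x3` and `y` are placeholders, so the numbers will not match the answers above. Comparing two models at a time is an assumption about the intended method; giving `anova()` all three models at once uses the largest model's residual variance for every F-test, which can give different F values.

```r
set.seed(2)
n <- 120
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 1.5 * toy$x2 + 1 * toy$x3 + rnorm(n)

m0 <- lm(y ~ 1, data = toy)             # null model: intercept only
m1 <- lm(y ~ x1 + x2, data = toy)       # adds the first set of predictors
m2 <- lm(y ~ x1 + x2 + x3, data = toy)  # adds the remaining predictor

cmp_01 <- anova(m0, m1)   # null model vs model 1
cmp_12 <- anova(m1, m2)   # model 1 vs model 2

# The second row of each table holds the test for the added predictors
cmp_01$F[2]          # F-statistic
cmp_01$`Pr(>F)`[2]   # p-value
cmp_12$F[2]
cmp_12$`Pr(>F)`[2]
```

A p-value below your alpha level means the bigger model explains significantly more variance than the smaller one.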

Using the Model

Task 6

Use the linear model equation to estimate the outcome for each of the following sets of values.

courage = 11.6, fear_tech = -21.64, stats_anx = -3.29, trait_anx = 3.19
courage = 10.53, fear_tech = -7.33, stats_anx = -3.67, trait_anx = 2.76
courage = 13.12, fear_tech = 2.92, stats_anx = -3.35, trait_anx = 2.79

Make sure you have replaced all bs with their correct, unrounded values from the output, then for each question, replace the variables with the given values. Only round at the end!

[1] -22.15

[1] -23.3

[1] -18.21
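The calculation follows the linear model equation, predicted outcome = b0 + b1 × predictor 1 + b2 × predictor 2, and so on for each predictor. Here is a minimal sketch on simulated data (the names `x1`, `x2`, `y` and the new values are made up) showing how to pull out the unrounded bs with `coef()` and check the hand calculation against `predict()`.

```r
set.seed(3)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 5 + 2 * toy$x1 - 3 * toy$x2 + rnorm(n)

m <- lm(y ~ x1 + x2, data = toy)
b <- coef(m)   # unrounded bs, named "(Intercept)", "x1" and "x2"

# Hypothetical new values to plug into the equation
new_vals <- data.frame(x1 = 1.5, x2 = -0.5)

# By hand: b0 + b1 * x1 + b2 * x2
by_hand <- b[["(Intercept)"]] + b[["x1"]] * new_vals$x1 + b[["x2"]] * new_vals$x2

# predict() does the same calculation, so it makes a useful cross-check
by_predict <- predict(m, newdata = new_vals)

round(by_hand, 2)   # round only at the very end
```

Working from `coef(m)` rather than typing the bs in from the printed output avoids rounding errors creeping into the estimate.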

Interpreting the Output

Task 7

Find the following values:

The estimate of the relationship between stats_anx and solve_time
The estimated value of solve_time when all of the predictors are 0
The squared correlation between the values predicted by the model, and the observed values
The ratio of the size of the b estimate for courage compared to its standard error
The ratio of the variance the model explains compared to the variance left over after fitting the model
The name of the predictor with the strongest effect on solve_time

Remember that you can’t compare unstandardised bs.

stats_anx 
     1.85

(Intercept) 
     -28.52

[1] 0.55

[1] 7.59

value 
48.56

[1] "courage"
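If you're wondering where values like these live in the output, the sketch below shows one way to extract each kind from a fitted model. It uses simulated data with placeholder names (`x1`, `x2`, `y`), and the standardised bs are computed as b × SD(predictor) / SD(outcome), which is one of several ways to get them.

```r
set.seed(4)
n <- 100
toy <- data.frame(x1 = rnorm(n, sd = 1), x2 = rnorm(n, sd = 10))
toy$y <- 3 + 2 * toy$x1 + 0.5 * toy$x2 + rnorm(n)

m <- lm(y ~ x1 + x2, data = toy)
s <- summary(m)
coefs <- s$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)

b_x1      <- coefs["x1", "Estimate"]            # b for one predictor
intercept <- coefs["(Intercept)", "Estimate"]   # outcome when all predictors are 0
r_squared <- cor(fitted(m), toy$y)^2            # squared correlation, same as s$r.squared
t_x1      <- b_x1 / coefs["x1", "Std. Error"]   # b relative to its standard error
f_value   <- s$fstatistic[["value"]]            # F-statistic for the whole model

# Standardised bs put the predictors on a common scale so they can be compared
std_b <- coef(m)[-1] * sapply(toy[c("x1", "x2")], sd) / sd(toy$y)
strongest <- names(which.max(abs(std_b)))
strongest
```

In this toy example `x1` has the larger unstandardised b, but `x2` varies over a much wider range and comes out with the larger standardised b.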

You Made It!

Very well done for all your hard work this term. If you were able to solve all these tasks, you’re in great shape for the exam already.

See you next year!