Level Up 02: Manipulating Factors

1 / 13

SeminRs

  • Informal, optional weekly sessions to help build a 'portfolio of skills'

  • 1-2 hours of instruction, demos, walk throughs & activities to try out

Essentials

  • Focused on the fundRmental skills to help you get started with learning R
  • Covering: basic wrangling & visualising data, 'pretty' R Markdown, inline code, debugging...

Level Up

  • Focused on more advanced programming skills & applying these skills to new fun topics
  • Covering: Papaja, advanced wrangling & manipulation of data, 'even prettier' R Markdown, spotifyR...


Session topics are not fixed - use the Padlet linked on Canvas for suggestions!

2 / 13

Setup & Suggested Workflow

  • Create one R project file for all seminR sessions

  • Within this directory, create an r_docs & data folder & save all Rmds and datasets to these folders respectively

  • Make a cheat sheet of useful functions and # comment their meaning and usage as you go through seminRs, practicals, tutorials etc.

Reminder:

File > New Project... New Directory > New Project > Give your project a name & location

File > New File > R Markdown... Remember to save this file in your r_docs folder!

Task:

Open your seminRs project, create a new Rmd file & load tidyverse & palmerpenguins
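
A minimal sketch of the first chunk in the new Rmd, assuming both packages are already installed (if not, the commented install.packages() line would need running once first):

```r
# install.packages(c("tidyverse", "palmerpenguins"))  # one-off, only if not installed

library(tidyverse)        # loads dplyr, ggplot2, readr, forcats, and others
library(palmerpenguins)   # provides the penguins dataset
```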

3 / 13

Session Objectives

Wrangling Factors with forcats

  • Understand factors & categorical variables

  • Learn & practise some common functions (from base R, dplyr, forcats) to manipulate factors




To check your answers or follow along, download the solutions Rmd

4 / 13

Factors & Categorical Data

  • Factors represent categorical data
  • Each level is stored internally as an integer code with a text label attached
  • They have a fixed, known set of possible values
  • forcats is a useful package for working with factors; it loads with library(tidyverse) or on its own with library(forcats)
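
To make the "integer codes plus labels" idea concrete, here is a small base R sketch. The eye_colour values are made up for illustration and are not from the session data.

```r
# Hypothetical eye-colour data, made up for illustration
eye_colour <- c("brown", "blue", "green", "brown", "blue", "brown")
eye_fct <- factor(eye_colour)

levels(eye_fct)      # the fixed set of categories, sorted alphabetically by default
as.integer(eye_fct)  # the integer code stored for each observation
nlevels(eye_fct)     # how many levels there are
```

Here brown is stored as 2 because "brown" is the second level alphabetically, not because of anything about the data itself.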

Examples

  • Marital status
  • Occupation
  • Education level
  • Eye colour
  • Pretty much anything...
5 / 13

Working with Factors in R

Task 1:

Read in the dataset below

spotify_data <- readr::read_csv("../data/spotify_decades_data.csv") # if saved already
spotify_data <- readr::read_csv("https://raw.githubusercontent.com/de84sussex/DS_spotify/main/spotify_decades_data.csv") # otherwise, read straight from GitHub

Task 2:

Explore the dataset. Is there anything odd or unusual?

Hint: look at the structure/column specification of the tibble
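
A rough sketch of the kind of checks the hint is pointing at, run on a toy tibble rather than the real data. The toy_tracks object, its column names, and its values are all hypothetical.

```r
library(tibble)
library(dplyr)

# Toy tibble with made-up values; the column names are hypothetical
toy_tracks <- tibble(
  track = c("Song A", "Song B", "Song C"),
  decade = c(1970, 1980, 1970)
)

glimpse(toy_tracks)        # one line per column, with its type
str(toy_tracks)            # base R equivalent
class(toy_tracks$decade)   # "numeric": a category stored as a plain number
```

For a tibble created by read_csv(), readr::spec(spotify_data) should also print the column specification that was guessed when the file was read.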

6 / 13

Creating/Converting Factors

IMO, the easiest way to convert an existing variable to a factor is to combine factor() from base R with mutate() from dplyr

Step 1: after reading in the data, pipe it into mutate()

Step 2: specify the new column name (reusing the existing name overwrites that column)

Step 3: use factor() on the column you want to change and specify the labels for each category (without a levels argument, labels are matched to the sorted unique values)

spotify_data <- readr::read_csv("../data/spotify_decades_data.csv") %>%
  mutate(decade_fct =
           factor(decade_fct,
                  labels = c("70s", "80s", "90s", "00s")))

Task:

Follow these steps and update your code to read in decade_fct as a factor
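
The three steps can be sketched on a toy tibble as below. This version also passes levels explicitly, which is one way to make sure each label lands on the value you intended. The toy_tracks data and the numeric decade codes are assumptions for illustration; the real column may be coded differently.

```r
library(tibble)
library(dplyr)

# Made-up data: the values 1970-2000 are an assumption about how decades might be coded
toy_tracks <- tibble(
  track = c("Song A", "Song B", "Song C", "Song D"),
  decade = c(2000, 1970, 1990, 1980)
)

toy_tracks <- toy_tracks %>%
  mutate(decade = factor(decade,
                         levels = c(1970, 1980, 1990, 2000),      # existing values, in order
                         labels = c("70s", "80s", "90s", "00s"))) # label for each level

levels(toy_tracks$decade)        # "70s" "80s" "90s" "00s"
as.character(toy_tracks$decade)  # "00s" "70s" "90s" "80s"
```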

7 / 13

Inspecting Factors

Base R

class() to check that the data type is correct

levels() to see the levels of a factor

table() for table of groups & counts

forcats

fct_count() gives factor level and counts, similar to table()

Task:

Try these functions on our decade_fct variable. Remember, we pull a single column out of a dataset with $, as in dataset$column
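
As a point of comparison, here are the same four functions on a small made-up factor (size is hypothetical and not part of the session data):

```r
library(forcats)

# Made-up factor for illustration
size <- factor(c("small", "large", "small", "medium", "small", "large"))

class(size)       # "factor"
levels(size)      # "large" "medium" "small" (alphabetical by default)
table(size)       # named counts per level
fct_count(size)   # the same counts as a tibble with columns f and n
```

The practical difference is that table() returns a table object while fct_count() returns a tibble, which is easier to pipe into other tidyverse functions.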

8 / 13

Filter For Demo




# adding a filter so differences between groups show up more clearly in the following tasks
spotify_data <- spotify_data %>% filter(track.popularity > 50 & track.explicit == FALSE)
9 / 13

Changing Factor Ordering

fct_relevel() reorders the factor levels by hand, useful if the levels have a natural order (e.g. low, medium, high) but were originally set up in the wrong order (e.g. low = 1, high = 2, medium = 3)

dataset$column <- fct_relevel(dataset$column, c("low", "medium", "high"))

fct_infreq() reorders the levels by frequency, so the most frequent category becomes level 1, the second most frequent becomes level 2, and so on; useful with plots

plot <- ggplot(data, aes(x = fct_infreq(categorical_variable))) +
  geom_bar() +
  coord_flip()

Task:

Relevel the decade_fct variable so that 00s is level 1, 90s is level 2, 80s is level 3, and 70s is level 4
Create a plot of decade_fct which is reordered so the bars are displayed in order of their frequency
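
To see what the two functions do to the level order before trying them on decade_fct, here is a sketch on a made-up size factor (hypothetical data, not from the session):

```r
library(forcats)

# Made-up factor: 3 x small, 2 x large, 1 x medium
size <- factor(c("small", "large", "small", "medium", "small", "large"))
levels(size)            # alphabetical default: "large" "medium" "small"

# Manual reordering: list the levels in the order you want
size_ordered <- fct_relevel(size, "small", "medium", "large")
levels(size_ordered)    # "small" "medium" "large"

# Frequency reordering: most common level first
size_by_freq <- fct_infreq(size)
levels(size_by_freq)    # "small" "large" "medium"
```

Neither function changes the data values themselves, only the order in which the levels are stored and therefore displayed in tables and plots.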

10 / 13

Changing Factor Levels

fct_recode() changes the level names/values

dataset$column <- fct_recode(dataset$column, new_name_1 = "old_name_1", new_name_2 = "old_name_2", new_name_3 = "old_name_3", new_name_4 = "old_name_4")

fct_collapse() collapses multiple levels into one

dataset$column <- fct_collapse(dataset$column, new_name_1 = c("old_name_1", "old_name_2"), new_name_2 = c("old_name_3", "old_name_4"))

Task:

Recode the levels so that 00s is naughties, 90s is nineties, 80s is eighties, 70s is seventies
Collapse the levels so that seventies & eighties are called oldies, & nineties & naughties are called newbies
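
A sketch of both functions on a made-up clothing-size factor, so the pattern is visible without giving away the task. All names and values here are hypothetical.

```r
library(forcats)

# Made-up factor with explicit level order
size <- factor(c("S", "L", "S", "M", "XL", "L"),
               levels = c("S", "M", "L", "XL"))

# Rename each level: new_name = "old_name"
size_named <- fct_recode(size,
                         small = "S", medium = "M",
                         large = "L", extra_large = "XL")
levels(size_named)     # "small" "medium" "large" "extra_large"

# Merge levels: new_name = c("old_name_1", "old_name_2")
size_grouped <- fct_collapse(size_named,
                             smaller = c("small", "medium"),
                             bigger = c("large", "extra_large"))
levels(size_grouped)   # "smaller" "bigger"
table(size_grouped)    # 3 observations in each group
```

Note that fct_collapse() is applied to the recoded factor, so it has to refer to the new level names rather than the original ones.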

11 / 13

For extRa fun

Try using some of the commands we've used today to wrangle another dataset!

peng_data <- palmerpenguins::penguins # to load the data

Task:

  1. Inspect the species factor using class(), levels(), & fct_count()
  2. Use fct_relevel() to reorder the levels of species so that the species with the highest count is level 1 and the species with the lowest count is level 3 (you'll need to look at the output from fct_count() to find the counts & work out which order to put the levels into fct_relevel())
  3. Google images of the 3 species, decide which one is cute, which one is cuter, and which one is the cutest
  4. fct_recode() the levels to reflect your cuteness ratings given to the different species
  5. fct_collapse() the cute and cuter levels into one category called just_cute
12 / 13
