class: center, middle, inverse, title-slide

# Level Up 02: Manipulating Factors

---
class: inverse

# SeminRs

- Informal, optional weekly sessions to help build a 'portfolio of skills'
- 1-2 hours of instruction, demos, walk throughs & activities to try out

### *Essentials*

- Focused on the fundRmental skills to help you get started with learning R
- Covering: basic wrangling & visualising data, '.bold.pink[pretty]' R Markdown, inline code, debugging...

### *Level Up*

- Focused on more advanced programming skills & applying these skills to new .bold.orange[fun] topics
- Covering: Papaja, advanced wrangling & manipulation of data, '.bold.italic.pink[even prettier]' R Markdown, spotifyR...

<br>

*Session topics are not fixed - use the Padlet linked on Canvas for suggestions!*

---
class: inverse

# Setup & Suggested Workflow

- Create one R project file for all seminR sessions
- Within this directory, create an r_docs & data folder & save all Rmds and datasets to these folders respectively
- Make a cheat sheet of useful functions and .orange[#] comment their meaning and usage as you go through seminRs, practicals, tutorials etc.

#### .orange[Reminder]:

File > New Project... New Directory > New Project > *Give your project a name & location*

File > New File > R Markdown...
*Remember to save this file in your r_docs folder!*

### .orange[Task]: Open your seminRs project, create a new Rmd file & load tidyverse & palmerpenguins

---
class: inverse

# Session Objectives

### Wrangling Factors with forcats

- Understand factors & categorical variables
- Learn & practise some common functions (from base R, dplyr, forcats) to manipulate factors

<br>
<br>
<br>

To check your answers or follow along, [download the solutions Rmd](https://and.netlify.app/seminr/02/level_up/levelup_02.Rmd)

---
class: inverse

# Factors & Categorical Data

- Factors represent categorical data
- Numeric values are used to represent different levels
- They have a fixed, known set of possible values
- forcats is a useful package for working with factors; it loads with library(tidyverse) or on its own with library(forcats)

### Examples

- Marital status
- Occupation
- Education level
- Eye colour
- Pretty much anything...

---
class: inverse

# Working with Factors in R

### .orange[Task 1]: Read in the dataset below

```r
# if saved already
spotify_data <- readr::read_csv("../data/spotify_decades_data.csv")
# otherwise, read it straight from GitHub
spotify_data <- readr::read_csv("https://raw.githubusercontent.com/de84sussex/DS_spotify/main/spotify_decades_data.csv")
```

### .orange[Task 2]: Explore the dataset, is there anything odd or unusual?
.orange[Hint: look at the structure/column specification of the tibble]

---
class: inverse

# Creating/Converting Factors

IMO, the easiest way to convert an existing variable is to combine factor() from base R with mutate() from dplyr

#### Step 1: after the code that reads in the data, add a pipe into mutate()
#### Step 2: specify the *new* column name (reusing the existing name overwrites that column)
#### Step 3: use factor() on the column you want to change, specify the labels for each category

```r
spotify_data <- readr::read_csv("../data/spotify_decades_data.csv") %>%
  # labels are matched, in order, to the sorted unique values of decade_fct
  mutate(decade_fct = factor(decade_fct, labels = c("70s", "80s", "90s", "00s")))
```

### .orange[Task]: Follow these steps and update your code to read in decade_fct as a factor

---
class: inverse

# Inspecting Factors

### Base R

class() for checking that the data type is correct

levels() to see the levels of a factor

table() for a table of groups & counts

### forcats

fct_count() gives factor levels and counts, similar to table()

### .orange[Task]: Try these functions on our decade_fct variable, remember we subset a variable by using the .orange[$] dataset$column

---
class: inverse

# Filter For Demo

<br>
<br>
<br>

```r
# adding in filter to see differences between groups more clearly in following tasks
spotify_data <- spotify_data %>%
  filter(track.popularity > 50 & track.explicit == FALSE)
```

---
class: inverse

# Changing Factor Ordering

fct_relevel() for reordering the factor levels, useful if factors are ordered (e.g. low, med, high) and they're originally set up inappropriately (e.g.
low = 1, high = 2, med = 3)

```r
dataset$column <- fct_relevel(dataset$column, c("low", "med", "high"))
```

fct_infreq() reorders the levels so the most frequent category becomes level 1, second most frequent becomes 2, and so on, useful with plots

```r
plot <- ggplot(data, aes(x = fct_infreq(categorical_variable))) +
  geom_bar() +
  coord_flip()
```

### .orange[Task]:

Relevel the decade_fct variable so that 00s is level 1, 90s is level 2, 80s is level 3, and 70s is level 4

Create a plot of decade_fct which is reordered so the bars are displayed in order of their frequency

---
class: inverse

# Changing Factor Levels

fct_recode() changes the level names/values

```r
dataset$column <- fct_recode(dataset$column,
                             new_name_1 = "old_name_1",
                             new_name_2 = "old_name_2",
                             new_name_3 = "old_name_3",
                             new_name_4 = "old_name_4")
```

fct_collapse() collapses multiple levels into one

```r
dataset$column <- fct_collapse(dataset$column,
                               new_name_1 = c("old_name_1", "old_name_2"),
                               new_name_2 = c("old_name_3", "old_name_4"))
```

### .orange[Task]:

Recode the levels so that 00s is naughties, 90s is nineties, 80s is eighties, 70s is seventies

Collapse the levels so that seventies & eighties are called oldies, & nineties & naughties are called newbies

---
class: inverse

# For extRa fun

Try using some of the commands we've used today to wrangle another dataset!

```r
peng_data <- palmerpenguins::penguins # to load the data
```

### .orange[Task]:

1. Inspect the species factor using class(), levels(), & fct_count()
1. fct_relevel() the levels of species so that the species with the highest count is level 1, and the species with the lowest count is level 3 (you'll need to look at the output from fct_count() to find out the *n*s & which order to put them into fct_relevel())
1. Google images of the 3 species, decide which one is cute, which one is cuter, and which one is the cutest
1. fct_recode() the levels to reflect the cuteness ratings you gave to the different species
1. fct_collapse() the cute and cuter levels into one category called just_cute

---
class: center, middle

<div class="padlet-embed" style="border:1px solid rgba(0,0,0,0.1);border-radius:2px;box-sizing:border-box;overflow:hidden;position:relative;width:100%;background:#F4F4F4"><p style="padding:0;margin:0"><iframe src="https://uofsussex.padlet.org/embed/nrud4gk8x63gbfdc" frameborder="0" allow="camera;microphone;geolocation" style="width:100%;height:608px;display:block;padding:0;margin:0"></iframe></p><div style="padding:8px;text-align:right;margin:0;"><a href="https://padlet.com?ref=embed" style="padding:0;margin:0;border:none;display:block;line-height:1;height:16px" target="_blank"><img src="https://padlet.net/embeds/made_with_padlet.png" width="86" height="16" style="padding:0;margin:0;background:none;border:none;display:inline;box-shadow:none" alt="Made with Padlet"></a></div></div>
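
---
class: inverse

# ExtRa: Toy Walkthrough

If you'd like to see the session's functions side by side before trying them on real data, here is a minimal sketch. It uses a made-up vector (`decade_code` is hypothetical, not the Spotify dataset) and assumes the decades are stored as the numeric codes 1 to 4, so treat it as an illustration rather than the solution to the tasks.

```r
library(forcats)

# Hypothetical toy data: decades stored as numeric codes, as an assumption
decade_code <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)

# factor() matches the labels, in order, to the sorted unique codes
decade <- factor(decade_code, labels = c("70s", "80s", "90s", "00s"))

# Inspect: data type, levels, and counts per level
class(decade)
levels(decade)
fct_count(decade)

# fct_relevel(): set the level order by hand
decade_rev <- fct_relevel(decade, "00s", "90s", "80s", "70s")

# fct_infreq(): order levels from most to least frequent
decade_freq <- fct_infreq(decade)

# fct_recode(): rename levels, written as new_name = "old_name"
decade_words <- fct_recode(decade,
                           seventies = "70s",
                           eighties  = "80s",
                           nineties  = "90s",
                           naughties = "00s")

# fct_collapse(): merge several levels into one
decade_era <- fct_collapse(decade_words,
                           oldies  = c("seventies", "eighties"),
                           newbies = c("nineties", "naughties"))
table(decade_era)
```

Each function returns a new factor rather than changing the old one, which is why every step is assigned to a new name here. In your own code you would usually assign back to dataset$column or wrap the call in mutate().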