Informal, optional weekly sessions to help build a 'portfolio of skills'
1-2 hours of instruction, demos, walkthroughs & activities to try out
Session topics are not fixed - use the Padlet linked on Canvas for suggestions!
Create one R project file for all seminR sessions
Within this directory, create r_docs & data folders, then save all Rmds to r_docs & all datasets to data
Make a cheat sheet of useful functions and # comment their meaning and usage as you go through seminRs, practicals, tutorials etc.
File > New Project... New Directory > New Project > Give your project a name & location
File > New File > R Markdown... Remember to save this file in your r_docs folder!
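If you prefer the console to the Files pane, the two folders can be created with base R. This is only a sketch, and it assumes the seminRs project is already open so that the working directory is the project root.

```r
# Assumption: the working directory is the project root
# (it is when the .Rproj file has been opened in RStudio).
# showWarnings = FALSE keeps dir.create() quiet if a folder already exists.
dir.create("r_docs", showWarnings = FALSE)
dir.create("data", showWarnings = FALSE)

# Check that both folders are now present
dir.exists(c("r_docs", "data"))
```

Because Rmd files live in r_docs, a relative path to a dataset starts with ../data/ when the file is knitted.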
Open your seminRs project, create a new Rmd file, & read in the data
spotify_data <- readr::read_csv("../data/spotify_decades_data.csv")
# Or read it directly from GitHub
spotify_data <- readr::read_csv("https://raw.githubusercontent.com/de84sussex/DS_spotify/main/spotify_decades_data.csv")
Understand pipes %>% & how to use them
Use different dplyr functions to perform basic data wrangling
To check your answers or follow along, download the solutions Rmd
Part of the magrittr package, loads with library(tidyverse) or library(magrittr)
f(x)
f(x, y)
h(g(f(x)))
nested_smry <- dplyr::mutate(
  dplyr::summarise(
    dplyr::group_by(penguins, species),
    m_mass = mean(body_mass_g, na.rm = TRUE),
    sd = sd(body_mass_g, na.rm = TRUE),
    n = n()
  ),
  se = sd / sqrt(n)
)
x %>% f
x %>% f(y)
x %>% f %>% g %>% h
piped_smry <- penguins %>%
  dplyr::group_by(.data = ., species) %>%
  dplyr::summarise(.data = .,
                   m_mass = mean(body_mass_g, na.rm = TRUE),
                   sd = sd(body_mass_g, na.rm = TRUE),
                   n = n()) %>%
  dplyr::mutate(se = sd / sqrt(n))
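To see the nested and piped forms side by side on something very small, here is a sketch using a made-up numeric vector x (not part of the seminR data). Both versions run the same three functions in the same order.

```r
library(magrittr)

# Hypothetical toy vector, used only for illustration
x <- c(1, 4, 9, 16)

# Nested: read from the inside out (sqrt, then mean, then round)
nested <- round(mean(sqrt(x)), 1)

# Piped: read from left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both approaches give the same value
identical(nested, piped)
```

The piped version lists the steps in the order they happen, which is the main reason to prefer it once there are more than two steps.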
. is used as a placeholder for function arguments
summary <- some_data %>% dplyr::group_by(.data = ., a_category) %>% ...
The first argument of group_by() is the data, i.e. .data = some_data
When using the pipe, we don't need to specify this argument because it takes the output of the step before it
It's good practice to name the arguments of functions, so we use a . to stand in for the piped input
piped_smry_1 <- penguins %>% dplyr::group_by(.data = ., species) %>% ...
# Or
piped_smry_2 <- penguins %>% dplyr::group_by(., species) %>% ...
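The placeholder matters most when the data is not the first argument of a function. As a sketch, lm() takes a formula first and the data second, so the . tells the pipe where the piped input should go. The built-in mtcars data is used here purely as a stand-in.

```r
library(magrittr)

# lm(formula, data, ...): the data is the SECOND argument,
# so we place the piped input explicitly with data = .
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# Same model written without the pipe, for comparison
fit_direct <- lm(mpg ~ wt, data = mtcars)

all.equal(coef(fit), coef(fit_direct))
```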
We've already covered:
We're going to practise these & some additional dplyr functions:
output_1 <- dplyr::select(.data = data, column_1) # to select column_1
output_2 <- dplyr::select(.data = data, -column_1, -column_2) # to remove column_1 & column_2
output_3 <- data %>% dplyr::select(.data = ., column_1) # to use with %>%
Create a new code chunk &, using the spotify_data you loaded in at the start, remove the song_id, decade_fct & is_local columns, then assign the result back to the spotify_data object
Hint: Remember to load tidyverse/dplyr to use these functions & use names() to easily see the names of the columns!
output_1 <- dplyr::filter(.data = data, column_1 == "some text") # keep rows equal to some text
output_2 <- dplyr::filter(.data = data, column_1 != "some text") # keep rows not equal to some text
output_3 <- dplyr::filter(.data = data, column_2 < 50) # keep rows where value is smaller than 50
output_4 <- dplyr::filter(.data = data, column_3 < 2 & column_4 == FALSE) # keep rows that meet BOTH conditions
output_5 <- data %>% dplyr::filter(.data = ., column_1 == "some text") # to use with %>%
Using the spotify_data, keep only the tracks that both have a popularity above 50 & are not explicit (i.e., FALSE), then assign the result back to the spotify_data object
output_1 <- dplyr::mutate(.data = data, new_column = old_column * 500) # multiply values by 500
output_2 <- dplyr::mutate(.data = data, new_column = old_column^4) # raise values to the power of 4
output_3 <- data %>% dplyr::mutate(.data = ., new_column = old_column / 2) # to use with %>%
Using the spotify_data, create a new column called 'duration_secs' by calculating the track duration in seconds from the track.duration_ms column & assign it back to the spotify_data object
Extra task: create a duration_mins column with the track duration converted to minutes
output <- data %>%
  summarise(.data = .,
            col_name_1 = mean(column),
            col_name_2 = sd(column))
Edit the example code above to create a summary table from spotify_data containing the mean, sd, min & max of track.popularity, & name the new summary table object pop_smry
output <- data %>%
  group_by(.data = ., categorical_column) %>%
  summarise(.data = .,
            col_name_1 = mean(column),
            col_name_2 = sd(column))
Edit the pop_smry object you created in the previous task to be grouped by playlist_name
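For a worked version of the group_by() plus summarise() pattern that does not give away the task, here is a sketch on the built-in mtcars data. The object name cyl_smry and the summary column names are made up for illustration.

```r
library(dplyr)

# One row per number of cylinders, with summary statistics for mpg
cyl_smry <- mtcars %>%
  dplyr::group_by(.data = ., cyl) %>%
  dplyr::summarise(.data = .,
                   m_mpg = mean(mpg),
                   sd_mpg = sd(mpg),
                   min_mpg = min(mpg),
                   max_mpg = max(mpg),
                   n = dplyr::n())

cyl_smry
```

The result has one row for each distinct value of the grouping column, so swapping cyl for another categorical column changes how many rows you get back.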
output <- rename(.data = data, new_column_name = old_column_name)
output <- data %>% rename(.data = ., new_column_name = old_column_name)
Using spotify_data, rename the playlist_name variable to be called decade & assign it back to the spotify_data object
output <- pull(.data = data, column_1)
output <- data %>% pull(.data = ., column_1)
Create a new object called songs, which consists of the track.name from spotify_data
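It can help to compare pull() with select(), since both pick out a column. In this sketch on the built-in mtcars data, pull() returns a plain vector while select() returns a data frame with one column. The object names are hypothetical.

```r
library(dplyr)

# pull() extracts the column's values as a vector
mpg_vector <- mtcars %>% dplyr::pull(.data = ., mpg)

# select() keeps the column inside a data frame
mpg_table <- mtcars %>% dplyr::select(.data = ., mpg)

is.numeric(mpg_vector)    # a vector of numbers
is.data.frame(mpg_table)  # still a data frame
```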
Try putting all the commands we've used today into one long pipe!
It doesn't make much sense to have all these commands in one pipe, but it's good pRactice :)
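As a rough idea of what one long pipe looks like, here is a sketch that chains several of today's functions on the built-in mtcars data rather than spotify_data. The new column name wt_lbs and the renamed column cylinders are invented for the example.

```r
library(dplyr)

heavy_car_mpg <- mtcars %>%
  dplyr::select(.data = ., mpg, cyl, wt) %>%         # keep three columns
  dplyr::filter(.data = ., wt > 3) %>%               # wt is in 1000s of lbs
  dplyr::mutate(.data = ., wt_lbs = wt * 1000) %>%   # convert to lbs
  dplyr::rename(.data = ., cylinders = cyl) %>%      # new_name = old_name
  dplyr::pull(.data = ., mpg)                        # end with a vector

heavy_car_mpg
```

pull() comes last because it returns a vector, and the other functions in the chain expect a data frame as their input.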