Informal, optional weekly sessions to help build a 'portfolio of skills'
1-2 hours of instruction, demos, walkthroughs & activities to try out
Session topics are not fixed - use the Padlet linked on Canvas for suggestions!
Create an R project file for all seminR sessions
Within this directory, create two folders: r_docs & data
Save all Rmds and datasets to these folders respectively
File > New Project... > New Directory > New Project > give your project a name & location
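The two folders can also be created from R rather than by hand. The lines below are a minimal sketch: `project_dir` is a hypothetical stand-in (a temporary folder) for wherever your own project lives, so swap in your project folder when you try it.

```r
# Hypothetical project location (an assumption for this sketch);
# in practice this is the folder RStudio created for your project.
project_dir <- file.path(tempdir(), "seminRs_demo")
dir.create(project_dir, showWarnings = FALSE)

# Create the two sub-folders named above.
for (folder in c("r_docs", "data")) {
  dir.create(file.path(project_dir, folder), showWarnings = FALSE)
}

# List the folders that now exist inside the project folder.
created <- list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
print(created)
```

If the project is already open in RStudio, the working directory is the project folder, so `dir.create("r_docs")` and `dir.create("data")` should be enough.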
Open a new Rmd file for today's session
Make a cheat sheet of useful functions and # comment their meaning and usage as you go through seminRs, practicals, tutorials etc.
Understand directories
Read in datasets
Use different functions to explore datasets
Understand different data types
Reading in data requires knowledge of your files & folders
Directories = Folders
Paths = Directions
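Two types of paths:
Absolute (full directions from the top of your computer): C:/Users/danie/Documents/seminRs_21/images/image.png
Relative (directions from your current working directory): ./images/image.png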
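To see the two kinds of path side by side, the sketch below builds a throwaway folder and points at the same file both ways. The folder name `seminRs_paths_demo` and the empty `image.png` are made up for illustration; only base R functions are used.

```r
# Hypothetical project folder (an assumption for this sketch).
project_dir <- file.path(tempdir(), "seminRs_paths_demo")
dir.create(file.path(project_dir, "images"),
           recursive = TRUE, showWarnings = FALSE)

# An empty placeholder file, just so the paths have something to point at.
file.create(file.path(project_dir, "images", "image.png"))

# Temporarily make the project folder the working directory,
# which is what opening an R project does for you.
old_wd <- setwd(project_dir)

relative_path <- "./images/image.png"                          # from getwd()
absolute_path <- normalizePath(relative_path, winslash = "/")  # full directions

print(getwd())
print(relative_path)
print(absolute_path)

# Both paths lead to the same file.
both_found <- file.exists(relative_path) && file.exists(absolute_path)
print(both_found)

# Put the working directory back where it was.
setwd(old_wd)
```

The relative path only works while the working directory is the project folder, which is one reason to keep everything inside a single R project.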
A dataset is a collection of data: columns represent variables & rows represent cases/people/entities
Lots of different data formats exist - common ones are: .csv, .sav, .Rdata...
Different functions, packages, & methods we can use to load them in R
We're going to focus on readr, others include haven, foreign, readxl
Part of the tidyverse, can be loaded using library(tidyverse) or library(readr)
readr::read_csv("./data/spotify_decades_data.csv")
data <- readr::read_csv("../spotify_decades_data.csv")
First argument is the path to the file
Once it's run, it prints out the column specification, which gives the column names and data types
https://raw.githubusercontent.com/de84sussex/DS_spotify/main/spotify_decades_data.csv
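A small, self-contained way to try `read_csv()` without downloading anything is sketched below. The file `toy_songs.csv` is hypothetical: it holds three rows and four columns copied from the Spotify data shown in these slides and is written to a temporary folder first. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical three-row file standing in for the full Spotify dataset.
csv_path <- file.path(tempdir(), "toy_songs.csv")
writeLines(
  c("song_id,track_artists,danceability,track.explicit",
    "1,Elton John,0.601,FALSE",
    "2,Stevie Wonder,0.67,FALSE",
    "3,Eric Clapton,0.572,FALSE"),
  csv_path
)

# First argument is the path to the file; assign the result to an object.
toy <- read_csv(csv_path)

# Show the column specification (column names and guessed data types).
spec(toy)
```

`read_csv()` also accepts a URL as its first argument, so the raw GitHub link above can be passed in place of a file path.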
The things we can do with our variables depend on their data type - get to know your data!
Below are some useful functions to explore your data. Run them only in the console, or make sure to delete them from your Rmd file once you've seen the output
All you need to do is put the name of your object (aka your dataset), into the brackets of the function
To see your raw data:
objectname + ctrl/command & enter
print()
View()
head()
tail()
To see characteristics of your data:
str() & dplyr::glimpse()
summary()
class()
names()
ncol()
nrow()
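The functions listed above can be tried on any object. Here is a minimal sketch on a hypothetical three-row data frame called `toy`, with values borrowed from the first rows of the Spotify data, so it runs without loading any file or package.

```r
# Hypothetical toy dataset (an assumption for this sketch).
toy <- data.frame(
  song_id = 1:3,
  track_artists = c("Elton John", "Stevie Wonder", "Eric Clapton"),
  danceability = c(0.601, 0.670, 0.572),
  stringsAsFactors = FALSE
)

# Raw data
head(toy, 2)   # first 2 rows (default is 6)
tail(toy, 1)   # last row

# Characteristics of the data
str(toy)       # structure: type and first values of each column
summary(toy)   # summary statistics per column
class(toy)     # what kind of object this is
names(toy)     # column names
ncol(toy)      # number of columns
nrow(toy)      # number of rows
```

`View()` is left out here because it opens a viewer window and is meant for interactive use in RStudio.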
Using a mix of these functions:
Find out & write on the Padlet:
View(data)
data
print(data)
ncol(data)
## [1] 18
nrow(data)
## [1] 60
str(data)
## tibble [60 x 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ song_id          : num [1:60] 1 2 3 4 5 6 7 8 9 10 ...
##  $ playlist_name    : chr [1:60] "decades_70" "decades_70" "decades_70" "decades_70" ...
##  $ decade_fct       : num [1:60] 1 1 1 1 1 1 1 1 1 1 ...
##  $ track_artists    : chr [1:60] "Elton John" "Stevie Wonder" "Eric Clapton" "Chaka Khan" ...
##  $ track.name       : chr [1:60] "Rocket Man (I Think It's Going To Be A Long, Long Time)" "Signed, Sealed, Delivered (I'm Yours)" "Wonderful Tonight" "I'm Every Woman" ...
##  $ danceability     : num [1:60] 0.601 0.67 0.572 0.617 0.838 0.808 0.579 0.631 0.482 0.7 ...
##  $ energy           : num [1:60] 0.532 0.619 0.214 0.879 0.806 0.535 0.508 0.59 0.835 0.816 ...
##  $ loudness         : num [1:60] -9.12 -10.37 -15.62 -7.56 -9.74 ...
##  $ speechiness      : num [1:60] 0.0286 0.0323 0.0293 0.0455 0.0408 0.0353 0.027 0.0297 0.0539 0.044 ...
##  $ acousticness     : num [1:60] 0.432 0.0514 0.649 0.127 0.213 0.179 0.00574 0.00367 0.0191 0.00115 ...
##  $ liveness         : num [1:60] 0.0925 0.0492 0.125 0.339 0.354 0.158 0.0575 0.0537 0.162 0.0901 ...
##  $ tempo            : num [1:60] 136.6 108.8 95.5 114.5 123.1 ...
##  $ instrumentalness : num [1:60] 6.25e-06 0.00 1.29e-01 5.71e-05 2.03e-03 9.91e-05 4.94e-04 2.99e-03 1.43e-02 1.23e-03 ...
##  $ valence          : num [1:60] 0.341 0.807 0.485 0.746 0.846 0.848 0.609 0.927 0.776 0.838 ...
##  $ track.popularity : num [1:60] 81 1 76 58 0 72 62 66 66 63 ...
##  $ track.duration_ms: num [1:60] 281613 160500 225026 247413 361093 ...
##  $ track.explicit   : logi [1:60] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ is_local         : logi [1:60] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   song_id = col_double(),
##   ..   playlist_name = col_character(),
##   ..   decade_fct = col_double(),
##   ..   track_artists = col_character(),
##   ..   track.name = col_character(),
##   ..   danceability = col_double(),
##   ..   energy = col_double(),
##   ..   loudness = col_double(),
##   ..   speechiness = col_double(),
##   ..   acousticness = col_double(),
##   ..   liveness = col_double(),
##   ..   tempo = col_double(),
##   ..   instrumentalness = col_double(),
##   ..   valence = col_double(),
##   ..   track.popularity = col_double(),
##   ..   track.duration_ms = col_double(),
##   ..   track.explicit = col_logical(),
##   ..   is_local = col_logical()
##   .. )
head(data)
## # A tibble: 6 x 18
##   song_id playlist_name decade_fct track_artists track.name danceability energy
##     <dbl> <chr>              <dbl> <chr>         <chr>             <dbl>  <dbl>
## 1       1 decades_70             1 Elton John    Rocket Ma~        0.601  0.532
## 2       2 decades_70             1 Stevie Wonder Signed, S~        0.67   0.619
## 3       3 decades_70             1 Eric Clapton  Wonderful~        0.572  0.214
## 4       4 decades_70             1 Chaka Khan    I'm Every~        0.617  0.879
## 5       5 decades_70             1 Marvin Gaye   Got To Gi~        0.838  0.806
## 6       6 decades_70             1 Michael Jack~ Rock with~        0.808  0.535
## # ... with 11 more variables: loudness <dbl>, speechiness <dbl>,
## #   acousticness <dbl>, liveness <dbl>, tempo <dbl>, instrumentalness <dbl>,
## #   valence <dbl>, track.popularity <dbl>, track.duration_ms <dbl>,
## #   track.explicit <lgl>, is_local <lgl>
tail(data, 10) # this changes to the last 10
## # A tibble: 10 x 18
##    song_id playlist_name decade_fct track_artists track.name danceability energy
##      <dbl> <chr>              <dbl> <chr>         <chr>             <dbl>  <dbl>
##  1      51 decades_00             4 Usher, Lil J~ Yeah!             0.895  0.795
##  2      52 decades_00             4 Rihanna       Pon de Re~        0.779  0.64
##  3      53 decades_00             4 Toni Braxton  He Wasn't~        0.739  0.947
##  4      54 decades_00             4 Ja Rule, Ash~ Always On~        0.839  0.706
##  5      55 decades_00             4 Black Eyed P~ I Gotta F~        0.743  0.766
##  6      56 decades_00             4 Kanye West    Through T~        0.571  0.739
##  7      57 decades_00             4 Kings of Leon Sex on Fi~        0.544  0.903
##  8      58 decades_00             4 Jennifer Lop~ I'm Real ~        0.708  0.587
##  9      59 decades_00             4 Britney Spea~ Oops!...I~        0.751  0.834
## 10      60 decades_00             4 Beyonce, Sea~ Baby Boy ~        0.655  0.488
## # ... with 11 more variables: loudness <dbl>, speechiness <dbl>,
## #   acousticness <dbl>, liveness <dbl>, tempo <dbl>, instrumentalness <dbl>,
## #   valence <dbl>, track.popularity <dbl>, track.duration_ms <dbl>,
## #   track.explicit <lgl>, is_local <lgl>
names(data)
##  [1] "song_id"           "playlist_name"     "decade_fct"
##  [4] "track_artists"     "track.name"        "danceability"
##  [7] "energy"            "loudness"          "speechiness"
## [10] "acousticness"      "liveness"          "tempo"
## [13] "instrumentalness"  "valence"           "track.popularity"
## [16] "track.duration_ms" "track.explicit"    "is_local"
summary(data)
##     song_id      playlist_name        decade_fct   track_artists
##  Min.   : 1.00   Length:60          Min.   :1.00   Length:60
##  1st Qu.:15.75   Class :character   1st Qu.:1.75   Class :character
##  Median :30.50   Mode  :character   Median :2.50   Mode  :character
##  Mean   :30.50                      Mean   :2.50
##  3rd Qu.:45.25                      3rd Qu.:3.25
##  Max.   :60.00                      Max.   :4.00
##   track.name         danceability        energy          loudness
##  Length:60          Min.   :0.4090   Min.   :0.2140   Min.   :-15.625
##  Class :character   1st Qu.:0.5955   1st Qu.:0.5807   1st Qu.: -9.703
##  Mode  :character   Median :0.7020   Median :0.7175   Median : -7.403
##                     Mean   :0.6847   Mean   :0.6885   Mean   : -7.844
##                     3rd Qu.:0.7705   3rd Qu.:0.8295   3rd Qu.: -5.649
##                     Max.   :0.9200   Max.   :0.9470   Max.   : -1.915
##   speechiness       acousticness         liveness           tempo
##  Min.   :0.02610   Min.   :0.000154   Min.   :0.03270   Min.   : 74.38
##  1st Qu.:0.03238   1st Qu.:0.023175   1st Qu.:0.08518   1st Qu.: 95.05
##  Median :0.04130   Median :0.087450   Median :0.11800   Median :104.64
##  Mean   :0.07131   Mean   :0.145696   Mean   :0.17277   Mean   :108.64
##  3rd Qu.:0.06588   3rd Qu.:0.197250   3rd Qu.:0.25450   3rd Qu.:119.11
##  Max.   :0.33200   Max.   :0.649000   Max.   :0.69600   Max.   :174.43
##  instrumentalness      valence       track.popularity track.duration_ms
##  Min.   :0.000000   Min.   :0.1610   Min.   : 0.00    Min.   :160500
##  1st Qu.:0.000000   1st Qu.:0.5690   1st Qu.:58.00    1st Qu.:225856
##  Median :0.000041   Median :0.7460   Median :66.50    Median :244607
##  Mean   :0.014627   Mean   :0.6993   Mean   :57.85    Mean   :251161
##  3rd Qu.:0.000934   3rd Qu.:0.8462   3rd Qu.:75.00    3rd Qu.:269540
##  Max.   :0.429000   Max.   :0.9800   Max.   :83.00    Max.   :391376
##  track.explicit   is_local
##  Mode :logical   Mode :logical
##  FALSE:55        FALSE:60
##  TRUE :5
str(data)
Or
class(data$track.popularity)
## [1] "numeric"
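To check the type of every column at once, and to see why the type matters, a short sketch follows. The data frame `toy` is hypothetical (three rows borrowed from the Spotify data), and `sapply()` is just one of several base R ways to apply `class()` to each column.

```r
# Hypothetical toy dataset (an assumption for this sketch).
toy <- data.frame(
  track_artists = c("Elton John", "Stevie Wonder", "Eric Clapton"),
  track.popularity = c(81, 1, 76),
  track.explicit = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# class() of a single column, as in the slide above.
class(toy$track.popularity)

# class() of every column in one go.
column_classes <- sapply(toy, class)
print(column_classes)

# A mean makes sense for a numeric column...
mean_popularity <- mean(toy$track.popularity)
print(mean_popularity)

# ...but not for a character column, so check the type first.
can_average_artists <- is.numeric(toy$track_artists)
print(can_average_artists)
```

Checking the type before computing anything is a quick way to catch a column that was read in as text when you expected numbers.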