Tidy Data Science with the Tidyverse and Tidymodels is licensed under a Creative Commons Attribution 4.0 International License.
Workflow
Product
Each analysis as a project
R scripts written with assumption of:
Creates everything it needs, touches nothing it didn't create.
Can move directory on computer, can move to different computer, can be used by other person (including future you!)
It’s like agreeing that we will all drive on the left or the right. A hallmark of civilization is following conventions that constrain your behavior a little, in the name of public safety.
Artwork by @allison_horst
here()
Find the project directory and build file paths.
library(here)
here()
#> [1] "/Users/jakethompson/Documents/GIT/courses/tidyds-2021"
here("materials", "data", "nimbus.csv")
#> [1] "/Users/jakethompson/Documents/GIT/courses/tidyds-2021/materials/data/nimbus.csv"
here()
Where does here() start?
Is a file named .here present?
Is there a .Rproj file (e.g., tidyds-2021.Rproj)?
Is there a .git or .svn directory?
dr_here()
#> here() starts at /Users/jakethompson/Documents/GIT/courses/tidyds-2021.
#> - This directory contains a file matching "[.]Rproj$" with contents matching "^Version: " in the first line
#> - Initial working directory: /Users/jakethompson/Documents/GIT/courses/tidyds-2021/site/static/slides
#> - Current working directory: /Users/jakethompson/Documents/GIT/courses/tidyds-2021/site/static/slides
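The search described above can be sketched in base R. This is a simplified illustration, not the here package's actual implementation: `find_project_root()` is a hypothetical helper, it checks only the three criteria listed on the slide, and it does not inspect the contents of the .Rproj file the way `dr_here()` reports.

```r
# Hypothetical helper (not part of the here package): walk up from `start`
# until a directory contains a .here file, a .Rproj file, or a .git/.svn
# directory.
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    has_here  <- file.exists(file.path(dir, ".here"))
    has_rproj <- length(list.files(dir, pattern = "[.]Rproj$")) > 0
    has_vcs   <- any(dir.exists(file.path(dir, c(".git", ".svn"))))
    if (has_here || has_rproj || has_vcs) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      stop("No project root found above ", start)
    }
    dir <- parent
  }
}

# Usage: a throwaway project in a temporary directory (made-up layout)
root   <- file.path(tempdir(), "demo-project")
nested <- file.path(root, "site", "static", "slides")
dir.create(nested, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, ".here")))

# Starting three levels down still resolves to the project root
find_project_root(nested)
```

Because the root is found by looking for marker files rather than by relying on the working directory, a path built from it stays valid when a script is run from a subdirectory, which is the situation shown in the `dr_here()` output.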
readr functions
function | extracts |
---|---|
read_csv() | comma separated files |
read_csv2() | semi-colon separated files |
read_delim() | general delimited files |
read_fwf() | fixed width files |
read_log() | Apache log files |
read_table() | space separated files |
read_tsv() | tab separated files |
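As a small illustration of how the functions in the table differ only in the delimiter they expect, the sketch below writes the same two made-up rows in three formats and reads each back. The file contents and object names are invented for the example; `col_types = cols()` is used here only to keep the column specification message quiet.

```r
library(readr)

# Made-up rows written in three delimiter styles to temporary files
csv2_path <- tempfile(fileext = ".csv")
tsv_path  <- tempfile(fileext = ".tsv")
pipe_path <- tempfile(fileext = ".txt")

writeLines(c("site;ozone", "a;302,5", "b;287,0"), csv2_path)   # ";" with "," decimals
writeLines(c("site\tozone", "a\t302.5", "b\t287.0"), tsv_path) # tab separated
writeLines(c("site|ozone", "a|302.5", "b|287.0"), pipe_path)   # custom delimiter

from_csv2 <- read_csv2(csv2_path, col_types = cols())
from_tsv  <- read_tsv(tsv_path, col_types = cols())
from_pipe <- read_delim(pipe_path, delim = "|", col_types = cols())

# All three should hold the same ozone values
from_csv2$ozone
from_tsv$ozone
from_pipe$ozone
```

`read_delim()` is the general case; the other readers can be thought of as the same call with the delimiter (and, for `read_csv2()`, the decimal mark) already chosen.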
nimbus
#> date,longitude,latitude,ozone
#> 1985-10-01T00:00:00Z,-179.375,-73.5,302
#> 1985-10-01T00:00:00Z,-178.125,-73.5,302
#> 1985-10-01T00:00:00Z,-176.875,-73.5,302
#> 1985-10-01T00:00:00Z,-175.625,-73.5,302
#> 1985-10-01T00:00:00Z,-174.375,-73.5,304
#> 1985-10-01T00:00:00Z,-173.125,-73.5,304
#> 1985-10-01T00:00:00Z,-171.875,-73.5,304
#> 1985-10-01T00:00:00Z,-170.625,-73.5,304
#> 1985-10-01T00:00:00Z,-164.375,-73.5,287
read_csv()
readr functions share a common syntax.
dat <- read_csv("path/to/file.csv", ...)
dat: object to save data to
dat <- read_csv(here("path", "to", "file.csv"), ...)
here(): build file path with here()
Find nimbus.csv in your project directory
Read it into an object
View the results
nimbus <- read_csv(here("materials", "data", "nimbus.csv"))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   date = col_datetime(format = ""),
#>   longitude = col_double(),
#>   latitude = col_double(),
#>   ozone = col_character()
#> )
nimbus
#> # A tibble: 18,963 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <chr>
#>  1 1985-10-01 00:00:00     -179.    -73.5 302
#>  2 1985-10-01 00:00:00     -178.    -73.5 302
#>  3 1985-10-01 00:00:00     -177.    -73.5 302
#>  4 1985-10-01 00:00:00     -176.    -73.5 302
#>  5 1985-10-01 00:00:00     -174.    -73.5 304
#>  6 1985-10-01 00:00:00     -173.    -73.5 304
#>  7 1985-10-01 00:00:00     -172.    -73.5 304
#>  8 1985-10-01 00:00:00     -171.    -73.5 304
#>  9 1985-10-01 00:00:00     -164.    -73.5 287
#> 10 1985-10-01 00:00:00     -163.    -73.5 287
#> # … with 18,953 more rows
read.csv() vs. read_csv()
#>
#> 18939 1985-10-01T00:00:00Z 139.375 -0.5 270
#> 18940 1985-10-01T00:00:00Z 140.625 -0.5 275
#> 18941 1985-10-01T00:00:00Z 141.875 -0.5 270
#> 18942 1985-10-01T00:00:00Z 143.125 -0.5 266
#> 18943 1985-10-01T00:00:00Z 144.375 -0.5 267
#> 18944 1985-10-01T00:00:00Z 145.625 -0.5 263
#> 18945 1985-10-01T00:00:00Z 146.875 -0.5 261
#> 18946 1985-10-01T00:00:00Z 148.125 -0.5 262
#> 18947 1985-10-01T00:00:00Z 154.375 -0.5 271
#> 18948 1985-10-01T00:00:00Z 155.625 -0.5 272
#> 18949 1985-10-01T00:00:00Z 156.875 -0.5 268
#> 18950 1985-10-01T00:00:00Z 158.125 -0.5 276
#> 18951 1985-10-01T00:00:00Z 159.375 -0.5 273
#> 18952 1985-10-01T00:00:00Z 160.625 -0.5 272
#> 18953 1985-10-01T00:00:00Z 161.875 -0.5 271
#> 18954 1985-10-01T00:00:00Z 163.125 -0.5 272
#> 18955 1985-10-01T00:00:00Z 164.375 -0.5 275
#> 18956 1985-10-01T00:00:00Z 165.625 -0.5 271
#> 18957 1985-10-01T00:00:00Z 166.875 -0.5 271
#> 18958 1985-10-01T00:00:00Z 168.125 -0.5 273
#> 18959 1985-10-01T00:00:00Z 169.375 -0.5 273
#> 18960 1985-10-01T00:00:00Z 170.625 -0.5 271
#> 18961 1985-10-01T00:00:00Z 171.875 -0.5 270
#> 18962 1985-10-01T00:00:00Z 173.125 -0.5 268
#> 18963 1985-10-01T00:00:00Z 174.375 -0.5 265
#> # A tibble: 18,963 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <chr>
#>  1 1985-10-01 00:00:00     -179.    -73.5 302
#>  2 1985-10-01 00:00:00     -178.    -73.5 302
#>  3 1985-10-01 00:00:00     -177.    -73.5 302
#>  4 1985-10-01 00:00:00     -176.    -73.5 302
#>  5 1985-10-01 00:00:00     -174.    -73.5 304
#>  6 1985-10-01 00:00:00     -173.    -73.5 304
#>  7 1985-10-01 00:00:00     -172.    -73.5 304
#>  8 1985-10-01 00:00:00     -171.    -73.5 304
#>  9 1985-10-01 00:00:00     -164.    -73.5 287
#> 10 1985-10-01 00:00:00     -163.    -73.5 287
#> # … with 18,953 more rows
Look at the nimbus data.
What class (data type) is ozone?
nimbus %>% pull(ozone) %>% class()
nimbus %>% pull(ozone) %>% class()
#> [1] "character"
nimbus %>% pull(ozone) %>% unique()
#>   [1] "302" "304" "287" "274" "264" "242" "211" "195" "197" "196" "198" "193"
#>  [13] "187" "190" "199" "194" "213" "218" "221" "229" "209" "186" "188" "191"
#>  [25] "189" "184" "180" "."   "215" "312" "319" "320" "311" "300" "290" "267"
#>  [37] "226" "210" "200" "203" "201" "192" "204" "206" "208" "205" "223" "232"
#>  [49] "238" "243" "220" "202" "185" "219" "222" "216" "324" "336" "333" "323"
#>  [61] "308" "295" "244" "212" "237" "248" "239" "241" "250" "249" "252" "234"
#>  [73] "318" "313" "326" "335" "337" "316" "266" "207" "227" "251" "253" "257"
#>  [85] "261" "214" "228" "273" "285" "288" "291" "270" "254" "317" "325" "332"
#>  [97] "340" "344" "338" "297" "247" "217" "225" "231" "235" "236" "262" "260"
#> [109] "265" "272" "278" "280" "279" "255" "245" "224" "181" "240" "269" "296"
#> [121] "307" "315" "321" "306" "299" "298" "283" "327" "322" "328" "331" "310"
#> [133] "275" "233" "258" "276" "281" "289" "330" "346" "305" "334" "359" "347"
#> [145] "314" "301" "256" "263" "277" "284"
#>  [ reached getOption("max.print") -- omitted 82 entries ]
NA values
nimbus %>% filter(ozone == ".")
#> # A tibble: 155 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <chr>
#>  1 1985-10-01 00:00:00      70.6    -73.5 .
#>  2 1985-10-01 00:00:00      71.9    -73.5 .
#>  3 1985-10-01 00:00:00      73.1    -73.5 .
#>  4 1985-10-01 00:00:00      74.4    -73.5 .
#>  5 1985-10-01 00:00:00      75.6    -73.5 .
#>  6 1985-10-01 00:00:00      76.9    -73.5 .
#>  7 1985-10-01 00:00:00      78.1    -73.5 .
#>  8 1985-10-01 00:00:00      79.4    -73.5 .
#>  9 1985-10-01 00:00:00      65.6    -72.5 .
#> 10 1985-10-01 00:00:00      66.9    -72.5 .
#> # … with 145 more rows
dat <- read_csv(here("path", "to", "file.csv"), na = ".")
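One detail worth knowing about the `na` argument: readr's default is `na = c("", "NA")`, and supplying `na = "."` replaces that default rather than adding to it. The sketch below uses a small made-up file (not the nimbus data) in which missing values appear both as "." and as "NA", and lists every marker explicitly.

```r
library(readr)

# Made-up file where missing ozone values are coded two different ways
path <- tempfile(fileext = ".csv")
writeLines(c("id,ozone", "1,302", "2,.", "3,NA", "4,287"), path)

# Default na = c("", "NA"): the "." survives, so ozone is guessed as character
default_na <- read_csv(path, col_types = cols())
class(default_na$ozone)

# Listing all markers lets readr guess a numeric column
all_na <- read_csv(path, na = c("", "NA", "."), col_types = cols())
class(all_na$ozone)
sum(is.na(all_na$ozone))
```

For a file like nimbus.csv, where "." is the only missing-value marker, `na = "."` on its own is sufficient.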
Read in nimbus.csv again.
Set values of "." to NA.
read_csv(here("materials", "data", "nimbus.csv"))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   date = col_datetime(format = ""),
#>   longitude = col_double(),
#>   latitude = col_double(),
#>   ozone = col_character()
#> )
#> # A tibble: 18,963 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <chr>
#>  1 1985-10-01 00:00:00     -179.    -73.5 302
#>  2 1985-10-01 00:00:00     -178.    -73.5 302
#>  3 1985-10-01 00:00:00     -177.    -73.5 302
#>  4 1985-10-01 00:00:00     -176.    -73.5 302
#>  5 1985-10-01 00:00:00     -174.    -73.5 304
#>  6 1985-10-01 00:00:00     -173.    -73.5 304
#>  7 1985-10-01 00:00:00     -172.    -73.5 304
#>  8 1985-10-01 00:00:00     -171.    -73.5 304
#>  9 1985-10-01 00:00:00     -164.    -73.5 287
#> 10 1985-10-01 00:00:00     -163.    -73.5 287
#> # … with 18,953 more rows
read_csv(here("materials", "data", "nimbus.csv"), na = ".")
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   date = col_datetime(format = ""),
#>   longitude = col_double(),
#>   latitude = col_double(),
#>   ozone = col_double()
#> )
#> # A tibble: 18,963 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <dbl>
#>  1 1985-10-01 00:00:00     -179.    -73.5   302
#>  2 1985-10-01 00:00:00     -178.    -73.5   302
#>  3 1985-10-01 00:00:00     -177.    -73.5   302
#>  4 1985-10-01 00:00:00     -176.    -73.5   302
#>  5 1985-10-01 00:00:00     -174.    -73.5   304
#>  6 1985-10-01 00:00:00     -173.    -73.5   304
#>  7 1985-10-01 00:00:00     -172.    -73.5   304
#>  8 1985-10-01 00:00:00     -171.    -73.5   304
#>  9 1985-10-01 00:00:00     -164.    -73.5   287
#> 10 1985-10-01 00:00:00     -163.    -73.5   287
#> # … with 18,953 more rows
dat <- read_csv(here("path", "to", "file.csv"), na = ".",
col_types = cols(var_1 = col_number()))
function | data type |
---|---|
col_character() | characters |
col_date() | dates |
col_datetime() | POSIXct (date-time) |
col_double() | double (decimal number) |
col_factor() | factors |
col_guess() | let readr guess (default) |
col_integer() | integers |
col_logical() | logicals |
col_number() | numbers mixed with non-number characters |
col_numeric() | double or integer |
col_skip() | do not read |
col_time() | time |
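To show several of the column functions from the table working together, here is a minimal sketch on a small made-up file; the `note` column and the file contents are invented for the example. Columns not named in `cols()` would still be guessed by readr.

```r
library(readr)

# Made-up file with a date-time, a count with "." for missing, and a note
path <- tempfile(fileext = ".csv")
writeLines(
  c("date,ozone,note",
    "1985-10-01T00:00:00Z,302,ok",
    "1985-10-01T00:00:00Z,.,missing"),
  path
)

dat <- read_csv(
  path,
  na = ".",
  col_types = cols(
    date  = col_datetime(),  # parse ISO 8601 date-times
    ozone = col_integer(),   # force integer instead of the guessed double
    note  = col_skip()       # do not read this column at all
  )
)

dat
```

Stating the types up front means a file that later arrives with unexpected values produces parsing problems you can inspect, rather than silently changing a column's type.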
Read in nimbus.csv again.
Set values of "." to NA.
Specify ozone as integer values.
read_csv(here("materials", "data", "nimbus.csv"), na = ".",
         col_types = cols(ozone = col_integer()))
#> # A tibble: 18,963 x 4
#>    date                longitude latitude ozone
#>    <dttm>                  <dbl>    <dbl> <int>
#>  1 1985-10-01 00:00:00     -179.    -73.5   302
#>  2 1985-10-01 00:00:00     -178.    -73.5   302
#>  3 1985-10-01 00:00:00     -177.    -73.5   302
#>  4 1985-10-01 00:00:00     -176.    -73.5   302
#>  5 1985-10-01 00:00:00     -174.    -73.5   304
#>  6 1985-10-01 00:00:00     -173.    -73.5   304
#>  7 1985-10-01 00:00:00     -172.    -73.5   304
#>  8 1985-10-01 00:00:00     -171.    -73.5   304
#>  9 1985-10-01 00:00:00     -164.    -73.5   287
#> 10 1985-10-01 00:00:00     -163.    -73.5   287
#> # … with 18,953 more rows
library(rnaturalearth)
library(sf)

world <- ne_countries(scale = "medium", returnclass = "sf")
ortho <- "+proj=ortho +lat_0=-78 +lon_0=166 +x_0=0 +y_0=0 +a=6371000 +b=6371000 +units=m +no_defs"

ggplot(data = nimbus) +
  geom_point(mapping = aes(x = longitude, y = latitude, color = ozone)) +
  geom_sf(data = world, fill = NA, color = "black") +
  scale_color_viridis_c(option = "viridis") +
  coord_sf(crs = ortho)
.xls and .xlsx)
jsonlite -> json
xml2 -> xml
httr -> web APIs
DBI -> databases