class: title-slide, center <span class="fa-stack fa-4x"> <i class="fa fa-circle fa-stack-2x" style="color: #ffffff;"></i> <strong class="fa-stack-1x" style="color:#009FB7;">2</strong> </span> # Data Visualization ## Tidy Data Science with the Tidyverse and Tidymodels ### W. Jake Thompson #### [https://tidyds-2021.wjakethompson.com](https://tidyds-2021.wjakethompson.com) · [https://bit.ly/tidyds-2021](https://bit.ly/tidyds-2021) .footer-license[*Tidy Data Science with the Tidyverse and Tidymodels* is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).] <div style = "position:fixed; visibility: hidden"> `$$\require{color}\definecolor{yellow}{rgb}{0.996078431372549, 0.843137254901961, 0.4}$$` `$$\require{color}\definecolor{blue}{rgb}{0, 0.623529411764706, 0.717647058823529}$$` </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { yellow: ["{\\color{yellow}{#1}}", 1], blue: ["{\\color{blue}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .yellow {color: #FED766;} .blue {color: #009FB7;} </style> --- class: center middle .pull-left[ <img style="border-radius: 50%;" src="https://github.com/LucyMcGowan.png" width="300px"/> [Lucy D'Agostino McGowan](https://www.lucymcgowan.com/) Assistant Professor in Statistics Wake Forest University <i class="fab fa-twitter"></i> [@LucyStats](https://twitter.com/LucyStats) <i class="fab fa-github"></i> [@LucyMcGowan](https://github.com/LucyMcGowan) ] .pull-right[ <a href="https://leanpub.com/ggplot2in2"> <img src="images/books/ggplot2-in-2.png" width="375px"> </a> ] --- <div class="hex-book"> <a href="https://ggplot2.tidyverse.org"> <img class="hex" src="images/hex/ggplot2.png"> </a> <a href="https://r4ds.had.co.nz/data-visualisation.html"> <img class="book" src="images/books/r4ds-data-viz.png"> </a> </div> --- background-image: url(images/visualize/applied-ds-viz.png) 
background-position: center 60% background-size: 85% # .nobold[(Applied)] Data Science --- class: center .pull-left[ <img src="images/visualize/bbc-midterm.png" width="500px"> [BBC News](https://www.bbc.com/news/world-us-canada-46076389) ] .pull-right[ <img src="images/visualize/538-cancer.png" width="460px" style="border:1px solid #272727"> [FiveThirtyEight](https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/) ] --- class: center .pull-left[ <img src="images/visualize/healy-network.png" width="500px" style="border:1px solid #272727;margin-bottom:3.1rem"> [Kieran Healy](https://kieranhealy.org/blog/archives/2013/06/18/a-co-citation-network-for-philosophy/) ] .pull-right[ <img src="images/visualize/mackintosh-weather.png" width="500px" style="border:1px solid #272727;margin-bottom:7.87rem"> [John MacKintosh](https://www.r-graph-gallery.com/283-the-hourly-heatmap) ] --- # Grammar of Graphics .pull-left.center[ <img src="images/books/grammar-graphics.png" width="275px"> ] .pull-right.center[ <img class="img-border" src="images/visualize/layered-grammar.png" width="450px"> https://doi.org/10.1198/jcgs.2009.07098 ] --- # ggplot2 .center[ <img src="images/books/ggplot2-2nd.png" width="275px"> [3rd Edition WIP](https://ggplot2-book.org/) ] --- class: center middle .big[**data** maps to] -- .big[**aes**thetics in] -- .big[layers] --- class: center middle background-image: url(images/visualize/ggbuild-1.png) background-position: center background-size: 85% --- class: center middle background-image: url(images/visualize/ggbuild-2.png) background-position: center background-size: 85% --- class: center middle background-image: url(images/visualize/ggbuild-3.png) background-position: center background-size: 85% --- class: center middle background-image: url(images/visualize/ggbuild-4.png) background-position: center background-size: 85% --- class: center middle background-image: url(images/visualize/build-comp/build5.png) background-position: 
center
background-size: 40%

---

class: center middle
background-image: url(images/visualize/build-comp/build6.png)
background-position: center
background-size: 40%

---

class: center middle
background-image: url(images/visualize/build-comp/build7.png)
background-position: center
background-size: 40%

---

# Example: Bechdel Test

.big[
1. At least 2 named women in the movie,
2. Women have at least one conversation with each other, and
3. That conversation isn't about a man
]

---

# Bechdel Test Outcomes

* Binary outcome (Pass/Fail)
  * Does the movie meet all 3 criteria?

<br>
<br>

* 5 outcomes based on severity
  * <span style="color:#EA594E">**nowomen**</span> - Fails first criterion
  * <span style="color:#EA594E">**notalk**</span> - Fails second criterion
  * <span style="color:#EA594E">**men**</span> - Fails third criterion
  * <span style="color:#589ACF">**dubious**</span> - Experts divided on whether third criterion is met
  * <span style="color:#589ACF">**ok**</span> - Meets all criteria

---

# bechdel

.smallish[

```r
library(fivethirtyeight)
bechdel
## # A tibble: 1,794 x 15
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 2013 tt171… 21 & Ov… notalk notalk FAIL 1.3 e7 25682380 4.22e7 2013…
## 2 2012 tt134… Dredd 3D ok-di… ok PASS 4.5 e7 13414714 4.09e7 2012…
## 3 2013 tt202… 12 Year… notal… notalk FAIL 2 e7 53107035 1.59e8 2013…
## 4 2013 tt127… 2 Guns notalk notalk FAIL 6.1 e7 75612460 1.32e8 2013…
## 5 2013 tt045… 42 men men FAIL 4 e7 95020213 9.50e7 2013…
## 6 2013 tt133… 47 Ronin men men FAIL 2.25e8 38362475 1.46e8 2013…
## 7 2013 tt160… A Good … notalk notalk FAIL 9.2 e7 67349198 3.04e8 2013…
## 8 2013 tt219… About T… ok-di… ok PASS 1.2 e7 15323921 8.73e7 2013…
## 9 2013 tt181… Admissi… ok ok PASS 1.3 e7 18007317 1.80e7 2013…
## 10 2013 tt181… After E… notalk notalk FAIL 1.3 e8 60522097 2.44e8 2013…
## # … with 1,784 more rows, and 5 more variables: budget_2013 <int>,
## #
domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>, ## # decade_code <int> ``` ] ??? Variables for us: * clean_test * budget * domgross * decade_code (1990s = 3, 2000s = 2, 2010s = 1) --- class: your-turn # Your turn 1 .big[ * Open the R Notebook **materials/exercises/02-visualize.Rmd** * Let's look at the `bechdel` data set * Run this code to make a graph ] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross)) ```
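---

# Setup for the examples

A minimal sketch of the setup that the code on these slides appears to rely on. It assumes **ggplot2** is attached directly; the exercise notebook may load it another way (for example, through the tidyverse). The `viz_vars` name is introduced here for illustration and collects the columns used throughout this section.

```r
library(ggplot2)          # assumption: attached directly rather than via tidyverse
library(fivethirtyeight)  # provides the bechdel data set

# Columns used in this section (viz_vars is a hypothetical helper name)
viz_vars <- c("clean_test", "budget", "domgross", "decade_code")
bechdel[, viz_vars]

# clean_test is an ordered factor; its levels set the default
# order of categories on axes and in legends
levels(bechdel$clean_test)
```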
--- class: your-turn .panelset[ .panel[.panel-name[Code] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross)) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/bechdel-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, y = domgross))</code> --- class: middle <code class ='r hljs remark-code'>ggplot(data = <span style="background-color:#FED766;color:#009FB7">bechdel</span>) +<br> geom_point(mapping = aes(x = budget, y = domgross))</code> ??? Define the data --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) <span style="background-color:#FED766;color:#009FB7">+</span><br> geom_point(mapping = aes(x = budget, y = domgross))</code> ??? + before new line; *add* layers together --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> <span style="background-color:#FED766;color:#009FB7">geom_point</span>(mapping = aes(x = budget, y = domgross))</code> ??? the type of layer --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = <span style="background-color:#FED766;color:#009FB7">aes</span>(x = budget, y = domgross))</code> ??? define aesthetics for how data *maps* to layer --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(<span style="background-color:#FED766;color:#009FB7">x = budget</span>, y = domgross))</code> ??? x variable --- class: middle <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, <span style="background-color:#FED766;color:#009FB7">y = domgross</span>))</code> ??? 
y variable

---

name: aes
class: center middle

# .blue[**aes**.nobold[thetics]]

---

class: center middle

<img src="images/visualize/plots/aes-example-1.png" width="95%" style="display: block; margin: auto;" />

---

class: center

<img src="images/visualize/plots/bechdel-aes-1.png" width="70%" style="display: block; margin: auto;" />

<br>
<br>

.left[
<code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, y = domgross, <span style="background-color:#FED766;color:#009FB7">color = clean_test</span>))</code>
]

---

.panelset[
.panel[.panel-name[Code]

```r
ggplot(data = bechdel) +
  geom_point(mapping = aes(x = budget, y = domgross, color = clean_test))
```
]

.panel[.panel-name[Plot]

<img src="images/visualize/plots/bechdel-color-plot-1.png" width="80%" style="display: block; margin: auto;" />
]
]

???

Legend is added automatically

---

class: your-turn

# Your turn 2

.big[
* Experiment with adding color, size, alpha, and shape aesthetics to your graph
* How do aesthetics behave differently when mapped to discrete and continuous variables?
* What happens when you use more than one aesthetic?
]
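---

# Discrete vs. continuous mappings

One possible starting point for the experiment above, offered as a sketch rather than the intended answer. Using `year` as the continuous variable is an assumption made for illustration; any numeric column would serve. The `p_discrete` and `p_continuous` names are likewise hypothetical.

```r
library(ggplot2)
library(fivethirtyeight)

# Discrete variable: each level of clean_test gets its own color and legend key
p_discrete <- ggplot(data = bechdel) +
  geom_point(mapping = aes(x = budget, y = domgross, color = clean_test))

# Continuous variable: year is mapped onto a color gradient with a color bar
p_continuous <- ggplot(data = bechdel) +
  geom_point(mapping = aes(x = budget, y = domgross, color = year))

p_discrete
p_continuous
```

Comparing the two legends side by side is a quick way to see how the same aesthetic adapts to the type of variable it is mapped to.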
--- <img src="images/visualize/plots/aes-overview-1.png" width="90%" style="display: block; margin: auto;" /> --- # Set vs. map <img src="images/visualize/plots/set-map-1.png" width="75%" style="display: block; margin: auto;" /> ??? How would you do this? --- .pull-left[ ![](images/visualize/plots/bechdel-color-plot-1.png) ] .pull-right.center[ **Inside of aes()**: Maps an aesthetic to a variable ] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, y = domgross, <span style="background-color:#FED766;color:#009FB7">color = clean_test</span>))</code> --- .pull-left[ ![](images/visualize/plots/set-map-1.png) ] .pull-right.center[ **Outside of aes()**: Sets an aesthetic to a value ] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, y = domgross, <span style="background-color:#FED766;color:#009FB7">color = clean_test</span>))</code> <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = <span style="background-color:#FED766;color:#009FB7">aes(x = budget, y = domgross)</span>, color = "blue")</code> --- .pull-left[ ![](images/visualize/plots/bechdel-color-plot-1.png) ] .pull-right.center[ ![](images/visualize/plots/set-map-1.png) ] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = aes(x = budget, y = domgross, <span style="background-color:#FED766;color:#009FB7">color = clean_test</span>))</code> <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(mapping = <span style="background-color:#FED766;color:#009FB7">aes(x = budget, y = domgross)</span>, color = "blue")</code> --- class: pop-quiz # Pop quiz! What will this code do? 
```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross, color = "blue")) ``` --- class: pop-quiz .panelset[ .panel[.panel-name[Code] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross, color = "blue")) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/pop-quiz-solution-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- name: geom class: center middle # .blue[**geom**.nobold[etries]] --- class: center middle .pull-left[ <img src="images/visualize/plots/point-geom-1.png" width="504" style="display: block; margin: auto;" /> ] .pull-right[ <img src="images/visualize/plots/smooth-geom-1.png" width="504" style="display: block; margin: auto;" /> ] ??? How are these plots similar? * x var, y var, data How different? * Geometric object (i.e., visual object used to represent the data) --- class: middle <code class ='r hljs remark-code'>ggplot(data = <span style="color:#009FB7;background-color:#FED766">{DATA}</span>) +<br> <span style="color:#009FB7;background-color:#FED766">{GEOM_FUNCTION}</span>(mapping = aes(<span style="color:#009FB7;background-color:#FED766">{MAPPINGS}</span>))</code> --- background-image: url(images/visualize/cheatsheet-geom.png) background-position: center middle background-size: 85% --- class: your-turn # Your turn 3 .big[ * Replace this scatter plot with one that draws a box plot ] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = clean_test, y = budget)) ``` <img src="images/visualize/plots/yt-scatter-1.png" width="40%" style="display: block; margin: auto;" />
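---

# Filling in the template

A short sketch of the `{DATA}` / `{GEOM_FUNCTION}` / `{MAPPINGS}` template with its placeholders filled in. The data and mappings are held fixed and only the geom function changes, which mirrors the point and smooth plots compared earlier. The `p_point` and `p_smooth` names are introduced here for illustration.

```r
library(ggplot2)
library(fivethirtyeight)

# {DATA} = bechdel, {GEOM_FUNCTION} = geom_point, {MAPPINGS} = x and y
p_point <- ggplot(data = bechdel) +
  geom_point(mapping = aes(x = budget, y = domgross))

# Same data and mappings; only the geom function is swapped
p_smooth <- ggplot(data = bechdel) +
  geom_smooth(mapping = aes(x = budget, y = domgross))

p_point
p_smooth
```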
--- class: your-turn .panelset[ .panel[.panel-name[Code] ```r ggplot(data = bechdel) + geom_boxplot(mapping = aes(x = clean_test, y = budget)) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/yt-box-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- class: your-turn # Your turn 4 .big[ * Make the histogram of **budget** shown below ] <img src="images/visualize/plots/yt-hist-1.png" width="60%" style="display: block; margin: auto;" />
--- class: your-turn .panelset[ .panel[.panel-name[Code] ```r ggplot(data = bechdel) + geom_histogram(mapping = aes(x = budget)) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/yt-hist-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- class: your-turn # Your turn 5 .big[ * Make the density plot of **budget** colored by **clean_test** shown below ] <img src="images/visualize/plots/yt-col-den-1.png" width="60%" style="display: block; margin: auto;" />
--- class: your-turn .panelset[ .panel[.panel-name[Density Code] ```r ggplot(data = bechdel) + geom_density(mapping = aes(x = budget)) ``` ] .panel[.panel-name[Density Plot] <img src="images/visualize/plots/dcode-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Coloring Code] ```r ggplot(data = bechdel) + geom_density(mapping = aes(x = budget, color = clean_test)) ``` ] .panel[.panel-name[Final Plot] <img src="images/visualize/plots/col-den-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- class: your-turn # Your turn 6 .big[ * Make the bar chart of **clean_test** colored by **clean_test** shown below ] <img src="images/visualize/plots/yt-col-bar-1.png" width="60%" style="display: block; margin: auto;" />
--- class: your-turn .panelset[ .panel[.panel-name[Color Code] ```r ggplot(data = bechdel) + geom_bar(mapping = aes(x = clean_test, color = clean_test)) ``` ] .panel[.panel-name[Color Plot] <img src="images/visualize/plots/col-bar-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Fill Code] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_bar(mapping = aes(x = clean_test, <span style="background-color:#FED766;color:#009FB7">fill</span> = clean_test))</code> ] .panel[.panel-name[Final Plot] <img src="images/visualize/plots/fill-bar-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- class: your-turn # Your turn 7 .big[ * Predict what this code will do. * Then run it. ] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross)) + geom_smooth(mapping = aes(x = budget, y = domgross)) ```
--- class: your-turn .panelset[ .panel[.panel-name[Code] ```r ggplot(data = bechdel) + geom_point(mapping = aes(x = budget, y = domgross)) + geom_smooth(mapping = aes(x = budget, y = domgross)) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/yt-mult-layer-sol-1.png" width="80%" style="display: block; margin: auto;" /> ] ] ??? Each new geom just adds a new layer. --- name: global class: center middle # .blue[global vs. local] --- <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_point(<span style="background-color:#FED766;color:#009FB7">mapping = aes(x = budget, y = domgross)</span>) +<br> geom_smooth(<span style="background-color:#FED766;color:#009FB7">mapping = aes(x = budget, y = domgross)</span>)</code> <img src="images/visualize/plots/yt-mult-layer-sol-1.png" width="70%" style="display: block; margin: auto;" /> ??? This is really repetitive. What if you want to change one of the variables? Then you have to change every layer. Gross. --- <code class ='r hljs remark-code'>ggplot(data = bechdel, <span style="background-color:#FED766;color:#009FB7">mapping = aes(x = budget, y = domgross)</span>) +<br> geom_point() +<br> geom_smooth()</code> <img src="images/visualize/plots/global-opts-1.png" width="70%" style="display: block; margin: auto;" /> ??? Mappings and data that are defined in `ggplot()` will be applied globally, to every layer. --- <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(<span style="background-color:#FED766;color:#009FB7">mapping = aes(color = clean_test)</span>) +<br> geom_smooth()</code> <img src="images/visualize/plots/local-opts-1.png" width="70%" style="display: block; margin: auto;" /> ??? 
Mappings (and data) that appear in a geom_*() function will add to or override the global mappings for that layer only --- <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(mapping = aes(color = clean_test)) +<br> geom_smooth(<span style="background-color:#FED766;color:#009FB7">data = filter(bechdel, clean_test == "ok")</span>)</code> <img src="images/visualize/plots/local-data-1.png" width="70%" style="display: block; margin: auto;" /> ??? Data can also be set globally and/or locally --- name: stats class: center middle # .blue[stats] --- class: pop-quiz # Pop quiz! Where do the values for creating the box and whiskers come from? <img src="images/visualize/plots/yt-box-1.png" width="60%" style="display: block; margin: auto;" /> --- class: pop-quiz What does `method = "gam"` mean? ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point() + geom_smooth() ## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")' ``` <img src="images/visualize/plots/show-gam-msg-1.png" width="60%" style="display: block; margin: auto;" /> --- # Defining Stats .panelset[ .panel[.panel-name[`stat_summary`] ```r ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) + geom_boxplot() + stat_summary(geom = "point", fun = "mean", color = "blue", size = 3) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/stat-summary-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[`stat = "summary"`] ```r ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) + geom_boxplot() + geom_point(stat = "summary", fun = "mean", color = "blue", size = 3) ``` ] .panel[.panel-name[Plot] <img src="images/visualize/plots/geom-func-1.png" width="60%" style="display: block; margin: auto;" /> ] ] --- # Stats as Layers <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) +<br> geom_boxplot() +<br> <span 
style="background-color:#FED766;color:#009FB7">stat_summary</span>(<span style="background-color:#FED766;color:#009FB7">geom = "point"</span>, fun = "mean", color = "blue", size = 3)</code> <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) +<br> geom_boxplot() +<br> <span style="background-color:#FED766;color:#009FB7">geom_point</span>(<span style="background-color:#FED766;color:#009FB7">stat = "summary"</span>, fun = "mean", color = "blue", size = 3)</code> --- # Distributions ```r ggplot() + xlim(c(-5, 5)) + geom_function(aes(color = "normal"), fun = dnorm) + geom_function(aes(color = "t, df = 1"), fun = dt, args = list(df = 1)) ``` <img src="images/visualize/plots/dist-example-1.png" width="55%" style="display: block; margin: auto;" /> --- name: positions class: center middle # .blue[positions] --- .panelset[ .panel[.panel-name[Bad] <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) +<br> geom_point()</code> <img src="images/visualize/plots/bad-pos-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Better] <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) +<br> geom_point(<span style="background-color:#FED766;color:#009FB7">position = "jitter"</span>)</code> <img src="images/visualize/plots/better-pos-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Best] <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = clean_test, y = budget)) +<br> geom_point(position = <span style="background-color:#FED766;color:#009FB7">position_jitter(width = 0.2, height = 0)</span>)</code> <img src="images/visualize/plots/best-pos-1.png" width="60%" style="display: block; margin: auto;" /> ] ] --- class: center middle `position_identity()` -- `position_jitter()` -- `position_dodge()` -- `position_fill()` -- more... 
--- <img src="images/visualize/plots/need-pos-1.png" width="90%" style="display: block; margin: auto;" /> --- class: your-turn # Your turn 8 .big[ * Add a position adjustment to this plot to compare the frequency of test results across decades. ] <code class ='r hljs remark-code'>ggplot(bechdel, mapping = aes(x = decade_code)) +<br> geom_bar(mapping = aes(fill = clean_test))</code> <img src="images/visualize/plots/need-pos-1.png" width="40%" style="display: block; margin: auto;" />
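---

# Labeling the decades

A sketch that may make the plot above easier to read. `decade_code` is stored as an integer (1990s = 3, 2000s = 2, 2010s = 1), so the x axis shows codes rather than decades. The `bechdel_decades` data frame and its `decade` column are introduced here for illustration only; they are not part of the exercise.

```r
library(ggplot2)
library(fivethirtyeight)

# Recode the integer codes into a labeled factor, ordered chronologically
bechdel_decades <- bechdel
bechdel_decades$decade <- factor(
  bechdel_decades$decade_code,
  levels = c(3, 2, 1),
  labels = c("1990s", "2000s", "2010s")
)

# Same layer as the exercise, still using the default (stacked) position
ggplot(bechdel_decades, mapping = aes(x = decade)) +
  geom_bar(mapping = aes(fill = clean_test))
```

Any row with a missing `decade_code` stays missing in `decade`.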
--- class: your-turn .panelset[ .panel[.panel-name[Dodge] ```r ggplot(bechdel, mapping = aes(x = decade_code)) + geom_bar(mapping = aes(fill = clean_test), position = "dodge") ``` ] .panel[.panel-name[Dodge Plot] <img src="images/visualize/plots/bar-pos-dodge-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Fill] ```r ggplot(bechdel, mapping = aes(x = decade_code)) + geom_bar(mapping = aes(fill = clean_test), position = "fill") ``` ] .panel[.panel-name[Fill Plot] <img src="images/visualize/plots/bar-pos-fill-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- name: scales class: center middle # .blue[scales] --- <img src="images/visualize/plots/scale-brewer-1.png" width="90%" style="display: block; margin: auto;" /> --- .panelset[ .panel[.panel-name[Brewer] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_bar(mapping = aes(x = clean_test, fill = clean_test)) +<br> <span style="background-color:#ffff7f">scale_fill_brewer()</span></code> <img src="images/visualize/plots/scale-brewer-def-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Qualitative] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_bar(mapping = aes(x = clean_test, fill = clean_test)) +<br> scale_fill_brewer(<span style="background-color:#ffff7f">type = "qual"</span>)</code> <img src="images/visualize/plots/scale-brewer-qual-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Palettes] <code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_bar(mapping = aes(x = clean_test, fill = clean_test)) +<br> scale_fill_brewer(type = "qual", <span style="background-color:#ffff7f">palette = "Set1"</span>)</code> .left-column.center[ View available palettes at: [colorbrewer2.org](https://colorbrewer2.org) ] .right-column[ <img src="images/visualize/plots/scale-brewer-pal-1.png" width="80%" style="display: block; margin: auto;" /> ] ] 
.panel[.panel-name[Ordinal]

<code class ='r hljs remark-code'>ggplot(data = bechdel) +<br> geom_bar(mapping = aes(x = clean_test, fill = clean_test)) +<br> <span style="background-color:#ffff7f">scale_fill_ordinal()</span></code>

<img src="images/visualize/plots/scale-ordinal-1.png" width="60%" style="display: block; margin: auto;" />
]
]

---

class: center middle

`scale_{aesthetic}_continuous()`

--

`scale_{aesthetic}_discrete()`

--

`scale_{aesthetic}_ordinal()`

--

`scale_{aesthetic}_manual()`

--

`scale_{color/fill}_brewer()`

--

`scale_{color/fill}_distiller()`

--

`scale_{color/fill}_gradient()`

---

```r
ggplot(bechdel, aes(x = clean_test, y = budget)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))
```

<img src="images/visualize/plots/coord-def-1.png" width="70%" style="display: block; margin: auto;" />

???

Also coordinate scales.

---

```r
ggplot(bechdel, aes(x = clean_test, y = budget)) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  scale_x_discrete(limits = c("dubious", "ok", "men", "notalk", "nowomen"))
```

<img src="images/visualize/plots/coord-x-1.png" width="70%" style="display: block; margin: auto;" />

---

```r
ggplot(bechdel, mapping = aes(x = budget, y = domgross)) +
  geom_point()
```

<img src="images/visualize/plots/cont-def-1.png" width="70%" style="display: block; margin: auto;" />

---

```r
ggplot(bechdel, mapping = aes(x = budget, y = domgross)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 5e+08), breaks = seq(0, 5e+08, 5e+07),
                     labels = scales::scientific)
```

<img src="images/visualize/plots/cont-x-1.png" width="70%" style="display: block; margin: auto;" />

---

name: facets
class: center middle

# .blue[facets]

---

<img src="images/visualize/plots/need-facet-1.png" width="80%" style="display: block; margin: auto;" />

---

<code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(mapping = aes(color = clean_test)) +<br> <span
style="background-color:#ffff7f">facet_wrap(vars(clean_test))</span></code> <img src="images/visualize/plots/facet-wrap-1.png" width="70%" style="display: block; margin: auto;" /> --- <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(mapping = aes(color = clean_test)) +<br> <span style="background-color:#ffff7f">facet_grid</span>(rows = vars(decade_code), cols = vars(clean_test))</code> <img src="images/visualize/plots/facet-grid-1.png" width="70%" style="display: block; margin: auto;" /> ??? Facet with multiple variables using `facet_grid()` --- name: themes class: center middle # .blue[themes] --- .panelset[ .panel[.panel-name[Default] ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) ``` <img src="images/visualize/plots/theme-grey-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Black & White] ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) + theme_bw() ``` <img src="images/visualize/plots/theme-bw-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Minimal] ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) + theme_minimal() ``` <img src="images/visualize/plots/theme-min-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Void] ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) + theme_void() ``` <img src="images/visualize/plots/theme-void-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[More...] 
```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) + theme(legend.position = "bottom") ``` <img src="images/visualize/plots/theme-custom-1.png" width="60%" style="display: block; margin: auto;" /> ] ] --- .panelset[ .panel[.panel-name[Stata Theme] ```r library(ggthemes) ``` <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(mapping = aes(color = clean_test)) +<br> <span style="background-color:#ffff7f">scale_color_stata</span>() +<br> <span style="background-color:#ffff7f">theme_stata</span>()</code> ] .panel[.panel-name[Plot] <img src="images/visualize/plots/stata-thm-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- name: extend class: center middle # .blue[extensions] --- # ggforce .pull-left.center[ Additional: geoms stats facets .big[ [ggforce.data-imaginist.com](https://ggforce.data-imaginist.com/) ] ] .pull-right.center[ <img src="images/hex/ggforce.png" width="70%" style="display: block; margin: auto;" /> ] --- # ggrepel .pull-left.center[ Add non-overlapping labels to plots .big[ [ggrepel.slowkow.com](https://ggrepel.slowkow.com/) ] ] .pull-right.center[ <img src="images/hex/ggrepel.png" width="70%" style="display: block; margin: auto;" /> ] --- # ggdist .pull-left.center[ Plot distributional summaries from, e.g., MCMC chains .big[ [mjskay.github.io/ggdist](https://mjskay.github.io/ggdist/) ] ] .pull-right.center[ <img src="images/hex/ggdist.png" width="70%" style="display: block; margin: auto;" /> ] --- # gganimate .pull-left.center[ Add motion and animation to ggplot2 .big[ [gganimate.com](https://gganimate.com/) ] ] .pull-right.center[ <img src="images/hex/gganimate.png" width="70%" style="display: block; margin: auto;" /> ] --- # ggraph .pull-left.center[ Plot relational data structures: networks graphs trees .big[ [ggraph.data-imaginist.com](https://ggraph.data-imaginist.com/) ] ] .pull-right.center[ <img 
src="images/hex/ggraph.png" width="70%" style="display: block; margin: auto;" /> ] --- class: center middle # .blue[finishing touches] --- # Saving plots .panelset[ .panel[.panel-name[Save last] ```r ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) + geom_point(mapping = aes(color = clean_test)) ggsave("my-plot.png", width = 8, height = 6, units = "in") ``` * Saves last generated plot * To current working directory ] .panel[.panel-name[Save specific] <code class ='r hljs remark-code'>ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +<br> geom_point(mapping = aes(color = clean_test)) <span style="background-color:#ffff7f">-> scatter_plot</span><br><br>ggplot(data = bechdel, mapping = aes(x = decade_code)) +<br> geom_bar(mapping = aes(fill = clean_test), position = "fill") -> bar_plot<br><br>ggsave("my-plot.png", <span style="background-color:#ffff7f">plot = scatter_plot</span>, <span style="background-color:#ffff7f">path = "~/Desktop/"</span>,<br> width = 8, height = 6, units = "in")</code> ] ] --- class: middle <code class ='r hljs remark-code'>ggplot(data = <span style="background-color:#FED766;color:#009FB7">{DATA}</span>, mapping = aes(<span style="background-color:#FED766;color:#009FB7">{GLOBAL MAPPINGS}</span>)) +<br> <span style="background-color:#FED766;color:#009FB7">{GEOM_FUNCTION}</span>(mapping = aes(<span style="background-color:#FED766;color:#009FB7">{LOCAL MAPPINGS}</span>),<br> stat = <span style="background-color:#009FB7;color:#FED766">{STAT}</span>, position = <span style="background-color:#009FB7;color:#FED766">{POSITION}</span>) +<br> <span style="background-color:#009FB7;color:#FED766">{FACET_FUNCTION}</span> +<br> <span style="background-color:#009FB7;color:#FED766">{SCALE_FUNCTION}</span> +<br> <span style="background-color:#009FB7;color:#FED766">{THEME_FUNCTION}</span><br><br>ggsave(...)</code> --- class: title-slide, center # Data Visualization <img src="images/hex/ggplot2.png" width="20%" style="display: block; 
margin: auto;" /> ## Tidy Data Science with the Tidyverse and Tidymodels ### W. Jake Thompson #### [https://tidyds-2021.wjakethompson.com](https://tidyds-2021.wjakethompson.com) · [https://bit.ly/tidyds-2021](https://bit.ly/tidyds-2021) .footer-license[*Tidy Data Science with the Tidyverse and Tidymodels* is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).]