name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } /* custom.css */ .plot-callout { width: 300px; bottom: 5%; right: 5%; position: absolute; padding: 0px; z-index: 100; } .plot-callout img { width: 100%; border: 1px solid #23373B; } </style>
---
class: middle, left, inverse

# Data Analysis: Introduction to visualization

### 2023-01-15

#### [Master I MIDS & MFA]()

#### [Analyse Exploratoire de Données](http://stephane-v-boucheron.fr/courses/eda/)

#### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
template: inter-slide

##

### [Grammar of Graphics illustrated](#gg)

### [Do It Yourself](#diy)

### [Books](#books)

---
template: inter-slide

## Grammar of Graphics

---

We will use the _Grammar of Graphics_ approach to visualization

The expression _Grammar of Graphics_ was coined by Leland Wilkinson to describe a principled approach to visualization in Exploratory Data Analysis (EDA)

A plot is organized around data (a table with rows (observations) and columns (variables))

A *plot* is a *graphical object* that can be built layer by layer

Building a graphical object consists of chaining elementary operations

The acclaimed TED presentation by [Hans Rosling](https://en.wikipedia.org/wiki/Hans_Rosling) illustrates the Grammar of Graphics approach

---

### Grammar of Graphics in Action
```r knitr::include_url("https://www.youtube.com/embed/jbkSRLYSojo") ``` <iframe src="https://www.youtube.com/embed/jbkSRLYSojo" width="504" height="400px" data-external="1"></iframe> ??? ### [
Gapminder](https://www.gapminder.org)

[Gapminder](https://www.gapminder.org) is a Swedish foundation.

> Gapminder fights devastating misconceptions about global development. Gapminder produces free teaching resources making the world understandable based on reliable statistics. Gapminder promotes a fact-based worldview everyone can understand.

Hans Rosling was a founder of Gapminder

---

The TED presentation is an example of attractive data visualization

It raises a number of questions

--

- Where do the data come from?
- Are the data reliable?
- How can we *gather* data?
- How can we *tidy* data?
- How can we analyze/explore data?
- How can we present data (to a diverse audience)?

---
template: inter-slide
name: diy

## Do It Yourself with R

---

We will reproduce the animated demonstration using

- `ggplot2`: an implementation of the *grammar of graphics* in R
- `plotly`: a bridge between R and the JavaScript library `D3.js`

--

- Using `plotly` with `html` output brings the possibility of interactivity and animation

---

### Install and load packages

.fl.w-50.pa2.f6[

In R, the `gapminder` data can be loaded from *package* `gapminder`

- Packages can be *installed* from `CRAN` using `install.packages()`
- Once a package is *installed* on the hard drive, it has to be *loaded* in the current session using `require()` or `library()`
- Function `p_load` from package `pacman` installs the package if needed and loads it

]

.fl.w-50.pa2[

```r
pacman::p_load("tidyverse")
pacman::p_load("gapminder")
pacman::p_load("glue")
pacman::p_load("DT")
```

- Loading `tidyverse` loads a collection of packages dedicated to table manipulations and graphics including `ggplot2`.
- Loading [`tidyverse`]() also loads parts of `magrittr`, notably the _pipe_ `%>%`
- R `4.xx` offers a new pipe `|>`

]

???

Insist on the difference between _installing_ and _loading_ a package

- How do we get the list of installed packages?
- How do we get the list of loaded packages?
- Which objects are made available by a package?

`pacman::p_load()` kills two birds with one stone

---

### Have a look at `gapminder` dataset

.fl.w-30.pa2[

- `gapminder` is a table, like a spreadsheet table, or a relational database table
- In R parlance, `gapminder` is a `data.frame`
- It is also a `tibble` in modern `tidyverse` parlance
-
`data.frame`-like structures show up in every corner of Data Science ] .fl.w-70.pa2[ ```r gapminder %>% glimpse(width=50) ``` ``` ## Rows: 1,704 ## Columns: 6 ## $ country <fct> "Afghanistan", "Afghanistan", … ## $ continent <fct> Asia, Asia, Asia, Asia, Asia, … ## $ year <int> 1952, 1957, 1962, 1967, 1972, … ## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020… ## $ pop <int> 8425333, 9240934, 10267083, 11… ## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, … ``` ] --- ### What's in a data frame/tibble? .fl.w-30.pa2[ - A table has a _schema_: a list of named _columns_, each with a given type - A table has a _content_: _rows_. Each row is a collection of items, corresponding to the columns - `glimpse()` allows to see the schema and the first rows - `head()` allows to see the first rows ] .fl.w-70.pa2.f6[ ```r gapminder %>% head() %>% knitr::kable(digits = 3) ``` <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> lifeExp </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> gdpPercap </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 28.801 </td> <td style="text-align:right;"> 8425333 </td> <td style="text-align:right;"> 779.445 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1957 </td> <td style="text-align:right;"> 30.332 </td> <td style="text-align:right;"> 9240934 </td> <td style="text-align:right;"> 820.853 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1962 </td> <td style="text-align:right;"> 31.997 </td> <td style="text-align:right;"> 10267083 </td> <td style="text-align:right;"> 853.101 
</td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1967 </td> <td style="text-align:right;"> 34.020 </td> <td style="text-align:right;"> 11537966 </td> <td style="text-align:right;"> 836.197 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1972 </td> <td style="text-align:right;"> 36.088 </td> <td style="text-align:right;"> 13079460 </td> <td style="text-align:right;"> 739.981 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 38.438 </td> <td style="text-align:right;"> 14880372 </td> <td style="text-align:right;"> 786.113 </td> </tr> </tbody> </table> ] --- ### Get a feeling of the dataset .fl.w-30.pa2.f6[ We pick two random rows for each continent ```r tmp <- gapminder %>% * group_by(continent) %>% * slice_sample(n=2) %>% knitr::kable(digits = 3) ``` ] .fl.w-70.pa2.f6[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> lifeExp </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> gdpPercap </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Congo, Rep. 
</td> <td style="text-align:left;"> Africa </td> <td style="text-align:right;"> 1982 </td> <td style="text-align:right;"> 56.695 </td> <td style="text-align:right;"> 1774735 </td> <td style="text-align:right;"> 4879.508 </td> </tr> <tr> <td style="text-align:left;"> Morocco </td> <td style="text-align:left;"> Africa </td> <td style="text-align:right;"> 2007 </td> <td style="text-align:right;"> 71.164 </td> <td style="text-align:right;"> 33757175 </td> <td style="text-align:right;"> 3820.175 </td> </tr> <tr> <td style="text-align:left;"> Mexico </td> <td style="text-align:left;"> Americas </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 50.789 </td> <td style="text-align:right;"> 30144317 </td> <td style="text-align:right;"> 3478.126 </td> </tr> <tr> <td style="text-align:left;"> Guatemala </td> <td style="text-align:left;"> Americas </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 56.029 </td> <td style="text-align:right;"> 5703430 </td> <td style="text-align:right;"> 4879.993 </td> </tr> <tr> <td style="text-align:left;"> China </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 63.967 </td> <td style="text-align:right;"> 943455000 </td> <td style="text-align:right;"> 741.237 </td> </tr> <tr> <td style="text-align:left;"> Hong Kong, China </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1957 </td> <td style="text-align:right;"> 64.750 </td> <td style="text-align:right;"> 2736300 </td> <td style="text-align:right;"> 3629.076 </td> </tr> <tr> <td style="text-align:left;"> Poland </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1962 </td> <td style="text-align:right;"> 67.640 </td> <td style="text-align:right;"> 30329617 </td> <td style="text-align:right;"> 5338.752 </td> </tr> <tr> <td style="text-align:left;"> Iceland </td> <td style="text-align:left;"> Europe </td> <td 
style="text-align:right;"> 1982 </td> <td style="text-align:right;"> 76.990 </td> <td style="text-align:right;"> 233997 </td> <td style="text-align:right;"> 23269.607 </td> </tr> <tr> <td style="text-align:left;"> Australia </td> <td style="text-align:left;"> Oceania </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 69.120 </td> <td style="text-align:right;"> 8691212 </td> <td style="text-align:right;"> 10039.596 </td> </tr> <tr> <td style="text-align:left;"> Australia </td> <td style="text-align:left;"> Oceania </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 73.490 </td> <td style="text-align:right;"> 14074100 </td> <td style="text-align:right;"> 18334.198 </td> </tr> </tbody> </table>

]

---
exclude: true

### A tidy table

???

What makes a table tidy?

- No/Few redundancies (what does it mean?)
- Is the `gapminder` table redundant?
- ....
- Compare with Codd's principles from database theory

---
exclude: true

### Gapminder tibble (extract)

```r
tt <- gapminder %>%
  filter(year==2002) %>%
  mutate(pop = stringr::str_c(round(pop/1e6,1), 'M')) %>%
  DT::datatable(options=list(pageLength=8, dom = 't'),
*               rownames = FALSE) %>%
* DT::formatCurrency('gdpPercap', digits=0) %>%
* DT::formatRound('lifeExp', 1)
```

???

- Formatting instructions should not be wired into the table fed to `DT`
- Reorder columns
- Order countries by decreasing population size
- Right-align `pop` column
- Make provision for caption

---

###
`\(\sigma\)`: Picking one year of data .fl.w-30.pa2[ ```r gapminder_2002 <- gapminder %>% * filter(year==2002) ``` ] .fl.w-70.pa2.f6[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> lifeExp </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> gdpPercap </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 42.129 </td> <td style="text-align:right;"> 25268405 </td> <td style="text-align:right;"> 726.734 </td> </tr> <tr> <td style="text-align:left;"> Albania </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 75.651 </td> <td style="text-align:right;"> 3508512 </td> <td style="text-align:right;"> 4604.212 </td> </tr> <tr> <td style="text-align:left;"> Algeria </td> <td style="text-align:left;"> Africa </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 70.994 </td> <td style="text-align:right;"> 31287142 </td> <td style="text-align:right;"> 5288.040 </td> </tr> <tr> <td style="text-align:left;"> Angola </td> <td style="text-align:left;"> Africa </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 41.003 </td> <td style="text-align:right;"> 10866106 </td> <td style="text-align:right;"> 2773.287 </td> </tr> <tr> <td style="text-align:left;"> Argentina </td> <td style="text-align:left;"> Americas </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 74.340 </td> <td style="text-align:right;"> 38331121 </td> <td style="text-align:right;"> 8797.641 </td> </tr> <tr> <td style="text-align:left;"> Australia </td> <td style="text-align:left;"> Oceania </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 
80.370 </td> <td style="text-align:right;"> 19546792 </td> <td style="text-align:right;"> 30687.755 </td> </tr> <tr> <td style="text-align:left;"> Austria </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 78.980 </td> <td style="text-align:right;"> 8148312 </td> <td style="text-align:right;"> 32417.608 </td> </tr> <tr> <td style="text-align:left;"> Bahrain </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 74.795 </td> <td style="text-align:right;"> 656397 </td> <td style="text-align:right;"> 23403.559 </td> </tr> <tr> <td style="text-align:left;"> Bangladesh </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 62.013 </td> <td style="text-align:right;"> 135656790 </td> <td style="text-align:right;"> 1136.390 </td> </tr> <tr> <td style="text-align:left;"> Belgium </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 78.320 </td> <td style="text-align:right;"> 10311970 </td> <td style="text-align:right;"> 30485.884 </td> </tr> </tbody> </table> ] --- In the parlance of Relational Algebra, `filter` performs a _selection_ of rows [Package `dplyr` docs](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter) > find rows/cases where conditions are true --- exclude: true --- template: inter-slide ## Static plotting --- ### First attempt .fl.w-40.pa2[ - Define a plot with respect to `gapminder_2002` - Map variables `gdpPercap` and `lifeExp` to axes `x` and `y` - For each row, draw a point at coordinates defined by the mapping ```r p <- gapminder_2002 %>% * ggplot() + * aes(x=gdpPercap) + * aes(y=lifeExp) + * geom_point() p ``` ] -- .fl.w-60.pa2[ <img src="cm-1.2-EDA_files/figure-html/bad-out-1.png" width="504" /> ] --- We are building a graphical object (a `ggplot` object) around a 
data frame (`gapminder_2002`)

We supply _aesthetic mappings_ (`aes()`) that can be either global or bound to some _geometries_ (`geom_point()`) or _statistics_

The global aesthetic mapping defines which columns are

- mapped to which axes,
- possibly mapped to colours, linetypes, shapes, ...

Geometries and Statistics describe the building blocks of graphics

Aesthetic mappings are usually complemented by, possibly implicit, *scales*

---

### What's missing here?

When comparing with the Gapminder demonstration, we can spot that

- colors are missing
- bubble sizes are all the same
- titles and legends are missing

We will add *layers* to the graphical object to complete the plot

---

### Display more information

.fl.w-30.pa2[

- Map `continent` to color
- Map `pop` to bubble size
- Make points transparent by tuning `alpha` (avoid *overplotting*)

```r
p <- p +
* aes(size = pop,
*     color = continent
* ) +
* geom_point(alpha=.75)
p
```

]

.fl.w-70.pa2[

<img src="cm-1.2-EDA_files/figure-html/better-out-1.png" width="504" />

]

---

### Scaling

In order to pay tribute to Hans Rosling, we need to take care of two _scaling_ issues:

- the GDP per capita axis should be _logarithmic_
- the _area_ of each point should be proportional to the population

---

### Fixing scales

.fl.w-40.pa2.f6[

```r
p <- p +
*scale_x_log10(
*  limits=c(1e2,5e4)
*) +
*scale_size_area()
p
```

]

.fl.w-60.pa2[

<img src="cm-1.2-EDA_files/figure-html/scaling-out-1.png" width="504" />

]

???

```r
zoom_continent <- 'Europe'
```

---

###
- Why is it important to use logarithmic scaling for GDP per capita?
- When is it important to use logarithmic scaling on some axis (in other contexts)?
- Why is it important to specify `scale_size_area()`?

---

### In perspective

.fl.w-30.pa2[

- Add a plot title
- Make axes titles
    + explicit
    + readable

]

.fl.w-70.pa2.f6[

```r
p_titre <- glue("Gapminder {min(p$data$year)}-{max(p$data$year)}")
p_caption <- "From sick and poor (bottom left) to healthy and rich (top right)"

p <- p +
* labs(title=p_titre,
*      x = "Yearly income per capita (US$)",
*      y = "Life expectancy",
*      caption=p_caption )
p
```

]

---

### In perspective (cont'd)

.fl.w-30.pa2[

- Add a plot title
- Make axes titles
    + explicit
    + readable

]

.fl.w-70.pa2[

<img src="cm-1.2-EDA_files/figure-html/title-out-1.png" width="504" />

]

---

###
What should be the purposes of - Title - Subtitle - Caption --- ### Theming using `ggthemes` .fl.w-40.pa2[ - Theming ```r require("ggthemes") p + theme_wsj() ``` A theme defines the _look and feel_ of plots Within a single document, we should use only one theme See [Getting the theme](https://ggplot2.tidyverse.org/reference/theme_get.html) for a gallery of available themes ] .fl.w-60.pa2[ ``` ## Loading required package: ggthemes ``` <img src="cm-1.2-EDA_files/figure-html/theme_economist-out-1.png" width="504" /> ] ??? - - --- ### Tuning scales .fl.w-60.pa2[ ```r neat_color_scale <- c("Africa" = "#01d4e5", "Americas" = "#7dea01" , "Asia" = "#fc5173", "Europe" = "#fde803", "Oceania" = "#536227") p <- p + * scale_size_area(max_size = 15) + * scale_color_manual(values = neat_color_scale) p ``` Choosing a color scale is a difficult task `viridis` is often a good pick ] .fl.w-40.pa2[ <img src="cm-1.2-EDA_files/figure-html/theme_scale-out-1.png" width="504" /> ] --- ### Zooming on a continent .fl.w-40.pa2[ ```r *require(ggforce) z_in <- 'Africa' p_africa <- p + * facet_zoom( * xy= continent==z_in, * zoom.data=continent==z_in * ) p_africa ``` ] .fl.w-60.pa2[ <img src="cm-1.2-EDA_files/figure-html/zoom_africa-out-1.png" width="504" /> ] ??? Look for journal themes/ WSJ themes Wes Anderson themes --- ### A static plot for Gapminder (2002) <img src="cm-1.2-EDA_files/figure-html/theme_scale-out-bis-1.png" width="90%" /> --- name: recap template: inter-slide ## Recap ??? 
--- ### Building a graphical object layer by layer ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + theme_minimal() + theme(legend.position = "none") + labs(title= glue("Gapminder {min(gapminder$year)}-{max(gapminder$year)}"), x = "Yearly Income per Capita", y = "Life Expectancy", caption="From sick and poor (bottom left) to healthy and rich (top right)") ``` ??? --- ### In action count: false .panel1-plot_gap_minder_2002-auto[ ```r *gapminder ``` ] .panel2-plot_gap_minder_2002-auto[ ``` ## # A tibble: 1,704 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% * filter(year == 2002) ``` ] .panel2-plot_gap_minder_2002-auto[ ``` ## # A tibble: 142 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 2002 42.1 25268405 727. ## 2 Albania Europe 2002 75.7 3508512 4604. ## 3 Algeria Africa 2002 71.0 31287142 5288. ## 4 Angola Africa 2002 41.0 10866106 2773. ## 5 Argentina Americas 2002 74.3 38331121 8798. ## 6 Australia Oceania 2002 80.4 19546792 30688. ## 7 Austria Europe 2002 79.0 8148312 32418. ## 8 Bahrain Asia 2002 74.8 656397 23404. 
## 9 Bangladesh Asia 2002 62.0 135656790 1136. ## 10 Belgium Europe 2002 78.3 10311970 30486. ## # … with 132 more rows ``` ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% * ggplot() ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_03_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + * aes(x = gdpPercap) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_04_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + * aes(y = lifeExp) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_05_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + * aes(size = pop) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_06_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + * aes(color = continent) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_07_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + * geom_point(alpha=.5) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_08_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year 
== 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + * scale_x_log10() ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_09_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + * scale_size_area(max_size = 15, * labels= scales::label_number(scale=1/1e6, * suffix=" M")) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_10_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + * scale_color_manual(values = neat_color_scale) ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_11_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + * theme_minimal() ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_12_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + 
scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + theme_minimal() + * theme(legend.position = "none") ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_13_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + theme_minimal() + theme(legend.position = "none") + * labs(title= glue("Gapminder {min(gapminder$year)}-{max(gapminder$year)}"), * x = "Yearly Income per Capita", * y = "Life Expectancy", * caption="From sick and poor (bottom left) to healthy and rich (top right)") ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_14_output-1.png" width="504" /> ] --- count: false .panel1-plot_gap_minder_2002-auto[ ```r gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + geom_point(alpha=.5) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + theme_minimal() + theme(legend.position = "none") + labs(title= glue("Gapminder {min(gapminder$year)}-{max(gapminder$year)}"), x = "Yearly Income per Capita", y = "Life Expectancy", caption="From sick and poor (bottom left) to healthy and rich (top right)") ``` ] .panel2-plot_gap_minder_2002-auto[ <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_auto_15_output-1.png" width="504" /> ] <style> .panel1-plot_gap_minder_2002-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; 
padding-left: 1%; font-size: 80% } .panel2-plot_gap_minder_2002-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-plot_gap_minder_2002-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> ??? --- ### Adding labels ```r *require(ggrepel) gapminder %>% filter(year == 2002) %>% ggplot() + aes(x = gdpPercap) + aes(y = lifeExp) + aes(size = pop) + aes(color = continent) + * aes(label = country) + geom_point(alpha=.5) + * ggrepel::geom_label_repel(max.overlaps = 15) + scale_x_log10() + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + theme_minimal() + theme(legend.position = "none") + labs(title= glue("Gapminder {min(gapminder$year)}-{max(gapminder$year)}"), x = "Yearly Income per Capita", y = "Life Expectancy", caption="From sick and poor (bottom left) to healthy and rich (top right)") ``` --- <img src="cm-1.2-EDA_files/figure-html/plot_gap_minder_2002_labelled-out-1.png" width="504" /> --- template: inter-slide ## Facetting --- So far we have only presented one year of data (2002) Rosling used an *animation* to display the flow of time If we have to deliver a printable report, we cannot rely on animation, but we can rely on *facetting* Facets are collections of small plots constructed in the same way on subsets of the data We add a layer to the graphical object using `facet_wrap()` As all rows in `gapminder_2002` are all related to `year` 2002 --- ```r p <- p + scale_x_log10(limits=c(100, 50000)) + scale_size_area(max_size = 15, labels= scales::label_number(scale=1/1e6, suffix=" M")) + scale_color_manual(values = neat_color_scale) + guides(color = guide_legend(title = "Continent", override.aes = list(size = 5), order = 1), size = guide_legend(title = "Population", order = 2)) + theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1)) + * facet_wrap(vars(year), 
ncol=6)
p
```

---

<img src="cm-1.2-EDA_files/figure-html/facet-out-1.png" width="80%" />

---

We need to rebuild the graphical object along the same lines (using the same graphical pipeline) but starting from the whole `gapminder` dataset

Should we do this using _cut and paste_?

---

### Don't Repeat Yourself (DRY)

.f6[

> Abide by the DRY principle using operator `%+%`: the `ggplot2` object `p` can be fed with another data frame, and all you need is proper facetting.

]

```r
*p <- (p %+% gapminder)
p <- p +
  labs(title=glue("Gapminder {min(p$data$year)}-{max(p$data$year)}"))
p
```

<img src="cm-1.2-EDA_files/figure-html/dryit-1.png" width="504" />

???

---

<img src="cm-1.2-EDA_files/figure-html/dryit-out-1.png" width="80%" />

---
template: inter-slide

## Animate for free with plotly

---

### From R to `D3.js`

```r
q <- filter(gapminder, FALSE) %>%
  ggplot() +
  aes(x = gdpPercap) +
  aes(y = lifeExp) +
  aes(size = pop) +
* aes(text = country) +
  aes(fill = continent) +
* aes(frame = year) +
  geom_point(alpha=.5, colour='black') +
  scale_x_log10() +
  scale_size_area(max_size = 15,
                  labels= scales::label_number(scale=1/1e6,
                                               suffix=" M")) +
  scale_fill_manual(values = neat_color_scale) +
  theme(legend.position = "none") +
  labs(title= glue("Gapminder {min(gapminder$year)}-{max(gapminder$year)}"),
       x = "Yearly Income per Capita",
       y = "Life Expectancy",
       caption="From sick and poor (bottom left) to healthy and rich (top right)")

(q %+% gapminder) %>%
* plotly::ggplotly(height = 500, width=750)
```

---

```r
(p + facet_null() + aes(frame=year)) %>%
  plotly::ggplotly(height = 500, width=750)
```
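
The role of `facet_null()` in the pipeline above can be checked on a smaller plot. This is a minimal sketch, assuming packages `ggplot2` and `gapminder` are installed. The names `p_facets`, `p_single` and `n_panels` are illustrative and not part of the course material. Adding `facet_null()` to a facetted plot replaces the facetting specification, so all years land in a single panel, which is what the `frame` aesthetic needs.

```r
library(ggplot2)
library(gapminder)

# A facetted plot: one panel per distinct year
p_facets <- ggplot(gapminder) +
  aes(x = gdpPercap, y = lifeExp) +
  geom_point() +
  facet_wrap(vars(year))

# Adding facet_null() overrides the previous facetting specification
p_single <- p_facets + facet_null()

# Hypothetical helper: count the panels used by the first layer of a plot
n_panels <- function(plot) length(unique(layer_data(plot)$PANEL))

n_panels(p_facets)   # one panel per year
n_panels(p_single)   # a single panel
```

The mappings and the point layer are untouched; only the facetting changes.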
??? ---
--- ### Zooming on Europe (tweaking code) ```r target_continent <- 'Europe' ( q %+% * filter(gapminder, continent==target_continent) + * ggtitle(glue("Gapminder on {target_continent}")) ) %>% plotly::ggplotly(height = 500, width=750) ``` --- ### Zooming on Europe
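
The zoom relies on `%+%` replacing the data of an existing `ggplot` object while keeping its mappings and layers. A minimal sketch of that mechanism follows, without the `plotly` step, assuming packages `ggplot2`, `dplyr` and `gapminder` are installed. The names `q_small` and `q_europe` are illustrative only.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Build the pipeline on an empty table: mappings and layers, but no rows
q_small <- filter(gapminder, FALSE) %>%
  ggplot() +
  aes(x = gdpPercap, y = lifeExp, size = pop) +
  geom_point(alpha = .5) +
  scale_x_log10()

# Feed the same pipeline with the rows of a single continent
q_europe <- q_small %+% filter(gapminder, continent == "Europe")

nrow(q_small$data)        # the template holds no rows
nrow(q_europe$data)       # rows for European countries only
length(q_europe$layers)   # the point layer is preserved
```

Because the template `q_small` never changes, the same call can be repeated for any other continent without copying the plotting code.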
--- template: inter-slide ```r knitr::include_url("https://plotly.com/r/") ``` <iframe src="https://plotly.com/r/" width="504" height="400px" data-external="1"></iframe> --- template: inter-slide ```r knitr::include_url("https://www.youtube.com/embed/3n9nASHg9gc") ``` <iframe src="https://www.youtube.com/embed/3n9nASHg9gc" width="504" height="400px" data-external="1"></iframe> Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). --- template: inter-slide ```r knitr::include_url("https://www.youtube.com/embed/RPFh3y9UAX4") ``` <iframe src="https://www.youtube.com/embed/RPFh3y9UAX4" width="504" height="400px" data-external="1"></iframe> --- exclude: true - `ggplotly` - `highcharter` ??? [raukr](https://nbisweden.github.io/raukrtemplate/presentation_demo.html#1) --- exclude: true ### Under the hood `htmltools` `htmlwidgets` ??? - `facet_zoom()` on a continent (from `ggforce`) - Animation is not interaction! - `shiny/flex_dashboard` - --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End