Extract/filter a subset of rows using dplyr::filter(...)
Filtering (selection \(σ\) from database theory): picking one year of data
There is a simple way to filter rows satisfying some condition. It consists of mimicking matrix indexing: leave the column index empty and replace the row index with a condition (a logical expression), also called a mask.
Have a look at gapminder$year==2002. What is the type/class of this expression?
This is possible in base R and very often convenient.
Nevertheless, this way of filtering rows does not emphasize the connection between the dataframe and the condition: any logical vector of the right length could be used as a mask. Moreover, bracket indexing does not fit a functional style of programming.
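The masking approach can be sketched in base R as follows. This is a minimal illustration, assuming the gapminder package is installed so that the gapminder data frame is available; the names mask and gapminder_2002_base are made up for the example.

```r
library(gapminder)  # assumption: provides the `gapminder` data frame

# The mask is a logical vector with one entry per row of the data frame
mask <- gapminder$year == 2002
class(mask)

# Matrix-style indexing: the mask fills the row slot, the column slot stays empty
gapminder_2002_base <- gapminder[mask, ]

# Nothing ties the mask to the data frame: any logical vector of
# length nrow(gapminder) would be accepted in its place
unique(gapminder_2002_base$year)
```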
In the parlance of relational algebra, filter performs a selection of rows. The relational expression \[σ_{\text{condition}}(\text{Table})\] translates to
filter(Table, condition)
where \(\text{condition}\) is a boolean expression that can be evaluated on each row of \(\text{Table}\). In SQL, the relational expression would translate into
SELECT * FROM Table WHERE condition
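Applied to the gapminder data, picking the year 2002 might look like the sketch below. It assumes the dplyr and gapminder packages are installed and, for the pipeline form, R 4.1 or later (native pipe |>).

```r
library(dplyr)
library(gapminder)  # assumption: provides the `gapminder` data frame

# sigma_{year == 2002}(gapminder): keep the rows whose year is 2002
gapminder_2002 <- filter(gapminder, year == 2002)

# The same selection written as a pipeline
gapminder_2002 <- gapminder |>
  filter(year == 2002)
```

Unlike a free-standing mask, the condition year == 2002 is evaluated inside the data frame passed as first argument, which makes the link between table and condition explicit.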
zoom_continent <- 'Europe'  # choose another continent at your convenience
Use facet_zoom() from package ggforce
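One way to zoom on the chosen continent is sketched below. The scatterplot itself (GDP per capita against life expectancy, logarithmic x axis, colour by continent) is an assumption made so that the example is self-contained, and the name p_zoom is illustrative; only facet_zoom() and zoom_continent come from the text.

```r
library(dplyr)
library(ggplot2)
library(ggforce)
library(gapminder)  # assumption: provides the `gapminder` data frame

zoom_continent <- "Europe"  # choose another continent at your convenience

gapminder_2002 <- filter(gapminder, year == 2002)

p_zoom <- ggplot(gapminder_2002,
                 aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  scale_x_log10() +
  # zoom on both axes, over the range covered by the chosen continent
  facet_zoom(xy = continent == zoom_continent)

p_zoom
```

The xy argument takes a logical expression evaluated on the plot data; the zoomed panel covers the x and y ranges of the rows where it holds.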
Adding labels
Facetting
So far we have only presented one year of data (2002)
Rosling used an animation to display the flow of time
If we have to deliver a printable report, we cannot rely on animation, but we can rely on facetting
Facets are collections of small plots constructed in the same way on subsets of the data
We add a layer to the graphical object using facet_wrap()
As all rows in gapminder_2002 relate to year 2002, we need to rebuild the graphical object along the same lines (using the same graphical pipeline), but starting from the whole gapminder dataset.
Should we do this using cut and paste?
No
Don’t Repeat Yourself (DRY)
Abide by the DRY principle using the operator %+%: the ggplot2 object p can be fed another dataframe, and all that remains is to add proper facetting.
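A minimal sketch of this idea follows. The plot p built here is only a stand-in for the single-year plot (its aesthetics and scales are assumptions), and p_all is an illustrative name.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: provides the `gapminder` data frame

gapminder_2002 <- filter(gapminder, year == 2002)

# Graphical pipeline written once, on a single year of data
p <- ggplot(gapminder_2002,
            aes(x = gdpPercap, y = lifeExp, colour = continent, size = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

# Feed the same pipeline with the whole dataset, then add one panel per year
p_all <- p %+% gapminder +
  facet_wrap(vars(year))

p_all
```

Because %+% binds more tightly than +, the data are swapped first and the facetting layer is added to the result; no part of the pipeline has to be retyped.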