gssr
We again work with General Social Survey (GSS) data.
We take advantage of the R package gssr.
if (!require(gssr)) {
  if (!require(remotes)) {
    install.packages("remotes")
  }
  remotes::install_github("kjhealy/gssr")
}
The GSS is carried out every two years. It offers both cross-sectional data and panel data.
Package gssr offers a simple way to retrieve yearly data.
df_2018 <- gssr::gss_get_yr(2018)
Fetching: https://gss.norc.org/documents/stata/2018_stata.zip
age and agekdbrn
The 2018 data provide (among many other things) columns named age and agekdbrn. Get numerical summaries of these two columns.
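A minimal sketch of one way to obtain such summaries with base R. The data frame `toy` holds invented values and only stands in for `df_2018`; the helper name `summarise_ages` is hypothetical. Columns imported from Stata files are labelled vectors, so the sketch assumes that `as.numeric()` is enough to recover plain numbers.

```r
# Toy stand-in for df_2018 (values invented for illustration only).
toy <- data.frame(
  age      = c(23, 70, 48, 27, 61, NA),
  agekdbrn = c(NA, 25, 22, NA, 30, 19)
)

# Hypothetical helper: missing count, quartiles, mean and range per column.
summarise_ages <- function(df, cols = c("age", "agekdbrn")) {
  sapply(df[cols], function(x) {
    x <- as.numeric(x)  # drop value labels, keep the numbers
    c(
      n_missing = sum(is.na(x)),
      min       = min(x, na.rm = TRUE),
      q1        = unname(quantile(x, 0.25, na.rm = TRUE)),
      median    = median(x, na.rm = TRUE),
      mean      = mean(x, na.rm = TRUE),
      q3        = unname(quantile(x, 0.75, na.rm = TRUE)),
      max       = max(x, na.rm = TRUE)
    )
  })
}

res <- summarise_ages(toy)
print(res)
```

With the real data, the same helper would be called as `summarise_ages(df_2018)`. Reporting the number of missing values matters here, since agekdbrn is only recorded for some respondents.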
Thanks to gssr, you can get meta-information about the columns:
?age
?agekdbrn
?sex
How is sex encoded? Is it worth recoding it?
age distribution/facet by sex
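A possible sketch of the recoding and of the faceted display, using base R only. The data frame `toy` is invented for illustration, and the coding 1 = male, 2 = female is an assumption that the labels shown by `?sex` should confirm.

```r
# Toy stand-in for df_2018 (values invented for illustration only).
toy <- data.frame(
  age = c(23, 70, 48, 27, 61, 26, 28, 35),
  sex = c(2, 1, 2, 2, 2, 1, 1, 1)
)

# Recode the numeric codes as a factor (assumed coding: 1 = male, 2 = female).
toy$sex_f <- factor(toy$sex, levels = c(1, 2), labels = c("male", "female"))

# One histogram of age per sex, side by side, sharing the same breaks
# so that the two panels are directly comparable.
breaks <- seq(15, 90, by = 5)
op <- par(mfrow = c(1, nlevels(toy$sex_f)))
for (lev in levels(toy$sex_f)) {
  hist(toy$age[toy$sex_f == lev], breaks = breaks,
       main = lev, xlab = "age")
}
par(op)  # restore the previous graphical parameters
```

If haven is installed, `haven::as_factor(df_2018$sex)` builds the factor directly from the labels stored in the column. With ggplot2, `facet_wrap(~ sex)` produces the same kind of layout.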
age distribution with population age distribution
knitr::include_url("https://perspective.usherbrooke.ca/bilan/servlet/BMPagePyramide/USA/2018/?", height=600)
Sherbrooke University offers visual information about the age structure of the population of a wide range of countries.
Following demographic usage, the age structure is presented through an age pyramid.
Note that an age pyramid is a special kind of histogram.
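To make the link between pyramid and histogram concrete, here is a rough sketch that draws a pyramid as two back-to-back horizontal bar charts of binned ages. The data frame `toy` is invented for illustration; the 10-year classes and the colours are arbitrary choices.

```r
# Toy sample (values invented for illustration only).
toy <- data.frame(
  age = c(23, 70, 48, 27, 61, 26, 28, 35, 52, 44),
  sex = factor(c("female", "male", "female", "female", "female",
                 "male", "male", "male", "female", "male"))
)

# Bin ages into 10-year classes, then count by class and sex.
toy$age_class <- cut(toy$age, breaks = seq(10, 90, by = 10), right = FALSE)
counts <- table(toy$age_class, toy$sex)

# Males are drawn to the left (negative counts), females to the right.
lim <- max(counts)
barplot(-counts[, "male"], horiz = TRUE, las = 1, col = "steelblue",
        xlim = c(-lim, lim), xlab = "count")
barplot(counts[, "female"], horiz = TRUE, add = TRUE,
        axes = FALSE, axisnames = FALSE, col = "salmon")
```

Each side is an ordinary histogram rotated by a quarter turn. The horizontal axis shows negative values on the male side; they should be read as absolute counts.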
age with respect to sex
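One simple way to look at age with respect to sex is a grouped summary together with side-by-side boxplots. The sketch below uses an invented data frame `toy` in place of the survey data.

```r
# Toy sample (values invented for illustration only).
toy <- data.frame(
  age = c(23, 70, 48, 27, 61, 26, 28, 35, 52, 44),
  sex = factor(c("female", "male", "female", "female", "female",
                 "male", "male", "male", "female", "male"))
)

# Median age within each sex.
medians <- tapply(toy$age, toy$sex, median)
print(medians)

# Full numerical summary within each sex.
print(tapply(toy$age, toy$sex, summary))

# Side-by-side boxplots of age, one box per sex.
boxplot(age ~ sex, data = toy, ylab = "age")
```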
age and agekdbrn, facet by sex
gss_sub
data("gss_sub")
gss_sub |>
  glimpse()
Rows: 72,390
Columns: 19
$ year <dbl+lbl> 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 197…
$ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ ballot <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ age <dbl+lbl> 23, 70, 48, 27, 61, 26, 28, 27, 21, 30, 30, 56, 54, 49, 4…
$ race <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, …
$ sex <dbl+lbl> 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 2, …
$ degree <dbl+lbl> 3, 0, 1, 3, 1, 1, 1, 3, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 3, …
$ padeg <dbl+lbl> 0, 0, 0, 3, 0, 3, 3, 3, …
$ madeg <dbl+lbl> NA(i), 0, 0, 1, 0, 4, 1, 1, …
$ relig <dbl+lbl> 3, 2, 1, 5, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ polviews <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ fefam <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ vpsu <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ vstrat <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ oversamp <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ formwt <dbl+lbl> NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), N…
$ wtssall <dbl+lbl> 0.4446, 0.8893, 0.8893, 0.8893, 0.8893, 0.4446, 0.4446, 0…
$ sampcode <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ sample <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
What kind of information do we get through variables degree and padeg?
?degree
?padeg
degree and padeg
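A minimal sketch of a two-way table relating degree and padeg. The eight rows below are the first values visible in the glimpse() output above. The level labels are an assumption (0 = less than high school up to 4 = graduate, with padeg read as the father's degree) that `?degree` and `?padeg` should confirm.

```r
# First eight rows of degree and padeg, as shown by glimpse().
toy <- data.frame(
  degree = c(3, 0, 1, 3, 1, 1, 1, 3),
  padeg  = c(0, 0, 0, 3, 0, 3, 3, 3)
)

# Assumed labels for codes 0 to 4; check them against ?degree.
labs <- c("lt high school", "high school", "junior college",
          "bachelor", "graduate")
toy$degree <- factor(toy$degree, levels = 0:4, labels = labs)
toy$padeg  <- factor(toy$padeg,  levels = 0:4, labels = labs)

# Contingency table: respondent's degree (rows) by father's degree (columns).
tab <- table(respondent = toy$degree, father = toy$padeg)
print(tab)

# Distribution of respondent's degree within each observed father's degree.
# Empty columns are dropped first to avoid dividing by zero.
tab_nonempty <- tab[, colSums(tab) > 0, drop = FALSE]
prop <- prop.table(tab_nonempty, margin = 2)
print(round(prop, 2))
```

On the full gss_sub data, the same calls apply once the labelled columns have been converted to factors, for instance with `haven::as_factor()`.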