gssr
We work again with General Social Survey (GSS) data.
We take advantage of the R package gssr.
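# Install gssr from GitHub if it is missing (remotes is needed for that).
if (!require(gssr)) {
  if (!require(remotes)) {
    install.packages("remotes")
  }
  remotes::install_github("kjhealy/gssr")
  library(gssr)  # attach the freshly installed package
}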
The GSS is carried out every two years. It offers both cross-sectional data and panel data.
Package gssr offers a simple way to retrieve yearly data.
df_2018 <- gssr::gss_get_yr(2018)
Fetching: https://gss.norc.org/documents/stata/2018_stata.zip
age and agekdbrn
The 2018 data provide (among many other things) columns named age and agekdbrn. Get numerical summaries of these two columns.
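One possible way to approach this, sketched with base R only. The helper name `summarise_numeric()` is hypothetical, and the data frame `toy` holds made-up values that merely stand in for `df_2018`. The call to `as.numeric()` is there because gssr columns are labelled vectors, and dropping the labels first keeps the summary functions simple.

```r
# Hypothetical helper: numerical summary of selected columns.
# as.numeric() drops value labels (gssr columns are labelled vectors).
summarise_numeric <- function(df, cols) {
  sapply(df[cols], function(x) {
    x <- as.numeric(x)
    c(n       = sum(!is.na(x)),
      missing = sum(is.na(x)),
      min     = min(x, na.rm = TRUE),
      q1      = unname(quantile(x, 0.25, na.rm = TRUE)),
      median  = median(x, na.rm = TRUE),
      mean    = mean(x, na.rm = TRUE),
      q3      = unname(quantile(x, 0.75, na.rm = TRUE)),
      max     = max(x, na.rm = TRUE),
      sd      = sd(x, na.rm = TRUE))
  })
}

# Made-up values for illustration only (not GSS data).
toy <- data.frame(
  age      = c(23, 35, 47, 61, NA, 29),
  agekdbrn = c(NA, 24, 30, 22, 27, NA)
)

# One column of statistics per variable.
toy_summary <- summarise_numeric(toy, c("age", "agekdbrn"))
print(round(toy_summary, 2))

# With the real data: summarise_numeric(df_2018, c("age", "agekdbrn"))
```

Reporting the number of missing values next to the usual statistics matters here: agekdbrn (age when the first child was born) is only defined for respondents who have children, so it is likely to have many more missing values than age.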
Thanks to gssr, you can get meta-information about the columns.
?age
?agekdbrn
?sex
How is sex encoded? Is it worth recoding it?
age distribution, faceted by sex
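A minimal sketch of how recoding sex as a factor might help when faceting. The values in `toy` are made up for illustration, and the coding 1 = male, 2 = female is an assumption to check against `?sex`. The plotting part only runs if ggplot2 is installed.

```r
# Made-up values for illustration only (not GSS data).
toy <- data.frame(
  age = c(23, 70, 48, 27, 61, 26, 35, 52),
  sex = c(2, 1, 2, 2, 2, 1, 1, 1)
)

# Assumption: sex is coded 1 = male, 2 = female (check with ?sex).
toy$sex_f <- factor(as.numeric(toy$sex),
                    levels = c(1, 2),
                    labels = c("male", "female"))

# Counts per category are easier to read than counts per numeric code.
print(table(toy$sex_f))

# Histogram of age, one panel per sex (needs ggplot2).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = age)) +
    ggplot2::geom_histogram(binwidth = 5, boundary = 0) +
    ggplot2::facet_wrap(ggplot2::vars(sex_f)) +
    ggplot2::labs(x = "age (years)", y = "count")
  print(p)
}
```

With the real data, `haven::as_factor(df_2018$sex)` builds the factor from the value labels stored in the column, which avoids typing the labels by hand. Faceting on a factor gives panels titled with category names rather than with the codes 1 and 2.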
age distribution compared with the population age distribution
knitr::include_url("https://perspective.usherbrooke.ca/bilan/servlet/BMPagePyramide/USA/2018/?", height = 600)
Sherbrooke University offers visual information about the age structure of the population for a wide range of countries.
Following demographic usage, the age structure is presented as an age pyramid.
Note that an age pyramid is a special kind of histogram: two histograms of age, one per sex, drawn back to back with age on the vertical axis.
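To make the link between pyramid and histogram concrete, here is a small base R sketch. The values in `toy` are made up, and the 10-year age classes are an arbitrary choice for illustration. The idea is to count respondents per age class and sex, then flip the sign of one sex so that its bars point left.

```r
# Made-up values for illustration only (not GSS data).
toy <- data.frame(
  age = c(23, 70, 48, 27, 61, 26, 35, 52, 44, 38),
  sex = factor(c("female", "male", "female", "female", "female",
                 "male", "male", "male", "female", "male"),
               levels = c("male", "female"))
)

# Bin ages into 10-year classes, as a histogram would.
breaks <- seq(20, 80, by = 10)
toy$age_class <- cut(toy$age, breaks = breaks, right = FALSE)

# Counts per age class and sex: these are the two histograms.
counts <- table(toy$age_class, toy$sex)

# Pyramid layout: male counts negative (left), female counts positive (right).
pyramid <- cbind(male = -counts[, "male"], female = counts[, "female"])
print(pyramid)

# Horizontal bars, youngest age class at the bottom.
barplot(pyramid[, "male"], horiz = TRUE, las = 1,
        xlim = c(-max(counts), max(counts)),
        xlab = "count (male left, female right)")
barplot(pyramid[, "female"], horiz = TRUE, add = TRUE,
        axes = FALSE, axisnames = FALSE)
```

Replacing counts with proportions within each sex would make the sample pyramid easier to compare with a population pyramid, since the two sources have very different totals.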
age with respect to sex
age and agekdbrn, faceted by sex
gss_sub
data("gss_sub")
gss_sub |>
  dplyr::glimpse()
Rows: 72,390
Columns: 19
$ year <dbl+lbl> 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 197…
$ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ ballot <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ age <dbl+lbl> 23, 70, 48, 27, 61, 26, 28, 27, 21, 30, 30, 56, 54, 49, 4…
$ race <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, …
$ sex <dbl+lbl> 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 2, …
$ degree <dbl+lbl> 3, 0, 1, 3, 1, 1, 1, 3, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 3, …
$ padeg <dbl+lbl> 0, 0, 0, 3, 0, 3, 3, 3, …
$ madeg <dbl+lbl> NA(i), 0, 0, 1, 0, 4, 1, 1, …
$ relig <dbl+lbl> 3, 2, 1, 5, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ polviews <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ fefam <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ vpsu <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ vstrat <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ oversamp <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ formwt <dbl+lbl> NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), NA(y), N…
$ wtssall <dbl+lbl> 0.4446, 0.8893, 0.8893, 0.8893, 0.8893, 0.4446, 0.4446, 0…
$ sampcode <dbl+lbl> NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), NA(i), N…
$ sample <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
What kind of information do we get through the variables degree and padeg?
?degree
?padeg
degree and padeg
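One natural way to look at degree (respondent's highest degree) and padeg (father's highest degree) together is a two-way contingency table. The sketch below is only a suggestion. The eight pairs of values are copied from the start of the `glimpse()` output above, and the level names attached to codes 0 to 4 are an assumption to check against `?degree` and `?padeg`.

```r
# First eight values of degree and padeg from the glimpse() output above.
toy <- data.frame(
  degree = c(3, 0, 1, 3, 1, 1, 1, 3),
  padeg  = c(0, 0, 0, 3, 0, 3, 3, 3)
)

# Assumption: codes 0 to 4 are ordered from lowest to highest degree.
degree_levels <- c("lt high school", "high school", "junior college",
                   "bachelor", "graduate")
to_factor <- function(x) {
  factor(as.numeric(x), levels = 0:4, labels = degree_levels)
}

# Two-way contingency table: rows = respondent, columns = father.
tab <- table(degree = to_factor(toy$degree),
             padeg  = to_factor(toy$padeg))
print(addmargins(tab))
```

Because both variables share the same ordered categories, the diagonal of the table counts respondents with the same degree as their father, and cells off the diagonal show movement up or down.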