Published

March 14, 2024

Code
params = list(
  truc= "Science des Données",
  year= 2023 ,
  curriculum= "L3 MIASHS",
  university= "Université Paris Cité",
  homepage= "https://stephane-v-boucheron.fr/courses/scidon",
  moodle= "https://moodle.u-paris.fr/course/view.php?id=13227",
  country_code= 'fr_t',
  country= 'France',
  datafile= 'full_life_table.Rds',
  year_p= 1948,
  year_e= 2017
)
Code
require(patchwork)
require(glue)
require(here)
require(tidyverse)
require(plotly)
require(DT)
require(GGally)
require(ggforce)
require(ggfortify)

tidymodels::tidymodels_prefer(quiet = TRUE)

old_theme <-theme_set(theme_minimal(base_size=9, base_family = "Helvetica"))

Data sources

Life tables have been downloaded from https://www.mortality.org.

We investigate life tables describing countries from Western Europe (France, Great Britain –actually England and Wales–, Italy, the Netherlands, Spain, and Sweden) and the United States.

Life tables used here have been doctored and merged so as to simplify discussion.

We will use the next lookup tables to recode some factors.

Code
country_code <- list(fr_t='FRATNP',
                     fr_c='FRACNP',
                     be='BEL',
                     gb_t='GBRTENW',
                     gb_c='GBRCENW',
                     nl='NLD',
                     it='ITA',
                     swe='SWE',
                     sp='ESP',
                     us='USA')

countries <- c('fr_t', 'gb_t', 'nl', 'it', 'sp', 'swe', 'us')

country_names <- list(fr_t='France',     # total population
                     fr_c='France',      # civilian population
                     be='Belgium',
                     gb_t='England & Wales',    # total population
                     gb_c='England & Wales',    # civilian population
                     nl='Netherlands',
                     it='Italy',
                     swe='Sweden',
                     sp='Spain',
                     us='USA')

gender_names <- list('b'='Both',
                     'f'='Female',
                     'm'='Male')
Code
datafile <- 'full_life_table.Rds'
fpath <- stringr::str_c("./DATA/", datafile) # here::here('DATA', datafile)   # check getwd() if problem 

if (! file.exists(fpath)) {
  download.file("https://stephane-v-boucheron.fr/data/full_life_table.Rds", 
                fpath,
                mode="wb")
}

life_table <- readr::read_rds(fpath)
Code
life_table <- life_table %>%
  mutate(Country = as_factor(Country)) %>%
  mutate(Country = fct_relevel(Country, "Spain", "Italy", "France", "England & Wales", "Netherlands", "Sweden", "USA")) %>%
  mutate(Gender = as_factor(Gender)) 

life_table <- life_table %>%
  mutate(Area = fct_collapse(Country, 
                        SE = c("Spain", "Italy", "France"), 
                        NE = c("England & Wales", "Netherlands", "Sweden"), 
                        USA="USA")) 

Document Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle contains detailed information on the construction of Life Tables for France.

Two kinds of Life Tables can be distinguished: Table du moment which contain for each calendar year, the mortality risks at different ages for that very year; and Tables de génération which contain for a given birthyear, the mortality risks at which an individual born during that year has been exposed.

The life tables investigated in this lab are Table du moment. According to the document by Vallin and Meslé, building the life tables required decisions and doctoring.

Have a look at Lexis diagram.

Definitions can be obtained from www.lifeexpectancy.org. We translate it into mathematical (rather than demographic) language. Recall that the quantities define a probability distribution over \(\mathbb{N}\). This probability distribution is a construction that reflects the health situation in a population at a given time (year). This probability distribution does not describe the sequence of sanitary situations experienced by a cohort (people born during a specific year).

One works with a period, or current, life table (table du moment). This summarizes the mortality experience of persons across all ages in a short period, typically one year or three years. More precisely, the death probabilities \(q(x)\) for every age \(x\) are computed for that short period, often using census information gathered at regular intervals. These \(q(x)\)’s are then applied to a hypothetical cohort of \(100 000\) people over their life span to produce a life table.

Code
life_table %>% 
  filter(Country=='France', Year== 2010, Gender=='Female', Age < 10 | Age > 80) |> 
  knitr::kable()
Year Age mx qx ax lx dx Lx Tx ex Country Gender Area
2010 0 0.00325 0.00324 0.14 100000 324 99722 8465207 84.65 France Female SE
2010 1 0.00032 0.00032 0.50 99676 32 99660 8365484 83.93 France Female SE
2010 2 0.00015 0.00015 0.50 99645 15 99637 8265824 82.95 France Female SE
2010 3 0.00011 0.00011 0.50 99630 11 99624 8166187 81.97 France Female SE
2010 4 0.00008 0.00008 0.50 99619 8 99615 8066563 80.97 France Female SE
2010 5 0.00005 0.00005 0.50 99611 5 99608 7966948 79.98 France Female SE
2010 6 0.00008 0.00008 0.50 99606 8 99602 7867339 78.98 France Female SE
2010 7 0.00008 0.00008 0.50 99598 8 99594 7767737 77.99 France Female SE
2010 8 0.00008 0.00008 0.50 99590 8 99586 7668143 77.00 France Female SE
2010 9 0.00007 0.00007 0.50 99582 7 99578 7568557 76.00 France Female SE
2010 81 0.03516 0.03455 0.50 73367 2535 72099 727802 9.92 France Female SE
2010 82 0.04059 0.03978 0.50 70832 2818 69423 655702 9.26 France Female SE
2010 83 0.04754 0.04644 0.50 68014 3158 66435 586280 8.62 France Female SE
2010 84 0.05536 0.05386 0.50 64856 3493 63109 519845 8.02 France Female SE
2010 85 0.06295 0.06103 0.50 61362 3745 59490 456736 7.44 France Female SE
2010 86 0.07246 0.06993 0.50 57617 4029 55603 397246 6.89 France Female SE
2010 87 0.08256 0.07929 0.50 53588 4249 51464 341643 6.38 France Female SE
2010 88 0.09660 0.09215 0.50 49339 4547 47066 290180 5.88 France Female SE
2010 89 0.11088 0.10505 0.50 44792 4706 42440 243114 5.43 France Female SE
2010 90 0.12839 0.12064 0.50 40087 4836 37669 200675 5.01 France Female SE
2010 91 0.14262 0.13313 0.50 35251 4693 32904 163006 4.62 France Female SE
2010 92 0.16006 0.14820 0.50 30558 4529 28293 130102 4.26 France Female SE
2010 93 0.18419 0.16866 0.50 26029 4390 23834 101809 3.91 France Female SE
2010 94 0.21165 0.19140 0.50 21639 4142 19568 77975 3.60 France Female SE
2010 95 0.23048 0.20667 0.50 17497 3616 15689 58407 3.34 France Female SE
2010 96 0.25800 0.22852 0.50 13881 3172 12295 42718 3.08 France Female SE
2010 97 0.28759 0.25143 0.50 10709 2693 9363 30423 2.84 France Female SE
2010 98 0.31910 0.27519 0.50 8016 2206 6913 21060 2.63 France Female SE
2010 99 0.35236 0.29958 0.50 5810 1741 4940 14146 2.43 France Female SE
2010 100 0.38712 0.32434 0.50 4070 1320 3410 9206 2.26 France Female SE
2010 101 0.42306 0.34920 0.50 2750 960 2270 5797 2.11 France Female SE
2010 102 0.45984 0.37388 0.50 1790 669 1455 3527 1.97 France Female SE
2010 103 0.49706 0.39812 0.50 1120 446 897 2072 1.85 France Female SE
2010 104 0.53432 0.42166 0.50 674 284 532 1175 1.74 France Female SE
2010 105 0.57119 0.44430 0.50 390 173 303 643 1.65 France Female SE
2010 106 0.60729 0.46584 0.50 217 101 166 339 1.56 France Female SE
2010 107 0.64226 0.48614 0.50 116 56 88 173 1.49 France Female SE
2010 108 0.67577 0.50510 0.50 59 30 44 85 1.43 France Female SE
2010 109 0.70757 0.52266 0.50 29 15 22 41 1.39 France Female SE

Check on http://www.mortality.org the meaning of the different columns.

In the sequel, we denote by \(F_{t}\) the cumulative distribution function for year \(t\). \(F_t(x)\) represents the probability of dying at age not larger than \(x\).

We agree on \(\overline{F}_t = 1 - F_t\) and \(F_t(-1)=0\).

The life tables are highly redundant. Provided we get the right conventions we can derive almost all columns from column qx.

Code
life_table %>% 
  filter( Year>=1948) %>% 
  group_by(Country, Year, Gender) %>% 
  summarise(m1 =max(abs(lx -dx -lead(lx)), na.rm = T), 
            m2 =max(abs(lx * qx -dx), na.rm=T),
            m3 =max(abs(Lx -lx * (1 + qx * (ax-1))), na.rm=T),
            m4 =max(abs(1-exp(-mx)-qx), na.rm=T)) %>% 
  select(Year, Country, Gender, m1, m2, m3, m4) %>%  
  ungroup() %>% 
  group_by(Country, Gender) %>% 
  slice_max(order_by = desc(m4), n = 1)
`summarise()` has grouped output by 'Country', 'Year'. You can override using
the `.groups` argument.
# A tibble: 21 × 7
# Groups:   Country, Gender [21]
    Year Country         Gender    m1    m2    m3      m4
   <int> <fct>           <fct>  <int> <dbl> <dbl>   <dbl>
 1  1948 Spain           Both       1 0.874 2.20  0.00838
 2  1948 Spain           Female     1 0.789 1.56  0.00816
 3  1952 Spain           Male       1 0.802 5.5   0.0119 
 4  2004 Italy           Both       1 0.836 0.968 0.0150 
 5  2004 Italy           Female     1 0.875 1.03  0.0149 
 6  1984 Italy           Male       1 0.774 5.56  0.0146 
 7  2007 France          Both       1 0.887 0.976 0.0152 
 8  2007 France          Female     1 0.890 0.980 0.0151 
 9  1979 France          Male       1 0.764 4.97  0.0161 
10  1992 England & Wales Both       1 0.898 2.42  0.0135 
# ℹ 11 more rows
qx
(age-specific) risk of death at age \(x\), or mortality quotient at given age \(x\) for given year \(t\): \(q_{t,x} = \frac{\overline{F}_t(x) - \overline{F}_t(x+1)}{\overline{F}_t(x)}\).
For each year, each age, \(q_{t,x}\) is determined by data from that year.

We also have \[\overline{F}_{t}(x+1) = \overline{F}_{t}(x) \times (1-q_{t,x+1})\, .\]

mx
central death rate at age \(x\) during year \(t\). This is connected with \(q_{t,x}\) by \[m_{t,x} = -\log(1- q_{t,x}) \,, \] or equivalently \(q_{t,x} = 1 - \exp(-m_{t,x})\).
lx
the so-called survival function: the scaled proportion of persons alive at age \(x\). These values are computed recursively from the \(q_{t,x}\) values using the formula \[l_t(x+1) = l_t(x) \times (1-q_{t,x}) \, ,\] with \(l_{t,0}\), the “radix” of the table, arbitrarily set to \(100000\). Function \(l_{t,\cdot}\) and \(\overline{F}_t\) are connected by \[l_{t,x + 1} = l_{t,0} \times \overline{F}_t(x)\,.\] Note that in Probability theory, \(\overline{F}\) is also called the survival or tail function.
dx
\(d_{t,x} = q_{t,x} \times l_{t,x}\)
Tx
Total number of person-years lived by the cohort from age \(x\) to \(x+1\). This is the sum of the years lived by the \(l_{t, x+1}\) persons who survive the interval, and the \(d_{t,x}\) persons who die during the interval. The former contribute exactly \(1\) year each, while the latter contribute, on average, approximately half a year, so that \(L_{t,x} = l_{t,x+1} + 0.5 \times d_{t,x}\). This approximation assumes that deaths occur, on average, half way in the age interval x to x+1. Such is satisfactory except at age 0 and the oldest age, where other approximations are often used; We will stick to a simplified vision \(L_{t,x}= l_{t,x+1}\)
ex:
Residual Life Expectancy at age \(x\) and year \(t\)

Western countries in 1948

Several pictures share a common canvas: we plot central death rates against ages using a logarithmic scale on the \(y\) axis. Countries are identified by aesthetics (shape, color, linetypes). Abiding to the DRY principle, we define a prototype ggplot (alternatively plotly) object. The prototype will be fed with different datasets and decorated and arranged for the different figures.

Code
dummy_data <- dplyr::filter(life_table, FALSE)

proto_plot <- ggplot(dummy_data,
                     aes(x=Age,
                         y=qx,
                         col=Area,
                         linetype=Country,
                         shape=Country)) +
              scale_y_log10() +
              scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
              ylab("Mortality quotients") +
              labs(linetype="Country") +
              theme_bw()
Question

Plot qx of all Countries at all ages for years 1948 and 2013.