name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- name: inter-slide class: left, middle, inverse {{content}} --- class: middle, left, inverse # Exploratory Data Analysis : EDA Correspondence Analysis ### 2022-03-23 #### [Master I MIDS & MFA]() #### [Analyse Exploratoire de Données](http://stephane-v-boucheron.fr/courses/eda/) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- exclude: true template: inter-slide # Exploratory Data Analysis VII CA ### 2022-03-23 #### [EDA Master I MIDS et MFA](/courses/eda/index.html) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- template: inter-slide ##
### [Exploring two-way contingency tables](#motivca) ### [Chi-square divergence](#chi-deux-section) ### [Extending SVD](#revSVD) ### [Correspondence Analysis as extended SVD](#basicca) ### [Illustrations](#factoshowcase) --- name: motivca template: inter-slide ## Motivations --- In this lesson, we further explore the connection between two qualitative variables, that is, we attempt to go beyond Lesson [Bivariate statistics](#biv). Our starting point is a qualitative bivariate sample. Consider the celebrated `UCBAdmissions` dataset. According to the `R` documentation, this dataset is made of > Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex. This is a compilation of 4526 application files. For each application file, three variables have been reported: the `department`, the `gender` of the applicant, and whether the applicant has been `admitted`. The dataset is a trivariate sample. In the sequel, we make it a bivariate sample by focusing on `Gender` and `Admit`. ??? The dataset was gathered because the admission process at UCB graduate schools was suspected to be biased against women. --- ### Plotting two-way contingency tables As in Lesson [Bivariate statistics](#biv), we start by collecting the 2-way _contingency table_ and displaying a _mosaicplot_. .panelset[ .panel[.panel-name[`mosaicplot`] .fl.w-50.pa2[ ```r *mosaicplot(~ Gender + Admit, * UCBAdmissions) ``` ] .fl.w-50.pa2[ ![](cm-7-EDA_files/figure-html/mosaic1-1.png) ] ] .panel[.panel-name[`mosaic` (from `vcd`)] .fl.w-50.pa2[ ```r apply(UCBAdmissions, MARGIN = c(1,2), sum) %>% as.table() %>% * vcd::mosaic(formula=Admit~Gender, * data=.) ``` ] .fl.w-50.pa2[ ![](cm-7-EDA_files/figure-html/mosaic2-1.png) ] ] .panel[.panel-name[`ggmosaic`] .fl.w-50.pa2[ .f6[ If, rather than the contingency table, we have the original data (a dataframe with two categorical columns), we can visualize the contingency table using `geom_mosaic` from `ggmosaic`. Conforming to the `ggplot2` API, `geom_mosaic` works on a native dataframe and computes _statistics_ from it; these statistics are then used to draw the picture. ```r titanic_train %>% dplyr::select(Pclass, Embarked) %>% drop_na() %>% ggplot() + geom_mosaic(aes(x = product(Embarked, Pclass), fill=Embarked)) + labs(x= "Passenger class", y="Embarked") + scale_fill_viridis_d() + ggtitle("Titanic mosaic with tidyverse flavor") ``` ] ] .fl.w-50.pa2[ ![](cm-7-EDA_files/figure-html/mosaic3-1.png) ] ] ] ??? This mosaicplot suggests that Admission and Gender are associated. [ggmosaic](https://cran.r-project.org/web/packages/ggmosaic/vignettes/ggmosaic.html) --- template: inter-slide name: chi-deux-section ## Chi-square divergence --- name: chisqDist In order to assess the association between two qualitative variables, we use an _information divergence_ between the joint distribution and the product of the two marginal distributions. In correspondence analysis, the relevant information divergence is the `\(\chi^2\)` (chi-square) divergence. ### Definition Chi-square divergence Let `\(P\)` and `\(Q\)` be two probability distributions over the same measurable space.
The `\(\chi^2\)` (chi-square) divergence from `\(P\)` to `\(Q\)` is defined as - `\(\infty\)` if `\(P\)` is not absolutely continuous with respect to `\(Q\)` and - `\(\chi^2(P \Vert Q) = \mathbb{E}_Q\Big[ \left(\frac{\mathrm{d}P}{\mathrm{d}Q} - 1\right)^2\Big] \qquad \text{ otherwise}\)` --- ### `\(f\)`-divergences The `\(\chi^2\)`-divergence belongs to the family of _information divergences_ or `\(f\)`-divergences, which fit the following expression: if `\(P\)` is absolutely continuous with respect to `\(Q\)`, `$$I_f(P, Q) = \mathbb{E}_Q\Big[ f\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\Big]$$` with `\(f\)` convex and `\(f(1)=0\)`. - for `\(f(x)= x \log (x)\)`, the information divergence is the _relative entropy_ or _Kullback-Leibler divergence_ `\(D(P \Vert Q)\)` - for `\(f(x)= |x -1|\)`, the information divergence is the _total variation distance_ `\(\mathrm{d}_{\text{TV}}(P,Q)\)` - for `\(f(x)= (x-1)^2\)`, the `\(f\)`-divergence is the `\(\chi^2\)` _divergence_ `\(\chi^2(P \Vert Q)\)` --- name: chisqassociation ### Definition Chi-Square association index Let `\(P\)` be a probability distribution over `\(\mathcal{X} \times \mathcal{Y}\)` (a bivariate distribution) with marginal distributions `\(P_X\)` and `\(P_Y\)`. The `\(\chi^2\)`-association index is defined as `$$\chi^2(P \Vert P_X \otimes P_Y)$$` --- ### Application A 2-way contingency table like `ucb2` (the aggregated UCB admissions table) defines a joint distribution `\(P\)` on couples from `\(\mathcal{X} \times \mathcal{Y}\)` as well as two marginal distributions: `$$P\{(a,b)\} = \frac{N_{a,b}}{n} \qquad P_X\{a\} = \frac{N_{a, .}}{n} \qquad P_Y\{b\} = \frac{N_{.,b}}{n}$$` In the sequel, `\(w_a = P_X\{a\} = \frac{N_{a, .}}{n}\)` for every `\(a\in \mathcal{X}\)`. The `\(\chi^2\)` divergence defined by the two-way contingency table is `$$\begin{array}{rl} \chi^2(P \Vert P_X \otimes P_Y) & = \sum_{a \in \mathcal{X}, b \in \mathcal{Y}} \frac{N_{a,\cdot} N_{\cdot, b}}{n^2} \left( \frac{n N_{a,b}}{N_{a,\cdot} N_{\cdot, b}} - 1\right)^2 \\ & = \sum_{a \in \mathcal{X}} \frac{N_{a, .}}{n} \sum_{b \in \mathcal{Y}} \frac{\left( \frac{N_{a,b}}{N_{a,\cdot}} - \frac{N_{\cdot, b}}{n} \right)^2}{N_{.,b}/n} \\ & = \sum_{a \in \mathcal{X}} w_a \times \chi^2\left(P_Y (\cdot | X=a) \Vert P_Y\right) \end{array}$$` ??? The last expression shows that the `\(\chi^2\)` association index is a (convex) combination of the `\(\chi^2\)` divergences between the conditional distributions of `\(Y\)` given the values of `\(X\)` and the marginal distribution of `\(Y\)`. --- ### Pearson association index Given a two-way contingency table, the Pearson association index is defined as `$$n \times \chi^2(P \Vert P_X \otimes P_Y) = \sum_{a \in \mathcal{X},b \in \mathcal{Y}} \frac{\left({N_{a,b}} - \frac{N_{a, .}}{n}N_{., b} \right)^2}{N_{a,.} N_{.,b}/n}$$` --- Note the factor `\(n\)`. It is motivated by testing theory. Indeed, if we collect a bivariate sample of length `\(m\)` according to `\(P_X \otimes P_Y\)` (that is, if the two variables are stochastically independent) and compute the contingency table defined by this random sample, then the Pearson association index is a random variable. If `\(m\)` is large enough, the distribution of the Pearson association index is close (in the topology of weak convergence) to a `\(\chi^2\)`-distribution with `\((|\mathcal{X}|-1)(|\mathcal{Y}|-1)\)` degrees of freedom.
The fact that the distribution of the Pearson association index is close to a probability distribution that does not depend on the details of `\(P_X\)` and `\(P_Y\)` is important, both theoretically and practically. We may compare the computed Pearson association index to the quantiles of the `\(\chi^2\)` distribution with the adequate number of degrees of freedom. --- ### Illustration UCB admission data For the UCB admission data, this leads to ```r M <- apply(UCBAdmissions, MARGIN = c(1,2), sum) rowsums <- M %*% matrix(1, nrow=2, ncol=1) colsums <- matrix(1, nrow=1, ncol=2) %*% M n <- sum(M) *sum((M - rowsums %*% colsums/n)^2/(rowsums %*% colsums/n)) ``` ``` [1] 92.20528 ``` ```r chisq.test(as.table(apply(UCBAdmissions, MARGIN = c(1,2), sum)), correct = FALSE) ``` ``` Pearson's Chi-squared test data: as.table(apply(UCBAdmissions, MARGIN = c(1, 2), sum)) X-squared = 92.205, df = 1, p-value < 2.2e-16 ``` --- ### Tentative interpretation This result suggests that admission and gender are associated. Further interpretation requires more visualization tools and modeling assumptions. We shall first extend the matrix analytic notions that underpin Principal Component Analysis, that is, SVD and Hilbert-Schmidt-Frobenius norms. In [Recipes](#basicca), we construct _Correspondence Analysis_ as a pair of dual _extended SVDs_, known as row-profile and column-profile analyses. In [Illustrations](#factoshowcase), we illustrate the recipes on examples using
packages such as `FactoMineR` --- name: revSVD template: inter-slide ## Revisiting and extending SVD --- Just like PCA, correspondence analysis (CA) builds on Singular Value Decomposition
CA consists of computing the SVD of a _transformation_ of the two-way contingency table. This transformation is best explained by extending the SVD definition. --- ### Digression For `\(\mathbb{R}^p\)`, the Euclidean norm of `\(x\)` is defined by `\(\| x \|^2 = \sum_{i=1}^p x_i^2 = \langle x, x\rangle\)` If `\(A \in \mathcal{M}_{p,p}\)` is a symmetric positive definite matrix with spectral decomposition `\(A = O \times \Lambda \times O^T\)` (the columns of `\(O\)` are unit eigenvectors of `\(A\)`, `\(O \times O^T=\text{Id}_p\)`, and the diagonal coefficients of `\(\Lambda\)` are the eigenvalues of `\(A\)`), we may define another norm on `\(\mathbb{R}^p\)`: `$$\| x \|_A^2 = \langle x, A x\rangle = (O^Tx)^T \Lambda (O^T x)$$` or `$$\| x \|_A^2 = \langle A^{1/2}x, A^{1/2} x\rangle$$` This is the Mahalanobis norm induced by `\(A\)`. ??? --- When dealing with matrices, the Hilbert-Schmidt norm plays the role of the Euclidean norm. We may define variations of the Hilbert-Schmidt norm in a way that parallels the definition of Mahalanobis norms. --- `\(\mathcal{M}(n,p)\)` denotes the set of real matrices with `\(n\)` rows and `\(p\)` columns. We will use weighted versions of the [Hilbert-Schmidt-Frobenius inner-product and norm](#eckartyoung). --- name: weighted-HS-norm ### Proposition Weighted Hilbert-Schmidt-Frobenius inner-product .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ Let `\(\Omega \in \mathcal{M}_{p, p}, \Phi \in \mathcal{M}_{q,q}\)` be symmetric and positive definite. Matrices `\(\Omega, \Phi\)` define an _inner product_ on `\(\mathcal{M}_{p,q}\)`: let `\(X, Y \in \mathcal{M}_{p,q}\)` `$$\begin{array}{rl} \langle X, Y\rangle_{\Omega, \Phi} & = \operatorname{Trace}\left(\Omega \times X \times \Phi \times Y^T\right) \\ & = \bigg\langle \Omega^{1/2} \times X \times \Phi^{1/2}, \Omega^{1/2} \times Y \times \Phi^{1/2} \bigg\rangle_{\text{HS}}\end{array}$$` The weighted Hilbert-Schmidt-Frobenius norm of `\(X\)` is defined by `$$\Vert X \Vert_{\Omega, \Phi}^2 = \left\Vert \Omega^{1/2} \times X \times \Phi^{1/2}\right\Vert_{\text{HS}}^2$$` ] ??? --- ### Proof Linearity with respect to `\(X\)` and `\(Y\)` follows from distributivity of matrix multiplication with respect to matrix addition. Symmetry is a consequence of invariance of the trace under similarity. Multiplying on the left by `\(\Omega^{-1/2}\)` and on the right by `\(\Omega^{1/2}\)`, `$$\begin{array}{rl} \langle X, Y\rangle_{\Omega, \Phi} & = \operatorname{Trace}\left(\Omega \times X \times \Phi \times Y^T\right) \\ & = \operatorname{Trace}\left(\Omega^{1/2} \times X \times \Phi \times Y^T \times \Omega^{1/2} \right) \\ & = \operatorname{Trace}\left(\big(\Omega^{1/2} \times X \times \Phi^{1/2}\big) \times\big(\Phi^{1/2} \times Y^T \times \Omega^{1/2}\big) \right) \\ & = \operatorname{Trace}\left(\big(\Omega^{1/2} \times X \times \Phi^{1/2}\big) \times \big(\Omega^{1/2} \times Y \times \Phi^{1/2} \big)^T \right)\\ & = \left\langle \Omega^{1/2} \times X \times \Phi^{1/2}, \Omega^{1/2} \times Y \times \Phi^{1/2} \right\rangle_{\text{HS}} \\ & = \left\langle \Omega^{1/2} \times Y \times \Phi^{1/2}, \Omega^{1/2} \times X \times \Phi^{1/2} \right\rangle_{\text{HS}} \\ & = \langle Y, X\rangle_{\Omega, \Phi}\end{array}$$` ???
Remember that the trace is _invariant under similarity_: `$$\operatorname{Tr}(Z) = \operatorname{Tr}(P \times Z \times P^{-1})$$` for all `\(Z\)` and invertible `\(P\)` --- ### Proof (continued) Positivity is a by-product of the argument above: `$$\begin{array}{rl}\langle X, X\rangle_{\Omega, \Phi} & = \langle \Omega^{1/2} \times X \times \Phi^{1/2}, \Omega^{1/2} \times X \times \Phi^{1/2} \rangle_{\text{HS}} \\ & \geq 0\end{array}$$` with equality if and only if `\(X=0\)`
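A quick numerical sanity check of the two expressions for `\(\langle X, Y\rangle_{\Omega, \Phi}\)` (an illustrative sketch: the dimensions and matrices are arbitrary, and `sqrtm` is a hypothetical helper computing the symmetric square root through the spectral decomposition):

```r
set.seed(42)                          # arbitrary reproducible example
p <- 3 ; q <- 4
Omega <- crossprod(matrix(rnorm(p * p), p)) + diag(p)   # symmetric PD
Phi   <- crossprod(matrix(rnorm(q * q), q)) + diag(q)   # symmetric PD
X <- matrix(rnorm(p * q), p, q) ; Y <- matrix(rnorm(p * q), p, q)

sqrtm <- function(S) {                # symmetric square root of a PD matrix
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow(S)) %*% t(e$vectors)
}

lhs <- sum(diag(Omega %*% X %*% Phi %*% t(Y)))           # trace formula
rhs <- sum((sqrtm(Omega) %*% X %*% sqrtm(Phi)) *
           (sqrtm(Omega) %*% Y %*% sqrtm(Phi)))          # HS inner product
all.equal(lhs, rhs)                   # TRUE, up to floating point error
```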
??? We shall extend the notion of Singular Value Decomposition so as to adapt to weighted Hilbert-Schmidt-Frobenius norms. --- name: extendedsvd ### Theorem Extended SVD factorization .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ Let `\(\Omega \in \mathcal{M}(p,p)\)`, `\(\Phi \in \mathcal{M}(q,q)\)` be symmetric and positive definite. Let `\(X \in \mathcal{M}(p,q)\)`. There exist `\(N, D, M\)` such that `$$X = N \times D \times M^T$$` with `\(N \in \mathcal{M}(p,p)\)`, `\(M \in \mathcal{M}(q,q)\)` and `\(D \in \mathcal{M}(p, q)\)` satisfying: - `\(N^T \times \Omega \times N = I_p\)` - `\(M^T \times \Phi \times M = I_q\)` - `\(D_{k,k} \geq D_{k+1, k+1} \geq 0\)` for `\(1\leq k< \min (p,q)\)` and `\(D_{j,k}=0\)` for `\(j\neq k\)`. If `\(\Omega = I_p\)` and `\(\Phi = I_q\)`, we recover the Singular Value Decomposition ] --- ### Proof Because they are positive definite, by the spectral decomposition theorem, `\(\Omega\)` and `\(\Phi\)` have invertible square roots: there exist symmetric invertible matrices `\(\Omega^{1/2}\)` and `\(\Phi^{1/2}\)` such that `$$\Omega = (\Omega^{1/2})^2\qquad\text{and}\qquad\Phi = (\Phi^{1/2})^2$$` Let `\(U \times D \times V^T\)` be the SVD factorization of `\(\Omega^{1/2} \times X \times \Phi^{1/2}\)`, then `$$X = \underbrace{\Omega^{-1/2} \times U}_{:=N} \times D \times \underbrace{V^T \times \Phi^{-1/2}}_{:=M^T}$$` ??? The existence of `\(\Omega^{1/2}\)` and `\(\Phi^{1/2}\)` is guaranteed by the _spectral decomposition theorem_. The proof provides a straightforward computational recipe --- ### Proof (continued) Indeed: `$$\begin{array}{rl}N^T \times \Omega \times N & = (U^T \times \Omega^{-1/2}) \times \Omega \times (\Omega^{-1/2} \times U) \\ &= U^T \times (\Omega^{-1/2} \times \Omega \times \Omega^{-1/2} ) \times U \\ & = \operatorname{Id}_p\end{array}$$` and similarly for `\(M\)` and `\(\Phi\)`.
--- ###
Computing the extended SVD Given `\(X\)`, `\(\Omega\)` and `\(\Phi\)`, computing the extended SVD of `\(X\)` with respect to `\(\Omega, \Phi\)` is not harder than computing an ordinary SVD. `\(\Omega^{1/2}\)` and `\(\Phi^{1/2}\)` can be obtained from the SVDs of `\(\Omega, \Phi\)`. Indeed, from the proof of the existence of the extended SVD, the extended SVD of `\(X\)` can be obtained in a three-step approach (see the `R` sketch on the next slide): - Compute `\(\Omega^{1/2}\)` and `\(\Phi^{1/2}\)` from the SVDs of `\(\Omega, \Phi\)` - Compute the ordinary SVD `\(U \times D \times V^T\)` of `\(\Omega^{1/2} \times X \times \Phi^{1/2}\)` - Return `\(\Omega^{-1/2} \times U\)`, `\(D\)`, `\(\Phi^{-1/2}\times V\)` ???
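---

### Computing the extended SVD in `R`

A minimal sketch of the three-step recipe above. The function name `ext_svd` and its inline `sqrtm` helper are our own (hypothetical) choices, not a library API:

```r
# Extended SVD of X with respect to symmetric positive definite Omega, Phi
ext_svd <- function(X, Omega, Phi) {
  sqrtm <- function(S) {                 # symmetric square root via eigen
    e <- eigen(S, symmetric = TRUE)
    e$vectors %*% diag(sqrt(e$values), nrow(S)) %*% t(e$vectors)
  }
  Om_h <- sqrtm(Omega) ; Ph_h <- sqrtm(Phi)
  s <- svd(Om_h %*% X %*% Ph_h)          # ordinary SVD of transformed matrix
  list(N = solve(Om_h) %*% s$u,          # Omega-orthonormal: N' Omega N = I
       d = s$d,                          # non-increasing singular values
       M = solve(Ph_h) %*% s$v)          # Phi-orthonormal:   M' Phi M = I
}
```

As a sanity check, with `res <- ext_svd(X, Omega, Phi)`, `t(res$N) %*% Omega %*% res$N` should be (numerically) the identity and `res$N %*% diag(res$d, length(res$d)) %*% t(res$M)` should recover `X` up to rounding.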
--- name: truncSVD ### Definition: Truncated extended SVD factorization Let `\(k \leq \min(p,q)\)`. The `\(k\)`-truncated extended SVD factorization of `\(X \in \mathcal{M}(p,q)\)` with respect to `\(\Omega, \Phi\)` is derived from the extended SVD factorization `$$X = N \times D \times M^T$$` by taking the first `\(k\)` columns of `\(N\)` ( `\(N_{[k]}\)` ) and `\(M\)` ( `\(M_{[k]}\)` ) as well as the first `\(k\)` rows and columns of `\(D\)` ( `\(D_{[k]}\)` ). The matrix `$$N_{[k]} \times D_{[k]} \times M_{[k]}^T$$` has rank not larger than `\(k\)`. It has rank `\(k\)` if `\(D_{k,k}>0\)`. ??? The truncated extended SVD factorization is defined in a way that parallels the truncated SVD --- ### Performance of `\(k\)`-truncated SVD The `\(k\)`-truncated SVD of `\(X\)` is known to be the best rank- `\(k\)` approximation of `\(X\)` in the least-squares sense (that is, according to the Hilbert-Schmidt norm). The `\(k\)`-truncated extended SVD factorization also enjoys optimal approximation properties under rank constraints. Whereas the optimality property of the truncated SVD is assessed with respect to the Hilbert-Schmidt-Frobenius norm, the optimality property of the truncated extended SVD is assessed with respect to a weighted version of the Hilbert-Schmidt-Frobenius norm --- name: weightedHSorthobasis ### Proposition .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ Let `\(\Omega \in \mathcal{M}_{p, p}, \Phi \in \mathcal{M}_{q,q}\)` be symmetric and positive definite. Let `\(X \in \mathcal{M}_{p,q}\)` with extended SVD factorization `\(X = N \times D \times M^T\)` with respect to `\(\Omega\)` and `\(\Phi\)`. Let `\((n_i)_{i \leq p}\)` and `\((m_{j})_{j\leq q}\)` denote the columns of `\(N\)` and `\(M\)`; then `\(\left(n_i \times m_j^T \right)_{i\leq p, j \leq q}\)` is an orthonormal basis of `\(\mathcal{M}_{p,q}\)` endowed with inner product `\(\langle X, Y\rangle_{\Omega, \Phi}\)` ] ??? The straightforward proof parallels the proof of the statement for the ordinary SVD. --- ### Proof `$$\begin{array}{rl}\langle n_i \times m_j^T , n_k \times m_\ell^T \rangle_{\Omega, \Phi} & = \operatorname{Tr}\left(\Omega \times n_i \times \underbrace{m_j^T \times \Phi \times m_\ell}_{=\langle m_j , m_{\ell}\rangle_{\Phi}} \times n_k^T \right)\\ &= \langle m_j , m_{\ell}\rangle_{\Phi} \operatorname{Tr}\left(\Omega \times n_i \times n_k^T \right)\\ & = \langle m_j , m_{\ell}\rangle_{\Phi} \times \langle n_i , n_{k}\rangle_{\Omega}\end{array}$$` The fact that `\(N\)` (resp. `\(M\)`) is `\(\Omega\)` (resp. `\(\Phi\)`) orthogonal allows us to conclude
??? We just take advantage of the observation `$$m_j^T \times \Phi \times m_\ell = \operatorname{Tr}\left(\Phi \times m_\ell \times m_j^T \right)$$` --- name: extendedEckartYoung ### Theorem Eckart-Young Theorem for extended SVD .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ Let `\(\Omega \in \mathcal{M}_{p, p}, \Phi \in \mathcal{M}_{q,q}\)` be symmetric and positive definite. Let `\(X \in \mathcal{M}_{p,q}\)` with extended SVD factorization `\(X = N \times D \times M^T\)`. Let `\(X_{[k]}\)` denote the `\(k\)`-truncated extended SVD of `\(X\)` with respect to `\(\Omega\)` and `\(\Phi\)`. Then `\(X_{[k]}\)` minimizes `$$\Vert X-Y\Vert^2_{\Omega, \Phi} = \langle X-Y, X-Y\rangle_{\Omega, \Phi} = \operatorname{Trace}\left(\Omega (X-Y) \Phi (X-Y)^T\right)$$` among all matrices `\(Y \in \mathcal{M}_{p,q}\)` with rank at most `\(k\)`. ] --- ### Proof The argument parallels the proof of Eckart-Young's Theorem for SVD.
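---

### Extended Eckart-Young, numerically

A small illustration of the theorem (a sketch, reusing the hypothetical `ext_svd` helper sketched earlier; the data is arbitrary): the `\(k\)`-truncated extended SVD beats a competing rank- `\(k\)` matrix in weighted Hilbert-Schmidt-Frobenius norm, and its approximation error is the sum of the discarded squared singular values.

```r
set.seed(1)                              # arbitrary reproducible example
p <- 5 ; q <- 4 ; k <- 2
Omega <- crossprod(matrix(rnorm(p * p), p)) + diag(p)
Phi   <- crossprod(matrix(rnorm(q * q), q)) + diag(q)
X     <- matrix(rnorm(p * q), p, q)

res <- ext_svd(X, Omega, Phi)            # helper sketched earlier
X_k <- res$N[, 1:k] %*% diag(res$d[1:k]) %*% t(res$M[, 1:k])

wHS2 <- function(Z) sum(diag(Omega %*% Z %*% Phi %*% t(Z)))      # squared weighted norm
all.equal(wHS2(X - X_k), sum(res$d[-(1:k)]^2))  # discarded singular values
Y <- matrix(rnorm(p * k), p, k) %*% matrix(rnorm(k * q), k, q)   # a random rank-k matrix
wHS2(X - X_k) <= wHS2(X - Y)             # TRUE: X_k is optimal
```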
--- name: basicca template: inter-slide ## Correspondence analysis : recipes --- ### FactoMineR homepage <iframe src="http://factominer.free.fr" height="400" width="800" title="FactoMineR"></iframe> --- ### FactoMineR on Correspondence Analysis <iframe src="http://factominer.free.fr/factomethods/correspondence-analysis.html" height="400" width="800" title="FactoMineR"></iframe> ---
Correspondence Analysis (CA) consists of - normalizing and recentering the 2-way contingency table - computing the _extended_ SVD with carefully chosen matrices `\(\Omega\)` and `\(\Phi\)`.
Reporting CA consists of - Row Profiles analysis - Column Profiles analysis - Singular value analysis --- ### Two important diagonal matrices Consider a 2-way contingency table `\(X \in \mathcal{M}_{p, q}\)`. Let `\(n\)` denote the sum of the coefficients of `\(X\)`: `\(n = \sum_{a \leq p, b \leq q} X_{a, b}\)`. Let `\(P\)` be the _normalization_ of `\(X\)`: `$$P = \frac{1}{n} X$$` Let - `\(D_r \in \mathcal{M}_{p,p}\)` be the diagonal matrix of row sums of `\(P\)` - `\(D_c \in \mathcal{M}_{q,q}\)` be the diagonal matrix of column sums of `\(P\)` The diagonal elements of `\(D_r\)` define the row weights. The diagonal coefficients of `\(D_c\)` define the so-called _centroid_ : the marginal distribution of `\(Y\)` ??? The marginal distribution of `\(Y\)` is the weighted average of the conditional distributions of `\(Y\)` given `\(X\)`. --- name: rowprofiles ### Row profiles The rows of matrix `$$R = D_r^{-1} \times P \qquad R_{a,b} = \frac{N_{a,b}}{N_{a,.}}$$` are called _row profiles_. Each _row profile_ defines a conditional probability of `\(Y\)` given `\(X\)`. In Correspondence Analysis, row profile analysis consists of factorizing the centered row profile matrix `\(R\)`. The recipe consists of choosing the pair of symmetric positive definite matrices that defines the extended SVD --- ### Example Consider the 2-way contingency table derived from the `Embarked` and `Pclass` columns of the `train` subset of the Titanic dataset | | 1| 2| 3| |:--|---:|---:|---:| |Queenstown | 2| 3| 72| |Cherbourg | 85| 17| 66| |Southampton | 127| 164| 353| Once normalized by the table sum `n=889`, we obtain matrix `\(P\)` (up to rounding errors): `$$\begin{bmatrix} 0.002 & 0.003 & 0.081 \\ 0.096 & 0.019 & 0.074 \\ 0.143 & 0.184 & 0.397 \end{bmatrix}$$` Matrices `\(D_r\)` and `\(D_c\)` are `$$D_r \approx \begin{bmatrix} 0.086 & 0 & 0 \\ 0 & 0.189 & 0 \\ 0 & 0 & 0.724 \end{bmatrix} \qquad D_c \approx \begin{bmatrix} 0.241 & 0 & 0 \\ 0 & 0.206 & 0 \\ 0 & 0 & 0.552 \end{bmatrix}$$` ??? ```r readr::read_csv("DATA/TITANIC/train.csv", col_types = cols(Embarked = col_factor(levels = c("Q", "C", "S")), Pclass = col_factor(levels = c("1", "2", "3")), Sex = col_factor(levels = c("male", "female")), Survived = col_factor(levels = c("0", "1")))) %>% dplyr::select(Embarked, Pclass) %>% table() -> X ``` ```r P <- X/sum(X) ; Dr <- diag(rowSums(P)) ; Dc <- diag(colSums(P)) R <- solve(Dr) %*% P foo <- matrix(1, nrow = 3, ncol=3) %*% P ``` --- The row profiles matrix `\(R\)` is `\(D_r^{-1} \times P\)` `$$R \approx \begin{bmatrix} 0.02 & 0.03 & 0.94 \\ 0.51 & 0.10 & 0.39 \\ 0.20 & 0.25 & 0.55 \end{bmatrix}$$` The matrix which is fed to the _extended SVD_ is obtained by centering `\(R\)` around the weighted average of the rows: `$$R - 1 \times 1^T \times P \approx \begin{bmatrix} -0.221 & -0.176 & 0.388 \\ 0.269 & -0.106 & -0.162 \\ -0.041 & 0.044 & -0.002 \end{bmatrix}$$` --- Recall that the row vector `\(1^T \times P\)` defines the marginal distribution of `\(Y\)` (the column-profile of matrix `\(P\)`). Note that all rows of `\(R - 1 \times 1^T \times P\)` sum up to `\(0\)`: they are orthogonal to `\((1, 1, 1)^T\)`.
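This can be checked numerically; a sketch, reusing the objects `P` and `R` computed in the chunks above:

```r
# each row of the centered row-profile matrix sums to zero (up to rounding)
rowSums(R - matrix(1, 3, 1) %*% colSums(P))
```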
For reference, the raw counts: | | 1| 2| 3| |:--|---:|---:|---:| |Q | 2| 3| 72| |C | 85| 17| 66| |S | 127| 164| 353| --- ### A Mahalanobis metric for row profiles The space of row profiles (a subset of `\(\mathbb{R}^q\)`) is endowed with the weighted `\(\ell_2\)` metric (a Mahalanobis metric) induced by `\(D_c^{-1}\)`: `$$(x -y)^T \times D_c^{-1} \times (x-y)$$` --- ### Applying the extended SVD The extended SVD factorization is actually applied to `$$D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T$$` where `\(1 \times 1^T\)` denotes the matrix from `\(\mathcal{M}_{p, q}\)` with all entries equal to `\(1\)`. Matrices `\(\Omega\)` and `\(\Phi\)` are chosen as `\(D_r\)` and `\(D_c\)`. The entry of `\(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T\)` at indices `\(a,b\)` is `$$\frac{n N_{a,b}}{N_{a, .}N_{.,b}} - 1$$` ??? Entries of `\(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T\)` show up in the definition of the chi-square divergence between the joint empirical distribution and the product of the marginal empirical distributions --- Recall the definition of the Pearson association index: `$$\begin{array}{rl} n \chi^2(P \Vert P_X \otimes P_Y) & = n \sum_{a \in \mathcal{X}, b \in \mathcal{Y}} \frac{N_{a,\cdot} N_{\cdot, b}}{n^2} \left( \frac{n N_{a,b}}{N_{a,\cdot} N_{\cdot, b}} - 1\right)^2 \\ & = n \sum_{a \in \mathcal{X}} \frac{N_{a, .}}{n} \sum_{b \in \mathcal{Y}} \frac{\left( \frac{N_{a,b}}{N_{a,\cdot}} - \frac{N_{\cdot, b}}{n} \right)^2}{N_{.,b}/n} \\ & = n \sum_{a \in \mathcal{X}} w_a \times \chi^2\left(P_Y (\cdot | X=a) \Vert P_Y\right) \\ & = \sum_{a \in \mathcal{X},b \in \mathcal{Y}} \frac{\left({N_{a,b}} - \frac{N_{a, .}}{n}N_{., b} \right)^2}{N_{a,.} N_{.,b}/n} \end{array}$$` where `\(P\)` is the joint distribution defined by the contingency table, `\(P_X, P_Y\)` the two marginal distributions. In terms of weighted norms (note the absence of the factor `\(n\)`: the squared norm equals the `\(\chi^2\)` divergence, not the Pearson index), `$$\begin{array}{rl} \chi^2(P \Vert P_X \otimes P_Y) & = \big\Vert D_r^{1/2} \big(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T \big)D_c^{1/2}\big\Vert^2_{\text{HS}} \\ & = \big\Vert \big(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T \big)\big\Vert^2_{D_r, D_c}\end{array}$$` --- As `$$D_r^{1/2} \times \big(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T \big) \times D_c^{1/2} = D_r^{1/2} \times \big(R - 1 \times 1^T \times P \big) \times D_c^{-1/2}$$` we can also portray correspondence analysis as > computing the extended SVD of the centered row profiles matrix `$$\big(R - 1\times 1^T \times P \big)$$` > with respect to the symmetric positive definite matrices `\(\Omega = D_r, \Phi = D_c^{-1}\)` --- ### Interpretation of singular values The sum of the squared singular values equals the `\(\chi^2\)` association index `\(\chi^2(P \Vert P_X \otimes P_Y)\)` (the Pearson index divided by `\(n\)`). It is also called the _total inertia_ of the centered row profiles matrix. The number `\(k\)` of positive singular values equals the rank of the centered row profiles matrix. The latter is not larger than `\(q-1\)`. The sum of the `\(k-2\)` smallest squared positive singular values equals the squared weighted Hilbert-Schmidt-Frobenius distance of the centered row profiles matrix to the set of rank-2 matrices (extended Eckart-Young Theorem). ??? --- ### Interpretation of singular values (continued) Let `\(N \in \mathcal{M}_{p, k}\)` (resp. `\(M \in \mathcal{M}_{q, k}\)`) denote the extended left (resp.
right) singular vectors computed from the extended SVD of `\(R - 1 \times 1^T \times P\)` with respect to `\(D_r\)` and `\(D^{-1}_c\)`. We have `$$N^T \times D_r \times N = I_k \qquad\text{and}\qquad M^T \times D^{-1}_c \times M = I_k$$` Let `\(D_{s} \in \mathcal{M}_{k,k}\)` be the diagonal matrix with non-increasing singular values as diagonal coefficients. ??? --- ### Interpretation of left singular vectors The truncated versions of matrix `\(N \times D_s\)` define the optimal subspaces onto which to project the row profiles. Turn back to the Titanic example. The 2 positive squared singular values are `\(0.103, 0.036\)` (if we sum these two values and multiply the result by the number `\(889\)` of rows of the Titanic sub-dataset, we recover the Pearson association statistic, `\(123.57\)`). `$$F = N \times D_s =\begin{bmatrix} -0.573 & -0.515 \\ 0.604 & -0.165 \\ -0.089 & 0.105\end{bmatrix}$$` ??? The graphical display of the row profiles consists of displaying the points defined by the rows of `\(F\)`. The percentage of inertia associated with each axis is usually added to the plot. If possible, `rownames` are used to identify each point and facilitate interpretation. --- ###
Titanic illustrated : row profiles .panelset[ .panel[.panel-name[Code] ```r P <- (1/sum(X)) * X Dr <- diag(rowSums(P)) ; Dc <- diag(colSums(P)) R <- diag(rowSums(P)^(-1)) %*% P - matrix(1, 3, 1) %*% colSums(P) svd_R_ext <- svd(diag(rowSums(P)^(1/2)) %*% R %*% diag(colSums(P)^(-1/2))) N <- diag(rowSums(P)^(-1/2)) %*% svd_R_ext$u M <- diag(colSums(P)^(1/2)) %*% svd_R_ext$v F <- N %*% diag(svd_R_ext$d) res_ca <- data.frame(F[, 1:2]) names(res_ca) <- c("lambda1", "lambda2") res_ca$rownames_ <- c("Queenstown", "Cherbourg", "Southampton") ``` ] .panel[.panel-name[Plot preparation] ```r p <- res_ca %>% ggplot() + aes(x=lambda1, y=lambda2, label=rownames_) + geom_point() + geom_text(vjust = 0, nudge_y = 0.05) + xlim(-1, 1) + ylim(-1, 1) + coord_fixed() + geom_hline(yintercept = 0, linetype=2) + geom_vline(xintercept = 0, linetype=2) + xlab(latex2exp::TeX("$\\lambda_1 = 74.1\\%$")) + ylab(latex2exp::TeX("$\\lambda_2 = 25.9\\%$")) + ggtitle("Titanic CA row profiles") p ``` ] .panel[.panel-name[Plot] .centered[ ![](cm-7-EDA_files/figure-html/CArowtitanic-1.png) ] ] .panel[.panel-name[Discussion] #### Row profile correspondence analysis of `Embarked` and `Pclass` for a subset of the Titanic dataset. Not too surprisingly, the class distribution conditioned on `Embarked=Southampton` is closest to the centroid, that is, to the marginal distribution of Passenger class ] ] --- name: colprofiles ### Column-profiles Whereas in PCA, rows and columns play different roles (observations and variables), in CA, rows and columns are both associated with modalities of categorical variables. Column analysis consists of performing row analysis on the transposed matrix `\(P^T\)`. The matrix of centered column profiles is `$$D_c^{-1} \times P^T - \mathbf{1} \times \mathbf{1}^T \times P^T$$` Recall that for row profile analysis, the extended SVD factorization is applied to `\(D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T\)`, with matrices `\(\Omega\)` and `\(\Phi\)` chosen as `\(D_r\)` and `\(D_c\)`. This leads to `$$D_r^{-1} \times P \times D_c^{-1} - 1 \times 1^T = \widetilde{N} \times D_s \times \widetilde{M}^T$$` with `$$R - 1 \times 1^T \times P = \widetilde{N} \times D_s \times \widetilde{M}^T \times D_c$$` --- ### Column-profiles (continued) For column profile analysis, the extended SVD factorization is applied to `\(D_c^{-1} \times P^T \times D_r^{-1} - 1 \times 1^T\)`, with matrices `\(\Omega\)` and `\(\Phi\)` chosen as `\(D_c\)` and `\(D_r\)`. `$$D_c^{-1} \times P^T \times D_r^{-1} - 1 \times 1^T = \widetilde{M} \times D_s \times \widetilde{N}^T$$` This leads to `$$C - 1 \times 1^T \times P^T = \widetilde{M} \times D_s \times \widetilde{N}^T \times D_r$$` Exchanging rows and columns does not change the decomposition of inertia.
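This can be checked numerically (a sketch, reusing the contingency table `P` from the Titanic chunks above): the matrix fed to column-profile analysis is the transpose of the matrix fed to row-profile analysis, so both analyses share the same singular values.

```r
Z_row <- diag(sqrt(rowSums(P))) %*%
  (diag(1 / rowSums(P)) %*% P %*% diag(1 / colSums(P)) - matrix(1, 3, 3)) %*%
  diag(sqrt(colSums(P)))
all.equal(svd(Z_row)$d, svd(t(Z_row))$d)   # same decomposition of inertia
```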
Investigating the left singular vectors `\(\widetilde{M}\)` of column profile analysis amounts to investigating the right singular vectors `\(D_c \times \widetilde{M}\)` of row profile analysis. ??? Column profile correspondence analysis of `Embarked` and `Pclass` for a subset of the Titanic dataset. --- ###
Titanic illustrated : column profiles .panelset[ .panel[.panel-name[Code] ```r C <- diag(colSums(P)^(-1)) %*% t(P) - matrix(1, 3, 1) %*% rowSums(P) svd_C_ext <- svd(diag(colSums(P)^(1/2)) %*% C %*% diag(rowSums(P)^(-1/2))) Ndual <- diag(colSums(P)^(-1/2)) %*% svd_C_ext$u Mdual <- diag(rowSums(P)^(1/2)) %*% svd_C_ext$v G <- Ndual %*% diag(svd_C_ext$d) res_ca_dual <- data.frame(-G[, 1:2]) names(res_ca_dual) <- c("lambda1", "lambda2") res_ca_dual$rownames_ <- c("1", "2", "3") ``` ] .panel[.panel-name[Plot preparation] ```r p %+% res_ca_dual + ggtitle("Titanic CA column profiles") ``` ] .panel[.panel-name[Plot] .center[ ![](cm-7-EDA_files/figure-html/CArowtitanicDual-1.png) ] ] .panel[.panel-name[Discussion] - Passenger Class 3 makes up the bulk of the passengers (it accounts for more than half of them). The conditional embarkation distribution for that class is close to the centroid. Should it be a surprise? - Passengers with Class 2 tickets almost exclusively boarded at Southampton - Passengers with Class 1 tickets almost exclusively boarded at Southampton and Cherbourg ] ] --- name: assocmodalities ### Association between modalities Correspondence analysis is used to compare row profiles and column profiles. It is also, and mainly, used to investigate associations between modalities of the two variables. Function `CA` from `FactoMineR` delivers, among other products, a _biplot for correspondence analysis_. ```r res_ca <- X %>% * FactoMineR::CA(ncp = 2, graph = FALSE) ``` | | Dim.1 | Dim.2 | |:----------------|------------:|------:| |Variance | 0.10 | 0.036 | | `% of var.` | 74.12 | 25.87| | Cumulative `% of var.`| 74.12 |100.0 | --- ### Dissecting the output of `FactoMineR::CA` This output, denoted `res_ca`, is a list with five elements | Element name | Description | |:-------------|:------------------------------------------------| |`eig` | a redundant matrix for inertia decomposition | |`call` | a list of 9 elements| |`row` | a list for row profile analysis | |`col` | a list for column profile analysis| |`svd` | extended SVD components | --- ### Component `eig` of `res_ca` is made of three columns - Column 1 `eigenvalue` lists the eigenvalues of `\(\ldots\)` - Column 2 is a normalized version of Column 1 - Column 3 is obtained by applying `cumsum` to column 2 ??? Component `eig` paves the way for building a `screeplot` for CA --- ### Component `call` is a list - `X` is the contingency table that was fed to `CA` - `marge.col` is the marginal distribution for columns - `marge.row` is the marginal distribution for rows - `row.w`: a constant vector `1` here - `excl`: irrelevant here - `Xtot`: irrelevant here - `N`: the sum of the coefficients of `X`
Here `X == Xtot`, `excl` is `NULL` and `row.w` is identically `1`. ??? --- ### Component `row` of `res_ca` If the extended SVD of `\(X = D_r^{-1} \times P \times D_c^{-1} - \mathbf{1} \times \mathbf{1}^T\)` is `\(U \times D \times V^T\)`, then `res_ca$row$coord` is `\(U \times D = X \times \Phi \times V\)` ??? - Compare with PCA output --- exclude: true ### Component `col` --- ### Component `svd` - What matrix is this component the SVD of? - It is the (extended) SVD of `$$D_r^{-1} \times P \times D_c^{-1} - \mathbf{1} \times \mathbf{1}^T$$` The `\(\Omega\)` matrix is `diag(res_ca$call$marge.row)`. The `\(\Phi\)` matrix is `diag(res_ca$call$marge.col)`. --- exclude: true ```r names(res_ca) ``` ``` [1] "eig" "call" "row" "col" "svd" ``` ```r names(res_ca$call) ``` ``` [1] "X" "marge.col" "marge.row" "ncp" "row.w" "excl" [7] "call" "Xtot" "N" ``` ```r res_ca$row ``` ``` $coord Dim 1 Dim 2 Q -0.57253082 -0.5151977 C 0.60436922 -0.1645507 S -0.08920676 0.1045260 $contrib Dim 1 Dim 2 Q 27.515826 63.82276 C 66.897215 14.20515 S 5.586959 21.97210 $cos2 Dim 1 Dim 2 Q 0.5525630 0.4474370 C 0.9309859 0.0690141 S 0.4214174 0.5785826 $inertia [1] 0.05138127 0.07414282 0.01367941 ``` ```r res_ca$call$marge.row ``` ``` Q C S 0.08661417 0.18897638 0.72440945 ``` --- exclude: true ### Designing a `tidy` method for outputs of `CA`?
the input of `CA` is not a dataframe, it is a _contingency table_ --- ### Inertia analysis .fl.w-40.pa2[ The inertia of the first dimensions shows if there are strong relationships between variables and suggests the number of dimensions that should be studied. The first two dimensions express `\(100\%\)` of the total dataset inertia. That means that `\(100\%\)` of the row (or column) cloud's total variability is explained by the plane (_this is trivial!_) ] .fl.w-60.pa2[ .center[ <img src="cm-7-EDA_files/figure-html/decompInertia-1.png" width="432" /> ] ] --- |Rows | `\(\text{Iner.} \times 1000\)`| Dim.1 | contribution | `\(\cos^2\)` | Dim.2 | contribution | `\(\cos^2\)` | |:-|:--------:|----:|----:|----:|----:|----:|----:| |Q | 51.381 | -0.573 | 27.516 | 0.553 | -0.515 | 63.823 | 0.447 | |C | 74.143 | 0.604 |66.897 | 0.931 | -0.165 |14.205 | 0.069 | |S | 13.679 | -0.089 | 5.587 | 0.421 | 0.105 |21.972 | 0.579 | According to row analysis, Southampton contributes little to total inertia. Indeed, Southampton mostly contributes to the centroid. --- |Rows | `\(\text{Iner.} \times 1000\)`| Dim.1 | contribution | `\(\cos^2\)` | Dim.2 | contribution | `\(\cos^2\)` | |:-|:--------:|----:|----:|----:|----:|----:|----:| |1 | 77.518 | 0.566 | 74.698 | 0.994 | -0.043 | 1.230 | 0.006 | |2 | 29.988 | -0.103 | 2.118 | 0.073 | 0.367 | 77.185 | 0.927 | |3 | 31.697 | -0.208 | 23.184 | 0.755 | -0.119 | 21.585 | 0.245 | Classes contribute more evenly to inertia. --- ### Biplots for CA .fl.w-40[ Passengers who boarded at Cherbourg were disproportionately first class passengers. Passengers who boarded at Queenstown were disproportionately third class passengers. Note that this biplot does not display the weight of the different categories ] .fl.w-60.pa2[ <img src="cm-7-EDA_files/figure-html/fctmCATita-1.png" width="432" /> ] --- exclude: true ### Questionnaire ```r require(FactoMineR) data(children) res.ca <- CA(children, row.sup = 15:18, col.sup = 6:8) ``` <img src="cm-7-EDA_files/figure-html/unnamed-chunk-24-1.png" width="432" /> --- template: inter-slide name: factoshowcase ## Another FactoMineR showcase --- ###
Women at work (1974) .panelset[ .panel[.panel-name[The 1974 survey] .fl.w-50[ In a 1974 survey, 1724 women answered several questions about women's work, among which: - What do you think the perfect family is? - Both husband and wife work - Husband works more than wife - Only husband works - Which activity is the best for a mother when children go to school? - Stay at home - Part-time work - Full-time work ] .fl.w-50[ ![](./img/arts_menagers.jpeg) ] ] .panel[.panel-name[Get the data] ```r facto_url <- "http://factominer.free.fr" url <- paste(facto_url, "factomethods", "datasets", "women_work.txt", sep = "/") women_work <- read.table(url, header=TRUE, row.names=1, sep="\t") tab <- women_work[, 1:3] head(tab) ``` ``` stay.at.home part.time.work full.time.work both.man.and.woman.work 13 142 106 man.morks.more 30 408 117 only.man.works 241 573 94 ``` ] .panel[.panel-name[Perform CA] ```r library(FactoMineR) res_ca <- CA(tab) ``` ```r res_ca$eig %>% knitr::kable() ``` <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> eigenvalue </th> <th style="text-align:right;"> percentage of variance </th> <th style="text-align:right;"> cumulative percentage of variance </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> dim 1 </td> <td style="text-align:right;"> 0.1168400 </td> <td style="text-align:right;"> 86.29218 </td> <td style="text-align:right;"> 86.29218 </td> </tr> <tr> <td style="text-align:left;"> dim 2 </td> <td style="text-align:right;"> 0.0185604 </td> <td style="text-align:right;"> 13.70782 </td> <td style="text-align:right;"> 100.00000 </td> </tr> </tbody> </table> ```r res_ca$row$coord %>% knitr::kable() ``` <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Dim 1 </th> <th style="text-align:right;"> Dim 2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> both.man.and.woman.work </td> <td style="text-align:right;"> -0.5586051 </td> <td style="text-align:right;"> 0.2333870 </td> </tr> <tr> <td style="text-align:left;"> man.morks.more </td> <td style="text-align:right;"> -0.2437595 </td> <td style="text-align:right;"> -0.1722066 </td> </tr> <tr> <td style="text-align:left;"> only.man.works </td> <td style="text-align:right;"> 0.3095622 </td> <td style="text-align:right;"> 0.0381726 </td> </tr> </tbody> </table> ] .panel[.panel-name[Plot CA] ![](cm-7-EDA_files/figure-html/ca_women-1.png) ] ] --- ###
Women at work (1974), supplementary columns .panelset[ .panel[.panel-name[The 1974 survey] Another question was asked of the 1724 women - What do you think of the following sentence: _women who do not work feel cut off from the world_? + Totally agree
+ Quite agree
+ Quite disagree
+ Totally disagree
] .panel[.panel-name[The data] ```r pattern_ <- "housewives.cut.from.world." women_work <- women_work %>% rename_with(.fn= ~ str_replace(., pattern_, ""), starts_with(pattern_)) head(women_work[, 4:ncol(women_work)]) ``` ``` totally.agree quite.agree quite.disagree both.man.and.woman.work 107 75 40 man.morks.more 192 175 100 only.man.works 140 215 254 totally.disagree both.man.and.woman.work 39 man.morks.more 88 only.man.works 299 ``` ] .panel[.panel-name[Perform CA] ```r res_ca <- CA(women_work, col.sup=4:ncol(women_work)) ``` ```r res_ca$row$coord %>% knitr::kable() ``` <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Dim 1 </th> <th style="text-align:right;"> Dim 2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> both.man.and.woman.work </td> <td style="text-align:right;"> -0.5586051 </td> <td style="text-align:right;"> 0.2333870 </td> </tr> <tr> <td style="text-align:left;"> man.morks.more </td> <td style="text-align:right;"> -0.2437595 </td> <td style="text-align:right;"> -0.1722066 </td> </tr> <tr> <td style="text-align:left;"> only.man.works </td> <td style="text-align:right;"> 0.3095622 </td> <td style="text-align:right;"> 0.0381726 </td> </tr> </tbody> </table> ] .panel[.panel-name[Plot supplemented CA] ![](cm-7-EDA_files/figure-html/ca_women_sup-1.png) ] .panel[.panel-name[Discussion] > "Totally agree" and "Quite agree" for "Women who do not work feel cut off from the world" are close to categories in favour of women's work. > "Quite disagree" and "Totally disagree" are close to categories opposed to women's work. ] ] --- exclude: true ## Bibliographic remarks {#bibca} @MR767260 is a lively and thorough presentation of correspondence analysis and related techniques. @holmes is a brief and spirited introduction to the duality diagrams framework that underlies package `ade4`. @husson2018r explains how to use `FactoMineR` to perform Correspondence Analysis. --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End