name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- template: inter-slide # Multilinear Regression II: Formulae and Algorithm(s) #### [Master I MIDS & EDA]() #### [Analyse Exploratoire de Données](http://stephane-v-boucheron.fr/courses/eda/) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- template: inter-slide ##
### Motivations ### Ordinary Least Squares ### Closed-form Formulae ### `lm` and `tidyverse` ### QR Factorization ### OLS and Pseudo-inversion --- template: inter-slide name: motivations ## Motivations ??? --- ### Principled description of OLS The approach to (multiple) linear regression we are following is called Ordinary Least Squares (OLS) Multiple Linear Regression consists of - Picking a dataframe with `n` rows, one _response_ variable `Y`, and _explanatory_ variables (also named _covariates_) - Using the (possibly categorical) covariate columns to build a _design matrix_ `Z` with `n` rows and `p` columns according to a _formula_ `Y ~ ...` - Stating the Least Squares Problem: find `\(\beta \in \mathbb{R}^p\)` that minimizes `$$\left\| Y - Z \times \beta \right\|^2$$` Call the (possibly non-unique) solution `\(\widehat{\beta}\)` - Running _diagnostics_ + checking whether Goodness of Fit criteria are trustworthy + spotting outliers --- ### Convention - `\(\mathcal{M}_{n,p}\)` denotes the set of real matrices with `\(n\)` rows and `\(p\)` columns - Assumption `\(n > p\)` (classical regime) - Vectors are assumed to be column vectors (matrices with 1 column) - If `\(A\)` is a matrix, `\(A^T\)` is the transpose of `\(A\)` - The design matrix `\(Z\)` is in `\(\mathcal{M}_{n,p}\)` - The response vector `\(Y\)` is in `\(\mathbb{R}^n\)` (equivalently `\(\mathcal{M}_{n,1}\)`) - The parameter space is `\(\mathbb{R}^p\)` (equivalently `\(\mathcal{M}_{p,1}\)`) --- ### Ordinary Least Squares Ideally we would like to find `\(\beta \in \mathbb{R}^p\)` such that `$$Y = Z \times \beta$$` This system of linear equations is usually not solvable Instead we look for `\(\beta\)` that minimizes the Least Squares criterion `$$\left\| Y - Z \times \beta \right\|^2$$` --- template: inter-slide name: closed-form-formulae ## Closed-form formulae --- ### Solving the OLS problem - Geometric approach - Analytic approach ??? --- ### Geometric approach - `\(\mathcal{L}(Z)\)` linear subspace of `\(\mathbb{R}^n\)` generated by the columns of `\(Z\)` - `\(\Pi\)` : `\(n \times n\)` matrix associated with the orthogonal projection of `\(\mathbb{R}^n\)` on `\(\mathcal{L}(Z)\)` - `\(\widehat{Y} = \Pi \times Y\)` projection of `\(Y\)` on `\(\mathcal{L}(Z)\)` - `\(\widehat{\epsilon} = Y - \widehat{Y} = (\mathrm{Id} - \Pi)\times Y\)` projection of `\(Y\)` on `\(\mathcal{L}(Z)^\bot\)` ??? Assume for a while that we already know how to compute `\(\widehat{Y}\)`
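---

### Geometric approach: a quick numerical check

A minimal sketch (base R only, on the first Anscombe group of `datasets::anscombe`) of the picture above: the residual vector `\(\widehat{\epsilon}\)` lies in `\(\mathcal{L}(Z)^\bot\)`, so it is orthogonal to every column of `\(Z\)` and to `\(\widehat{Y}\)`

```r
df <- datasets::anscombe               # first Anscombe group: covariate x1, response y1
Z  <- cbind(intercept = 1, x = df$x1)  # design matrix

fit     <- lm(y1 ~ x1, data = df)
Y_hat   <- fitted(fit)                 # plays the role of the projection of y1 on L(Z)
eps_hat <- residuals(fit)              # y1 - Y_hat

crossprod(Z, eps_hat)                  # ~ 0: eps_hat is orthogonal to every column of Z
sum(Y_hat * eps_hat)                   # ~ 0: eps_hat is orthogonal to Y_hat
```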
--- ### Pythagorean formulae `$$\begin{array}{rl}\left\| Y - Z \beta \right\|^2 & = \left\| Y - \widehat{Y} + \left(\widehat{Y} - Z \beta\right) \right\|^2\\ & = \left\| Y - \widehat{Y}\right\|^2 + 2 \left\langle Y - \widehat{Y}, \widehat{Y} - Z \beta \right\rangle + \left\|\widehat{Y} - Z \beta \right\|^2\\ & = \left\| Y - \widehat{Y}\right\|^2 + \left\|\widehat{Y} - Z \beta \right\|^2\end{array}$$` as `$$Y - \widehat{Y} \in \mathcal{L}(Z)^\bot \quad\text{while}\quad \widehat{Y} - Z \beta \in \mathcal{L}(Z)$$` --- ### Exploiting the Pythagorean formula `$$\left\| Y - Z \beta \right\|^2 = \underbrace{\left\| Y - \widehat{Y}\right\|^2}_{\text{depends on }Y, Z} + \underbrace{\left\|\widehat{Y} - Z \beta \right\|^2}_{\text{to be optimized}}$$` OLS boils down to finding some `\(\beta \in \mathbb{R}^p\)` that solves `$$\widehat{Y} = Z \beta$$` -- Two cases: - The columns of `\(Z\)` are linearly independent: `\(Z\)` has full column rank - The columns of `\(Z\)` are _not_ linearly independent --- ### Proposition .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ Matrix `\(Z \in \mathcal{M}_{n,p}\)` has full column rank `\(p\)` iff `\(Z^T \times Z \in \mathcal{M}_{p,p}\)` is invertible ] --- ### Proof `\((\Rightarrow)\)` Let `\(Z\)` have full column rank. Assume `\(Z^T \times Z\)` is not invertible (proof by contradiction) -- Then there exists a _non-null_ vector `\(u \in \mathbb{R}^p\)` such that `\((Z^T \times Z) \times u = 0\)` with `\(0 \in \mathbb{R}^p\)` -- Thus `\(0 = \langle u , 0\rangle = u^T \times (Z^T \times Z) \times u = \langle Z\times u, Z \times u\rangle = \left\| Z \times u\right\|^2\)` which implies `\(Z \times u= 0\)` -- If `\(Z\)` has full column rank, `\(Z \times u = 0\)` implies `\(u=0\)`, a contradiction --- ### Proof (continued) `\((\Leftarrow)\)` (by contraposition) Let `\(Z \in \mathcal{M}_{n,p}\)` have column rank `\(< p\)` There exists `\(u \in \mathbb{R}^p \setminus \{0\}\)` such that `\(Z \times u = 0 \in \mathbb{R}^n\)` So `\(Z^T \times Z \times u = Z^T \times 0 = 0 \in \mathbb{R}^p\)` which implies that `\(Z^T \times Z\)` is not invertible
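---

### Checking full column rank numerically

A minimal sketch of the proposition in base R, on simulated data: duplicating (a multiple of) a column is an arbitrary way of breaking full column rank, and it makes `\(Z^T \times Z\)` singular

```r
set.seed(42)                              # arbitrary seed, for reproducibility
n <- 11
Z_full <- cbind(1, rnorm(n))              # intercept + one covariate: full column rank
Z_rdef <- cbind(Z_full, 2 * Z_full[, 2])  # third column = twice the second: rank 2 < 3

qr(Z_full)$rank                           # 2: full column rank
qr(Z_rdef)$rank                           # 2: column-rank deficient (3 columns, rank 2)
det(crossprod(Z_full))                    # non-zero: t(Z) %*% Z is invertible
det(crossprod(Z_rdef))                    # ~ 0: t(Z) %*% Z is singular
```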
--- ### OLS solution .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ `$$\begin{array}{rl} \widehat{Y} = Z \times \beta & \Rightarrow Z^T \times \widehat{Y} = \left(Z^T \times Z\right) \times \beta\\ & \Leftrightarrow \left(Z^T \times Z\right)^{-1}\times Z^T \times \widehat{Y} = \beta \end{array}$$` `$$\widehat{\beta} = \left(Z^T \times Z\right)^{-1}\times Z^T \times \widehat{Y}$$` ] --- ### Computing the projection We keep assuming `\(Z\)` has full column rank ( `\(p\)` ) ### Definition: Hat matrix `$$H = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T$$` `\(H \in \mathcal{M}_{n,n}\)` ??? --- ### Proposition .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ Assumption: `\(Z\)` has full column rank The Hat matrix `\(H = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T\)` coincides with the matrix `\(\Pi\)` associated with the orthogonal projection of `\(\mathbb{R}^n\)` on the linear subspace `\(\mathcal{L}(Z)\)` generated by the columns of `\(Z\)` ] --- ### Proof It is enough to check that 1. if `\(u \in \mathcal{L}(Z)^\bot\)` then `\(H \times u =0\)` 1. if `\(u \in \mathcal{L}(Z) \subseteq \mathbb{R}^n\)` then `\(H \times u =u\)` -- 1.) Assume `\(u \in \mathcal{L}(Z)^\bot\)`, then `\(u\)` is orthogonal to any column of `\(Z\)`, or equivalently to any row of `\(Z^T\)`, which implies `\(Z^T \times u = 0 \in \mathbb{R}^p\)` and `\(H \times u = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T \times u =0\)` --- ### Proof (continued) 2.) Assume `\(u \in \mathcal{L}(Z)\)` Note that `\(H \times u\)` is a linear combination of the columns of `\(Z\)`, hence `\(v = H \times u \in \mathcal{L}(Z)\)` Observe `\(Z^T\times v = Z^T \times H \times u = Z^T \times Z \times \left(Z^T \times Z\right)^{-1} \times Z^T \times u = Z^T \times u\)` Hence `\(Z^T \times (v- u) = 0 \in \mathbb{R}^p\)`: the vector `\(v-u\)` belongs to `\(\mathcal{L}(Z)\)` and is orthogonal to every column of `\(Z\)` (row of `\(Z^T\)` ), hence to itself This shows `\(v - u = 0 \in \mathbb{R}^n\)`. Hence `\(H \times u = u\)` for `\(u \in \mathcal{L}(Z)\)`
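---

### Hat matrix: a quick numerical check

A minimal sketch (base R, first Anscombe group) of the proposition: `\(H\)` is symmetric, idempotent, and acts as the identity on the columns of `\(Z\)`, as an orthogonal projection on `\(\mathcal{L}(Z)\)` should

```r
df <- datasets::anscombe
Z  <- cbind(1, df$x1)                      # design matrix: intercept + covariate x1
H  <- Z %*% solve(crossprod(Z)) %*% t(Z)   # Z (Z^T Z)^{-1} Z^T

norm(t(H) - H, "F")                        # ~ 0: H is symmetric
norm(H %*% H - H, "F")                     # ~ 0: H is idempotent
norm(H %*% Z - Z, "F")                     # ~ 0: H leaves the columns of Z unchanged
```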
--- ### Using the Hat matrix `$$\widehat{Y} = H \times Y = Z\times \left(Z^T \times Z\right)^{-1} \times Z^T \times Y= Z \times \widehat{\beta}$$` `$$\widehat{\epsilon} = Y - Z \times \widehat{\beta}$$` --- ### Analytic approach `$$\beta \mapsto f(\beta) = \left\| Y - Z \times \beta \right\|^2$$` is a smooth convex function of `\(\beta\)` The smooth convex function `\(f\)` achieves its minimum where its gradient `\(\nabla f\)` vanishes `$$f(\beta) = \left\langle Y - Z \times \beta , Y - Z \times \beta \right\rangle= \left\| Y\right\|^2 -2 \left\langle Z^T \times Y,\beta \right\rangle + \left\langle Z \times \beta, Z\times \beta\right\rangle$$` The gradient is `$$\nabla f = -2 Z^T \times Y + 2 \times Z^T \times Z\times \beta$$` The gradient vanishes for `$$\beta = \left( Z^T \times Z \right)^{-1} \times Z^T \times Y$$` --- template: inter-slide ## OLS and `lm` objects --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r *anscombe <- datasets::anscombe ``` ] .panel2-tidy_anscombe-auto[ ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe *anscombe ``` ] .panel2-tidy_anscombe-auto[ ``` ## x1 x2 x3 x4 y1 y2 y3 y4 ## 1 10 10 10 8 8.04 9.14 7.46 6.58 ## 2 8 8 8 8 6.95 8.14 6.77 5.76 ## 3 13 13 13 8 7.58 8.74 12.74 7.71 ## 4 9 9 9 8 8.81 8.77 7.11 8.84 ## 5 11 11 11 8 8.33 9.26 7.81 8.47 ## 6 14 14 14 8 9.96 8.10 8.84 7.04 ## 7 6 6 6 8 7.24 6.13 6.08 5.25 ## 8 4 4 4 19 4.26 3.10 5.39 12.50 ## 9 12 12 12 8 10.84 9.13 8.15 5.56 ## 10 7 7 7 8 4.82 7.26 6.42 7.91 ## 11 5 5 5 8 5.68 4.74 5.73 6.89 ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% * tidyr::pivot_longer(everything(), * names_to = c(".value", "group"), * names_pattern = "(.)(.)" * ) ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group x y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 2 10 9.14 ## 3 3 10 7.46 ## 4 4 8 6.58 ## 5 1 8 6.95 ## 6 2 8 8.14 ## 7 3 8 6.77 ## 8 4 8 5.76 ## 9 1 13 7.58 ## 10 2 13 8.74 ## # … with 34 more rows ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% * rename(X=x, Y=y) ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 2 10 9.14 ## 3 3 10 7.46 ## 4 4 8 6.58 ## 5 1 8 6.95 ## 6 2 8 8.14 ## 7 3 8 6.77 ## 8 4 8 5.76 ## 9 1 13 7.58 ## 10 2 13 8.74 ## # … with 34 more rows ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% rename(X=x, Y=y) %>% * arrange(group)-> anscombe ``` ] .panel2-tidy_anscombe-auto[ ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% rename(X=x, Y=y) %>% arrange(group)-> anscombe *anscombe ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## # … with 34 more rows ``` ] <style> .panel1-tidy_anscombe-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left;
padding-left: 1%; font-size: 80% } .panel2-tidy_anscombe-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-tidy_anscombe-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r *anscombe ``` ] .panel2-lm_1_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## # … with 34 more rows ``` ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% * filter(group=="1") ``` ] .panel2-lm_1_anscombe-auto[ ``` ## # A tibble: 11 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## 11 1 5 5.68 ``` ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% filter(group=="1") %>% * lm(Y ~ X, data=.) -> lm_1 ``` ] .panel2-lm_1_anscombe-auto[ ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% filter(group=="1") %>% lm(Y ~ X, data=.) -> lm_1 *lm_1 ``` ] .panel2-lm_1_anscombe-auto[ ``` ## ## Call: ## lm(formula = Y ~ X, data = .) ## ## Coefficients: ## (Intercept) X ## 3.0001 0.5001 ``` ] <style> .panel1-lm_1_anscombe-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-lm_1_anscombe-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-lm_1_anscombe-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ###
Brooming objects of class `lm` [Broom package](https://cran.r-project.org/web/packages/broom/vignettes/broom.html) .panelset[ .panel[.panel-name[`broom::tidy`] ```r lm_1 %>% broom::tidy() ``` ``` ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 3.00 1.12 2.67 0.0257 ## 2 X 0.500 0.118 4.24 0.00217 ``` - Column `estimate` contains the estimated coefficients `\(\widehat{\beta}\)` - Each row matches a column of the design `\(Z\)` ( `\(Z\)` may differ from the input dataframe) ] .panel[.panel-name[`broom::augment`] ```r lm_1 %>% broom::augment() %>% head() ``` ``` ## # A tibble: 6 × 8 ## Y X .fitted .resid .hat .sigma .cooksd .std.resid ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 8.04 10 8.00 0.0390 0.100 1.31 0.0000614 0.0332 ## 2 6.95 8 7.00 -0.0508 0.1 1.31 0.000104 -0.0433 ## 3 7.58 13 9.50 -1.92 0.236 1.06 0.489 -1.78 ## 4 8.81 9 7.50 1.31 0.0909 1.22 0.0616 1.11 ## 5 8.33 11 8.50 -0.171 0.127 1.31 0.00160 -0.148 ## 6 9.96 14 10.0 -0.0414 0.318 1.31 0.000383 -0.0405 ``` - column `.fitted` contains the projections `\(\widehat{Y}\)` - column `.resid` contains the residuals `\(\widehat{\epsilon}\)` - column `.hat` contains the diagonal coefficients of the Hat matrix - `.cooksd` and `.std.resid` are diagnostic tools ] .panel[.panel-name[`broom::glance`] ```r lm_1 %>% broom::glance() ``` ``` ## # A tibble: 1 × 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.667 0.629 1.24 18.0 0.00217 1 -16.8 39.7 40.9 ## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` The one-row output contains information pertaining to model selection Column `sigma` contains the standard error of the residuals: `\(\|\widehat{\epsilon}\|/\sqrt{n-p}\)`, pay attention to the normalization `adj.r.squared` estimates the share of explained variance `AIC` and `BIC` are information criteria that help assess the relevance of a model with respect to simpler models ] ] ??? > While model inputs usually require tidy inputs, such attention to detail doesn’t carry over to model outputs. Outputs such as predictions and estimated coefficients aren’t always tidy. This makes it more difficult to combine results from multiple models. For example, in R, the default representation of model coefficients is not tidy because it does not have an explicit variable that records the variable name for each estimate, they are instead recorded as row names. In R, row names must be unique, so combining coefficients from many models (e.g., from bootstrap resamples, or subgroups) requires workarounds to avoid losing important information. This knocks you out of the flow of analysis and makes it harder to combine the results from multiple models. I’m not currently aware of any packages that resolve this problem.
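---

### Brooming many fits at once

A minimal sketch (assuming `dplyr` is attached, as elsewhere in the deck) of how tidy outputs combine across models: `broom::tidy()` is applied group-wise to the long-format `anscombe` tibble built earlier, yielding one coefficient table for four fits

```r
anscombe %>%
  group_by(group) %>%
  group_modify(~ broom::tidy(lm(Y ~ X, data = .x))) %>%  # one linear fit per Anscombe group
  ungroup()
```

The four groups yield nearly identical coefficients (intercept close to 3, slope close to 0.5), which is the point of the Anscombe quartet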
--- template: inter-slide name: qr-factorization ## Algorithms: QR Factorization --- ### Two flavors 1. Direct methods 2. Iterative methods --- ### A naive attempt In order to compute the pseudo-inverse `\(Z^{+} = \left(Z^T \times Z\right)^{-1} \times Z^T\)`, we might use the following recipe: 1. Compute `\(Z^T \times Z\)`. This is just matrix transposition and matrix multiplication `t(Z) %*% Z`. 1. Compute the Cholesky decomposition `\(Z^T \times Z = L \times L^T\)` (where `\(L\)` is lower-triangular, and invertible if `\(Z^T \times Z\)` is): `chol(t(Z) %*% Z)`. If invertible, `\(L\)` is easy to invert. 1. Invert `\(L\)` and plug the inverse into `\((Z^T \times Z)^{-1} = (L^{-1})^T \times L^{-1}\)` 1. `\((Z^T \times Z)^{-1} \times Z^T = (L^{-1})^T \times L^{-1} \times Z^T\)`. --- This is not what
`lm()` does. Rather, `lm()`
relies on the so-called QR-factorization of `\(Z\)`: `$$Z = Q \times R$$` where - `\(Q \in \mathcal{M}_{n,p}\)` has pairwise orthogonal columns with unit norm ( `\(Q^T \times Q = \operatorname{Id}_p\)` ) and - `\(R\)` is upper-triangular with positive diagonal (assuming again that `\(Z\)` has full column rank). --- ### Using QR factorization From this factorization we readily obtain the Cholesky decomposition of `\(Z^T \times Z\)`: `$$Z^T \times Z = \big(R^T \times Q^T \big) \times \big(Q \times R\big) = \underbrace{R^T}_{\text{lower triangular}} \times \underbrace{R}_{\text{upper triangular}}$$` and `\((Z^T \times Z)^{-1} \times Z^T\)` reads as: `$$R^{-1} \times (R^{-1})^T \times R^T \times Q^T = R^{-1} \times Q^T$$` -- `$$H = Q \times R \times (R^T \times R)^{-1} \times R^T \times Q^T = Q\times Q^T$$` --- ### QR factorization on Anscombe dataset ```r anscombe_1 <- filter(anscombe, group=="1") %>% select(X,Y) Z <- cbind("I"=rep(1, 11), anscombe_1$X) %>% as.matrix() n <- nrow(Z); p <- ncol(Z) ``` ```r qr.Z <- qr(Z) Q.Z <- qr.Q(qr.Z) # Extraction of Q R.Z <- qr.R(qr.Z) # Extraction of R Ip <- diag(1, p, p) norm(t(Q.Z) %*% Q.Z - Ip) # Q has orthonormal columns ``` ``` ## [1] 2.359224e-16 ``` ```r piv.Z <- solve(R.Z, Ip) %*% t(Q.Z) # pseudo-inverse, see later ``` --- ### Computing coefficients ```r piv.Z %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## I 3.0000909 ## 0.5000909 ``` ### Computing fitted values ```r H <- Q.Z %*% t(Q.Z) H %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## [1,] 8.001000 ## [2,] 7.000818 ## [3,] 9.501273 ## [4,] 7.500909 ## [5,] 8.501091 ## [6,] 10.001364 ## [7,] 6.000636 ## [8,] 5.000455 ## [9,] 9.001182 ## [10,] 6.500727 ## [11,] 5.500545 ``` ```r piv.Z %*% H %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## I 3.0000909 ## 0.5000909 ```
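---

### Cross-checking with `lm` and base R's `qr` helpers

A short sanity check (a sketch relying on the objects `qr.Z` and `anscombe_1` defined above): base R exposes the same computations through `qr.coef()` and `qr.fitted()`, and the results agree with `lm()`

```r
coef(lm(Y ~ X, data = anscombe_1))    # reference: lm() on the first Anscombe group
qr.coef(qr.Z, anscombe_1$Y)           # coefficients computed from the stored QR object
head(qr.fitted(qr.Z, anscombe_1$Y))   # fitted values, i.e. H %*% Y
```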
--- template: inter-slide ## OLS and rank-deficient designs --- ### What if the design does not have full column rank? This is equivalent to the fact that the linear system `$$Z \times \beta = \widehat{Y}$$` has infinitely many solutions We choose the solution with minimal Euclidean norm, that is, we solve the following problem: `$$\text{Minimize}\quad \left\| \beta \right\|^2 \quad\text{under the constraint}\quad Z \times \beta = \widehat{Y}$$` We minimize a smooth convex function under a linear constraint --- exclude: true ### Moore-Penrose pseudo-inverse of `\(Z\)` The pseudo-inverse `\(Z^{+}\)` satisfies `$$Z^{+} \times Z = \Pi_{\mathcal{L}(Z^T)} \qquad\text{and} \qquad Z \times Z^{+} = \Pi_{\mathcal{L}(Z)}$$` ??? Existence and uniqueness? --- exclude: true ### QR factorization and pseudo-inversion --- exclude: true ### OLS (all in one) - Coefficients `$$\widehat{\beta} = Z^{+} \times Y$$` - Predictions `$$\widehat{Y} = Z \times Z^{+} \times Y$$` - Residuals `$$\widehat{\epsilon} = \left( \mathrm{Id} - Z \times Z^{+}\right) \times Y$$` --- exclude: true ### Regularized least squares criterion For each `\(\lambda>0\)`, the penalized least squares cost is defined by `$$\Big\Vert Y - Z \beta \Big\Vert^2 + \lambda \big\|\beta\big\|^2$$` Gradient with respect to `\(\beta\)` `$$2\times\big(\lambda \beta + Z^T Z \beta - Z^T Y\big)$$` Unique optimum `$$\big(\lambda \operatorname{Id} + Z^TZ\big)^{-1} Z^T Y$$` --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End