name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- template: inter-slide # Characterizing Probability Distributions ### 2021-10-28 #### [Probabilités Master I MIDS](http://stephane-v-boucheron.fr/courses/probability/) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- template: inter-slide name: roadmap ## [Characteristic functions](#characteristic-functions) ## [Convolutions](#convolutions) ## [Gaussian characteristic functions](#secgaussiancharac) ## [Inversion Theorem](#sectioninversionformula) ## [Quantile functions](#secquantiles) --- template: inter-slide ## Motivation #
--- ### Why do we look for _characterizations_ of probability distributions? - Checking equality of distributions - Quantifying the difference between distributions - Characterizing limiting behaviour - ... ??? --- ## How can we characterize a probability distribution?
- Cumulative distribution function (tail function) - Probability density (if any) - Probability generating function (on `\(\mathbb{N}, \mathbb{Z}\)`) - Fourier transform/Characteristic function - Laplace transform ( `\([0, \infty)\)` -valued) --- exclude: true class: inverse, middle, center ##
Probability Generating Functions --- name: characteristic-functions template: inter-slide ## Characteristic functions --- .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Definition: Characteristic functions - `\(X \sim P\)`, `\(\mathbb{R}\)` valued - Cumulative distribution function `\(F\)`: `\(F(x) = P\{X \leq x \}\)` The _characteristic function_ `\(\widehat{F}\)` of distribution `\(P\)` is defined by `$$\begin{array}{rl} \widehat{F} \quad : & \quad \mathbb{R} \to \mathbb{C} \\ \widehat{F}(t) & = \mathbb{E}\left[\mathrm{e}^{i t X}\right] \\ & = \int_{\mathbb{R}} \mathrm{e}^{i t x} \mathrm{d}F(x) \\ & = \int_{\mathbb{R}} \cos(t x) \mathrm{d}F(x) + i \int_{\mathbb{R}} \sin(t x) \mathrm{d}F(x) \end{array}$$` ] -- - Defined for all probability distributions over `\(\mathbb{R}\)` - The definition can be extended to distributions on `\(\mathbb{R}^k, k\geq 1\)` --- exclude: true ### Characteristic functions of Absolutely Continuous distributions --- ### Properties of characteristic functions .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let `\(X \sim P\)` with characteristic function `\(\widehat{F}\)` - `\(\widehat{F}\)` is (uniformly) continuous over `\(\mathbb{R}\)` - `\(\widehat{F}(0)=1\)` - If `\(X\)` is _symmetric_, `\(\widehat{F}\)` is real-valued - The characteristic function of the distribution of `\(a X +b\)` is `$$\mathrm{e}^{it b} \widehat{F}(at)$$` ] --- ### Proof (Checking the continuity property) Calculus leads to `$$\begin{array}{rl} \Big| \mathrm{e}^{i(t+ \delta)x} - \mathrm{e}^{itx}\Big| & = \Big| \mathrm{e}^{itx}\Big| \times \Big|\mathrm{e}^{i\delta x} - 1\Big| = \Big|\mathrm{e}^{i\delta x} - 1\Big| \\ & \leq 2 \Big( 1 \wedge \big| \delta x \big| \Big) \end{array}$$` for every `\(t\in \mathbb{R}, \delta \in \mathbb{R}, x \in \mathbb{R}\)`. -- Integration with respect to `\(F\)`: `$$\begin{array}{rl} \Big| \widehat{F}(t+\delta) - \widehat{F}(t)\Big| & \leq \int 2 \Big( 1 \wedge \big| \delta x \big| \Big) \mathrm{d}F(x) \,. \end{array}$$` -- By the dominated convergence theorem: `$$\lim_{\delta \to 0} \Big| \widehat{F}(t+\delta) - \widehat{F}(t)\Big| = 0\qquad \text{uniformly in } t$$`
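---
### A numerical illustration: empirical characteristic functions

The definition can be explored numerically: `\(\widehat{F}(t) = \mathbb{E}[\mathrm{e}^{itX}]\)` can be approximated by averaging `\(\mathrm{e}^{itX_j}\)` over a simulated sample. The R sketch below is only an illustration: the helper name `ecf`, the sample size and the choice of an `\(\text{Exp}(1)\)` sample are arbitrary; the closed form `\(1/(1-it)\)` used for comparison is the one derived in the application slides below.

```r
set.seed(42)

# empirical characteristic function: average of exp(i t X_j) over the sample
ecf <- function(t, x) sapply(t, function(tt) mean(exp(1i * tt * x)))

x  <- rexp(1e5)                  # a large sample from Exp(1)
ts <- seq(-5, 5, by = 0.5)

emp   <- ecf(ts, x)              # Monte Carlo approximation of E[exp(itX)]
exact <- 1 / (1 - 1i * ts)       # closed form of the Exp(1) characteristic function

max(Mod(emp - exact))            # small, up to Monte Carlo error
```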
--- ### Characteristic functions of sums of independent random variables .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let - `\(X \perp\!\!\!\perp Y\)` - with cumulative distribution functions `\(F_X\)` and `\(F_Y\)` Then `$$\widehat{F}_{X+Y}(t) = \widehat{F}_X(t) \times \widehat{F}_Y(t)\qquad \forall t \in \mathbb{R}$$` ]
--- ### Proof `$$\begin{array}{rl} \widehat{F}_{X+Y}(t) & = \mathbb{E}\Big[\mathrm{e}^{it (X+Y)}\Big] \\ & = \mathbb{E}\Big[\mathrm{e}^{it X} \mathrm{e}^{it Y}\Big] \\ & = \mathbb{E}\Big[\mathrm{e}^{it X} \Big] \times \mathbb{E}\Big[\mathrm{e}^{it Y}\Big] \qquad \text{(independence)}\\ & = \widehat{F}_X(t) \times \widehat{F}_Y(t) \, . \end{array}$$`
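---
### Checking the product formula by simulation

A quick sanity check of the product formula by simulation. This is only a sketch: the helper `ecf`, the sample size, and the choice `\(X \sim \text{Exp}(1)\)`, `\(Y \sim \mathcal{N}(0,1)\)` are arbitrary.

```r
set.seed(1)

ecf <- function(t, x) sapply(t, function(tt) mean(exp(1i * tt * x)))

n  <- 1e5
x  <- rexp(n)                    # X ~ Exp(1)
y  <- rnorm(n)                   # Y ~ N(0,1), drawn independently of X
ts <- seq(-3, 3, by = 0.5)

lhs <- ecf(ts, x + y)            # empirical characteristic function of X + Y
rhs <- ecf(ts, x) * ecf(ts, y)   # product of the marginal ones

max(Mod(lhs - rhs))              # close to 0, up to Monte Carlo fluctuations
```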
--- ### Looking for a converse? --
Use a counter-example to prove that `$$\Big(\forall t \in \mathbb{R}, \quad \widehat{F}_{X+Y}(t) = \widehat{F}_X(t) \times \widehat{F}_Y(t) \Big) \not\Rightarrow X \perp\!\!\!\perp Y$$` --- name: secgaussiancharac template: inter-slide ## Characteristic function of a univariate Gaussian distribution --- It is possible to compute characteristic functions by resorting to Complex Analysis. -- We shall refrain from this when computing the most important characteristic function: the characteristic function of the standard Gaussian distribution. --- .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let `\(\widehat{\Phi}\)` denote the characteristic function of the standard univariate Gaussian distribution `\(\mathcal{N}(0,1)\)`. The following holds: `$$\widehat{\Phi}(t) = \mathrm{e}^{-\frac{t^2}{2}}$$` ] --- ### Proof As the standard Gaussian density `\(\phi\)` is even, `\(\widehat{\Phi}(t) =\mathbb{E} \cos(tX)\)` -- `$$\begin{array}{rl} \widehat{\Phi}'(t) & = - \mathbb{E}\left[X \sin(t X) \right] \\ & = - \frac{1}{\sqrt{2 \pi}}\int_{\mathbb{R}} x \sin(tx) \mathrm{e}^{-\frac{x^2}{2}} \mathrm{d}x \\ & = \frac{1}{\sqrt{2 \pi}} \Big[\sin(tx) \mathrm{e}^{-\frac{x^2}{2}} \Big]_{-\infty}^{\infty} - t \frac{1}{\sqrt{2 \pi}}\int_{\mathbb{R}} \cos(tx) \mathrm{e}^{-\frac{x^2}{2}} \mathrm{d}x \\ & = - t \widehat{\Phi}(t) \,. \end{array}$$` -- `\(\widehat{\Phi}\)` is a solution of the differential equation: `$$g'(t) = -t g(t) \quad \text{ with } g(0)=1$$` -- The differential equation is readily solved, and the solution is `$$g(t)= \mathrm{e}^{- \frac{t^2}{2}}$$`
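---
### A numerical check

The identity can also be checked numerically, by integrating `\(\cos(tx)\phi(x)\)` over `\(\mathbb{R}\)`. A minimal R sketch (the grid of values of `\(t\)` is arbitrary):

```r
# hatPhi(t) = E[cos(tX)] for X ~ N(0,1), computed by numerical integration
phi_hat <- function(t) {
  integrate(function(x) cos(t * x) * dnorm(x), -Inf, Inf)$value
}

ts <- c(0, 0.5, 1, 2, 3)
cbind(numerical = sapply(ts, phi_hat), closed_form = exp(-ts^2 / 2))
```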
--- ### Questions
- Why is `\(\widehat{\Phi}\)` differentiable? - Why are we allowed to interchange expectation and differentiation? --- ### An integral representation of the Gaussian density
A byproduct of the Proposition is the following integral representation of the Gaussian density: `$$\phi(x) = \frac{1}{2 \pi} \int_{\mathbb{R}} \widehat{\Phi}(t) \mathrm{e}^{-itx} \mathrm{d}t$$` It may not look interesting, but it is a milestone on the way to the _general inversion formula_ --- name: convolutions template: inter-slide ## Sums of independent random variables and convolutions --- ### Convolution
If `\(f\)` and `\(g\)` are two integrable functions, the _convolution_ of `\(f\)` and `\(g\)` is defined as `$$f \star g (x) = \int_{\mathbb{R}} f(x-y)g(y) \mathrm{d}y = \int_{\mathbb{R}} g(x-y)f(y) \mathrm{d}y$$` - `\(f \star g\)` is also integrable - if `\(f\)` and `\(g\)` are two probability densities then so is `\(f \star g\)` - `\(f \star g\)` is the density of the distribution of `\(X+Y\)` where `\(X \sim f\)` is independent of `\(Y \sim g\)`. ??? The interplay between Characteristic functions/Fourier transforms and summation of independent random variables is one of the most attractive features of this transformation. In order to understand it, we shall need an operation stemming from analysis: _convolution_ --- name:convdensity .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Regularization by convolution) Let `\(X \sim P_X, Y \sim P_Y\)`, with `\(X \perp\!\!\!\perp Y\)` Assume that `\(P_X\)` is absolutely continuous with density `\(p_X\)`. Then the distribution of `\(X+Y\)` is absolutely continuous and has density `$$p_X \star P_Y (z) = \int_{\mathbb{R}} p_X(z -y ) \mathrm{d}P_Y(y)$$` ] --- ### Proof Let `\(B\)` be a Borel subset of `\(\mathbb{R}\)` ( `\(B \in \mathcal{B}(\mathbb{R})\)` ) `$$\begin{array}{rl} P \Big\{ X+Y \in B\Big\} & = \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} \mathbb{I}_B(x+y) p_X(x)\mathrm{d}x\Big) \mathrm{d}P_Y(y) \\ & = \int_{\mathbb{R}} \Big(\int_{\mathbb{R}} \mathbb{I}_B(z) p_X(z-y)\mathrm{d}z\Big) \mathrm{d}P_Y(y) \\ & = \int_{\mathbb{R}} \mathbb{I}_B(z) \Big(\int_{\mathbb{R}} p_X(z-y) \mathrm{d}P_Y(y) \Big) \mathrm{d}z \\ & = \int_{\mathbb{R}} \mathbb{I}_B(z) p_X \star P_Y (z) \mathrm{d}z \end{array}$$` where - the first equality follows from the Tonelli-Fubini Theorem, - the second equality is obtained by change of variable `\(x \mapsto z = x+y\)` for every `\(y\)`, - the third equality follows again from the Tonelli-Fubini Theorem.
??? Convolution is not tied to Probability theory. - In Analysis, convolution is known to be a regularizing (smoothing) operation. This also holds in Probability theory: if the distribution of either `\(X\)` or `\(Y\)` has a density and `\(X \perp\!\!\!\perp Y\)`, then the distribution of `\(X+Y\)` has a density. - Convolution with smooth distributions plays an important role in non-parametric statistics: it is at the root of kernel density estimation. - Convolution is an important tool in Signal Processing. --- ### Exercise
- Check that if `\(X\)` and `\(Y\)` are independent with densities `\(f_X\)` and `\(f_Y\)`, `\(f_X \star f_Y\)` is a density of the distribution of `\(X+Y\)`. - If `\(Y =0\)` almost surely (its distribution is `\(\delta_0\)`), then `\(p_X \star \delta_0 = p_X\)`. - What happens in the previous proposition, if we consider the distributions of `\(\sigma X +Y\)` and let `\(\sigma\)` decrease to `\(0\)`? --- name: approxident ### Proposition (Vanishing perturbations) Assume - `\(X \perp\!\!\!\perp Y\)` - `\(P_X \trianglelefteq \text{Lebesgue}\)` with density `\(p_X\)` - `\(P_X(-\infty, 0] = \alpha \in (0,1)\)` Then `$$\lim_{\sigma \downarrow 0} \mathbb{P}\Big\{ Y + \sigma X \leq a \Big\} = P_Y(-\infty, a) + \alpha P_Y\{a\}$$`
If `\(P_Y \trianglelefteq \text{Lebesgue}\)`, the limit is the CDF of `\(P_Y\)` --- ### Proof `$$\begin{array}{rl} \mathbb{P}\big\{ Y + \sigma X \leq a \Big\} & = \int_{\mathbb{R}} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \\ & = \int_{(-\infty,a)} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \\ & + \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-a}{\sigma}} p_X(x) \mathrm{d}x P_Y\{a\} \\ & + \int_{(a, \infty)} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \end{array}$$` By monotone convergence, - the first and third integrals converge respectively to `\(P_Y(-\infty, a)\)` and `\(0\)` - the second term equals `\(\alpha P_Y\{a\}\)`.
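---
### Illustration: a Bernoulli variable plus a small Gaussian perturbation

A small R sketch illustrating both propositions on a concrete example (the choice of a Bernoulli(1/2) variable `\(Y\)` and of a Gaussian perturbation is ours): the distribution of `\(Y + \sigma X\)` has density `\(p_{\sigma X} \star P_Y\)`, and `\(\mathbb{P}\{Y + \sigma X \leq 0\}\)` converges to `\(P_Y(-\infty, 0) + \tfrac{1}{2} P_Y\{0\} = 1/4\)` as `\(\sigma \downarrow 0\)`.

```r
# Y ~ Bernoulli(1/2) (purely discrete), X ~ N(0,1), sigma > 0

# density of Y + sigma * X: the convolution of the N(0, sigma^2) density with P_Y
dens_sum <- function(z, sigma) {
  0.5 * dnorm(z, mean = 0, sd = sigma) + 0.5 * dnorm(z, mean = 1, sd = sigma)
}
integrate(dens_sum, -Inf, Inf, sigma = 0.5)$value   # integrates to 1

# vanishing perturbation: P{Y + sigma X <= 0} -> P_Y(-Inf, 0) + (1/2) P_Y{0} = 1/4
p_left <- function(sigma) {
  0.5 * pnorm(0, mean = 0, sd = sigma) + 0.5 * pnorm(0, mean = 1, sd = sigma)
}
sapply(c(1, 0.1, 0.01, 0.001), p_left)              # approaches 0.25
```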
--- name: sectioninversionformula template: inter-slide ## Inversion formula --- name: injectivitytheorem The characteristic function maps probability measures to `\(\mathbb{C}\)`-valued functions.
The characteristic function (Fourier transform) defines an injective mapping on the set of probability measures on the real line .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Injectivity Theorem If two probability distributions `\(P\)` and `\(Q\)` have the same characteristic function, they are equal. ] ??? The injectivity property follows from an explicit inversion recipe. The characteristic function allows us to recover the cumulative distribution function at all its continuity points. Again, as continuity points of cumulative distribution functions are dense in `\(\mathbb{R}\)`, this is enough. --- name:reginversion .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Inversion formula for regularized RV) Let `\(X \sim F\)` and `\(Z \sim \mathcal{N}(0,1)\)` with `\(X ⟂\!\!\!⟂ Z\)` Let `\(Y = X + \sigma Z\)`, then: - the distribution of `\(Y\)` has characteristic function `$$\widehat{F}_\sigma(t) = \widehat{\Phi}(t\sigma) \times \widehat{F}(t) = \mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t)$$` - the distribution of `\(Y\)` is absolutely continuous with respect to Lebesgue measure - a version of the density of the distribution of `\(Y\)` is given by `$$\frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t)\mathrm{e}^{-ity} \mathrm{d}t = \frac{1}{{2 \pi}}\int_{\mathbb{R}} \widehat{F}_\sigma(t)\mathrm{e}^{-ity} \mathrm{d}t$$` ] ---
Why can we take for granted the existence of a probability space with two independent random variables `\(X, Z\)` distributed as above? --
The proposition states that a (the) density of the distribution of `\(X + \sigma Z\)` can be recovered from its characteristic function by the _Fourier inversion formula_ for functions with integrable Fourier transforms --- ### Proof The fact that for any `\(\sigma >0\)`, the distribution of `\(Y = X + \sigma Z\)` is absolutely continuous with respect to Lebesgue measure comes from [Regularization by convolution Proposition](#convdensity) -- A density of the distribution of `\(X + \sigma Z\)` is given by `$$y \mapsto \int_{\mathbb{R}} \frac{1}{\sigma} \phi\Big(\frac{y -x}{\sigma}\Big) \mathrm{d}F(x)$$` -- The characteristic function of `\(Y\)` at `\(t\)` is `\(\mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t)\)`. --- ### Proof (continued) `$$\begin{array}{rl} \mathbb{P}\Big\{ Y \leq u\Big\} & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \phi\Big(\frac{y -x}{\sigma}\Big) \mathrm{d}F(x) \mathrm{d}y \\ & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \frac{1}{\sqrt{2 \pi}} \int_{\mathbb{R}} \frac{\mathrm{e}^{- \frac{t^2}{2}}}{\sqrt{2\pi}} \mathrm{e}^{-it \frac{y-x}{\sigma}} \mathrm{d}t \mathrm{d}F(x) \mathrm{d}y \\ & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \frac{1}{{2 \pi}} \mathrm{e}^{- \frac{t^2}{2}} \mathrm{e}^{-\frac{ity}{\sigma}} \int_{\mathbb{R}} \mathrm{e}^{\frac{itx}{\sigma}} \mathrm{d}F(x) \mathrm{d}t \mathrm{d}y \\ & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \frac{1}{{2 \pi}} \mathrm{e}^{- \frac{t^2}{2}} \mathrm{e}^{-\frac{ity}{\sigma}} \widehat{F}(t/\sigma) \mathrm{d}t \mathrm{d}y \\ & = \int_{-\infty}^u \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y \end{array}$$` The quantity `\(\frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t\)` is a version of the density of the distribution of `\(Y = X + \sigma Z\)` (why?).
??? Note that it is obtained from the same inversion formula that readily worked for the Gaussian density. --- name: generalInvFormula Now we have to show that an inversion formula works for all probability distributions, not only for the smooth probability distributions obtained by adding Gaussian noise. .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem Let `\(X \sim P\)`, with cumulative distribution function `\(F\)` and characteristic function `\(\widehat{F}\)`. Then: `$$\lim_{\sigma \downarrow 0} \int_{-\infty}^u \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{-ity} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y = F(u_-) + \frac{1}{2} P\{u\}$$` where `$$F(u_-) = \lim_{v \uparrow u} F(v) = P(-\infty, u)$$` ] ??? --- ### Proof The proof consists in combining [Proposition](#approxident) and [Inversion formula for regularized RV](#reginversion)
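---
### The regularized inversion formula at work

The regularized inversion formula can be tested numerically. The sketch below is an illustration only: it takes `\(X \sim \text{Exp}(1)\)`, whose characteristic function `\(1/(1-it)\)` is computed in the application slides below, fixes an arbitrary `\(\sigma\)`, and compares the density obtained by Fourier inversion with the density obtained directly as a convolution.

```r
sigma  <- 0.3
cf_exp <- function(t) 1 / (1 - 1i * t)   # characteristic function of Exp(1)

# density of Y = X + sigma Z recovered from the characteristic function
# (only the real part contributes: the imaginary part of the integrand is odd in t)
dens_from_cf <- function(y) {
  integrand <- function(t) Re(exp(-t^2 * sigma^2 / 2) * cf_exp(t) * exp(-1i * t * y))
  integrate(integrand, -Inf, Inf)$value / (2 * pi)
}

# the same density computed directly as a convolution with the Gaussian kernel
dens_direct <- function(y) {
  integrate(function(x) exp(-x) * dnorm(y - x, sd = sigma), 0, Inf)$value
}

ys <- c(-0.5, 0, 0.5, 1, 2)
cbind(from_cf = sapply(ys, dens_from_cf), direct = sapply(ys, dens_direct))
```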
---
The [Theorem](#generalInvFormula) above does not directly deliver the distribution function `\(F\)`
If the CDF `\(F\)` is not continuous, `\(u \mapsto \widetilde{F}(u) = F(u_-) + \frac{1}{2} P\{u\}\)` is not a CDF
The right-continuous modification of `\(\widetilde{F}\)`: `\(u \mapsto \lim_{v \downarrow u} \widetilde{F}(v)\)` coincides with `\(F\)` ### Consequence
We have established the [Injectivity Theorem](#injectivitytheorem) --- name: simple-inversion When the characteristic function is integrable (so that the distribution is absolutely continuous), Fourier inversion takes a simpler form
.bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem Let `\(X \sim P\)`, with cumulative distribution function `\(F\)` and characteristic function `\(\widehat{F}\)`. Assume that `\(\widehat{F}\)` is integrable (with respect to Lebesgue measure). Then: - `\(P ⊴ \text{Lebesgue}\)` - `\(y \mapsto \frac{1}{{2 \pi}} \int_{\mathbb{R}} \widehat{F}(t) \mathrm{e}^{-ity} \mathrm{d}t\)` is a uniformly continuous version of the density of `\(P\)`. ] --- ### Proof Let `\(X \sim P\)` with cumulative distribution function `\(F\)` and characteristic function `\(\widehat{F}\)`. Let `\(Z \sim \mathcal{N}(0,1)\)` with `\(Z \perp\!\!\!\perp X\)` Let `\(x\)` be a continuity point of `\(F\)`. `$$\lim_{\sigma \downarrow 0} P\Big\{ X + \sigma Z \leq x \Big\} = F(x)$$` `$$\begin{array}{rl} \lim_{\sigma \downarrow 0} P\Big\{ X + \sigma Z \leq x \Big\} & = \lim_{\sigma \downarrow 0} \int_{-\infty}^x \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y \\ & = \int_{-\infty}^x \frac{1}{{2 \pi}}\int_{\mathbb{R}} \lim_{\sigma \downarrow 0} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \mathrm{d}y \\ & = \int_{-\infty}^x \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \mathrm{d}y \, \end{array}$$` where interchange of limit and integration is justified by dominated convergence.
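---
### Example: inverting an integrable characteristic function

As an illustration of the theorem (the example is ours): `\(t \mapsto 1/(1+t^2)\)` is integrable, and it is the characteristic function of the Laplace distribution, as computed in the next slides; inverting it numerically recovers the Laplace density `\(\tfrac{1}{2}\mathrm{e}^{-|y|}\)`.

```r
cf_laplace <- function(t) 1 / (1 + t^2)   # an integrable characteristic function

# density recovered by the inversion formula (the imaginary part is odd in t)
dens_from_cf <- function(y) {
  integrate(function(t) Re(cf_laplace(t) * exp(-1i * t * y)), -Inf, Inf)$value / (2 * pi)
}

ys <- c(-2, -1, 0, 1, 2)
cbind(inversion = sapply(ys, dens_from_cf),
      laplace   = 0.5 * exp(-abs(ys)))    # the Laplace density 0.5 * exp(-|y|)
```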
--- ### An alternative inversion formula Let `\(P\)` be a probability distribution over `\(\mathbb{R}\)` with cumulative distribution function `\(F\)`, then, for `\(a < b\)`, `$$\lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^T \frac{\mathrm{e}^{-it a} - \mathrm{e}^{-it b}}{it} \widehat{F}(t) \mathrm{d}t = F(b_-) - F(a) + \frac{1}{2} \Big(P\{b\} + P\{a\}\Big)$$` --- The proof of the alternative inversion formula can be found in textbooks like [Durrett 2010] .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Corollary Let `\(\widehat{F}\)` denote the characteristic function of the probability distribution `\(P\)`. `$$\widehat{F}(t) = \mathrm{e}^{-\frac{t^2}{2}} \Rightarrow P = \mathcal{N}(0,1)$$` `$$\widehat{F}(t) = \mathrm{e}^{i\mu t -\frac{\sigma^2 t^2}{2}} \Rightarrow P = \mathcal{N}(\mu,\sigma^2)$$` ] --- name: steinsidentity ### Stein's identity
Another important byproduct of the proof of injectivity of the characteristic function is Stein's identity, an important property of the standard Gaussian distribution. .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Stein's identity) Let `\(X \sim \mathcal{N}(0,1)\)`, and `\(g\)` be a differentiable function such that `\(\mathbb{E}|g'(X)|< \infty\)`, then `$$\mathbb{E}[g'(X)] = \mathbb{E}[Xg(X)]$$` Conversely, if `\(X\)` is a random variable such that `$$\mathbb{E}[g'(X)] = \mathbb{E}[X g(X)]$$` holds for any differentiable function `\(g\)` such that `\(g'\)` is integrable, then `\(X \sim \mathcal{N}(0,1)\)`. ] --- ### Proof The direct part follows by integration by parts. For the converse: - Note that if `\(X\)` satisfies the identity in the Theorem, then the functions `\(t \mapsto \mathbb{E} \cos(tX)\)` and `\(t \mapsto \mathbb{E} \sin(tX)\)` satisfy the differential equation `\(g'(t) = -t g(t)\)` with initial conditions `\(\mathbb{E} \cos(0X)=1\)` and `\(\mathbb{E} \sin(0X) =0\)`. - This entails `\(\mathbb{E} \mathrm{e}^{itX} = \exp\Big(-\frac{t^2}{2}\Big)\)`, that is `\(X \sim \mathcal{N}(0,1)\)`
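---
### Stein's identity by simulation

A Monte Carlo check of the direct part of Stein's identity (the test functions `\(g(x)=\sin x\)` and `\(g(x)=x^3\)` are arbitrary choices):

```r
set.seed(7)
x <- rnorm(1e6)                      # a large N(0,1) sample

# g(x) = sin(x), g'(x) = cos(x): both averages should be close
c(lhs = mean(cos(x)), rhs = mean(x * sin(x)))

# g(x) = x^3, g'(x) = 3 x^2: both averages should be close to 3 = E[X^4]
c(lhs = mean(3 * x^2), rhs = mean(x * x^3))
```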
--- name: laplace-distrib ### Application: Exponential and Cauchy distributions Let `\(Y = \epsilon \times X\)` with `$$\epsilon \quad \bot\!\!\!\bot \quad X \sim \text{Exp}$$` and `\(P\{\epsilon=1\}=P\{\epsilon=-1\}=1/2\)` The distribution of `\(Y\)` is called the Laplace distribution
`\(Y \sim -Y\)`, in words, `\(Y\)` is a _symmetric_ random variable `$$\begin{array}{rl}\widehat{F}_Y(t) & = \int_{\mathbb{R}} \frac{1}{2}\mathrm{e}^{-|x|} \cos(tx)\mathrm{d}x \qquad\text{(symmetry)}\\ &= \int_{[0, \infty)} \mathrm{e}^{-x} \cos(tx)\mathrm{d}x\\&=[-\mathrm{e}^{-x}\cos(tx)]_0^\infty - t \int_0^\infty \mathrm{e}^{-x}\sin(tx)\mathrm{d}x\qquad\text{(IBP)}\\ & = 1 +t [\mathrm{e}^{-x}\sin(tx)]_0^\infty - t^2 \int_0^\infty \mathrm{e}^{-x}\cos(tx)\mathrm{d}x\qquad\text{(IBP bis)}\\ & = 1 -t^2 \widehat{F}_Y(t)\end{array}$$` The characteristic function of the Laplace distribution is `$$\widehat{F}_Y(t) = \frac{1}{1+ t^2}$$` ??? IBP stands for Integration By Parts --- name: exp-distrib ### Application: Exponential and Cauchy distributions (continued) If we turn back to the characteristic function of the Exponential distribution: `$$\begin{array}{rl}\widehat{F}_Y(t) & = \frac{1}{2} \widehat{F}_X(t) + \frac{1}{2} \widehat{F}_X(-t)\end{array}$$` `$$\widehat{F}_X(-t) = \overline{\widehat{F}_X(t)}$$` `$$\Re\left(\widehat{F}_X(t)\right) = \widehat{F}_Y(t) = \frac{1}{1+ t^2}$$` By integration by parts again `$$\Im\left(\widehat{F}_X(t)\right) = t \Re\left(\widehat{F}_X(t)\right)$$` The exponential distribution has characteristic function `$$\widehat{F}_X(t) = \frac{1}{1+t^2} (1 + it) = \frac{1}{1-it}$$` ??? No need to perform complex integration here --- name: cauchy-distrib ### Application: Exponential and Cauchy distributions (continued) Let `\(V\)` be uniformly distributed on `\(]-\pi/2, \pi/2[\)` Let `\(U = \tan(V)\)` `\(U\)` is a symmetric random variable with absolutely continuous distribution and density `$$x \mapsto \frac{1}{\pi} \frac{1}{1+x^2}$$` over `\(\mathbb{R}\)` Up to the factor `\(1/\pi\)`, the density of the Cauchy distribution is the characteristic function of the Laplace distribution. By the [inversion formula](#simple-inversion) `$$\begin{array}{rl}\widehat{F}_U(t) & = \int_{\mathbb{R}} \mathrm{e}^{it x} \frac{1}{\pi} \frac{1}{1+x^2} \mathrm{d} x\\ & = \frac{1}{\pi} \int_{\mathbb{R}} \mathrm{e}^{it x} \widehat{F}_Y(x) \mathrm{d} x\\ & = \frac{1}{\pi} \int_{\mathbb{R}} \mathrm{e}^{- it x} \widehat{F}_Y(x) \mathrm{d} x \\& = \mathrm{e}^{-|t|}\end{array}$$` ??? pay attention to normalizations --- name: cauchy-stable ### Application: Exponential and Cauchy distributions (continued) Let `\(U_1, \ldots, U_n \sim_{\text{i.i.d.}} \text{Cauchy}\)` Let `\(Z = \frac{1}{n}\sum_{i=1}^n U_i\)` Then `$$\widehat{F}_Z(t) = \prod_{i=1}^n \widehat{F}_U(t/n) = \mathrm{e}^{-|t|} = \widehat{F}_U(t)$$` By the injectivity theorem, `\(Z \sim U\)` -- The Cauchy distribution has no expectation: it does not satisfy the conditions of the strong law of large numbers. The Cauchy distribution (along with the Normal distribution) is an example of a _stable_ distribution ??? Take `\(X \sim \text{Cauchy}\)` and `\(Y = X\)` (so that `\(X\)` and `\(Y\)` are _not_ independent): `$$\widehat{F}_{X+Y}(t) = \widehat{F}_{2X}(t) = \mathrm{e}^{-2|t|} = \widehat{F}_{X}(t) \times \widehat{F}_{Y}(t)$$` This illustrates the fact that `\(\widehat{F}_{X+Y} = \widehat{F}_{X} \times \widehat{F}_{Y}\)` does not imply independence of `\(X\)` and `\(Y\)`.
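---
### Cauchy stability by simulation

The stability of the Cauchy distribution can be observed by simulation (the values of `\(n\)` and of the number of replications below are arbitrary): the central quantiles of the empirical mean of `\(n\)` i.i.d. Cauchy variables match those of a single Cauchy variable.

```r
set.seed(3)
n <- 20        # number of Cauchy variables in each average
m <- 1e5       # number of replications

z <- rowMeans(matrix(rcauchy(n * m), nrow = m))   # m means of n i.i.d. Cauchy rvs
u <- rcauchy(m)                                   # m single Cauchy rvs

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(mean_of_n = quantile(z, probs), single = quantile(u, probs))
```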
--- name: secdifferentiabilityintegrability template: inter-slide ## Differentiability and integrability --- name: differentiabilitytehorem Differentiability of the Fourier transform at `\(0\)` and integrability are intimately related .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem If `\(X\)` is `\(p\)`-integrable for some `\(p \in \mathbb{N}\)` then - the Fourier transform of the distribution of `\(X\)` is `\(p\)`-times differentiable at `\(0\)` - the `\(p^{\text{th}}\)` derivative at `\(0\)` equals `\(i^p \mathbb{E}X^p\)` ] ??? Recall what we had for probability generating functions --- ### Proof The proof relies on a Taylor expansion with remainder of `\(x \mapsto \mathrm{e}^{ix}\)` at `\(x=0\)`: `$$\mathrm{e}^{ix} - \sum_{k=0}^n \frac{(ix)^k}{k!} = \frac{i^{n+1}}{n!} \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s$$` --- ### Proof (continued) The modulus of the right hand side can be upper-bounded in two different ways. `$$\frac{1}{n!}\Big| \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s \Big| \leq \frac{|x|^{n+1}}{(n+1)!}$$` which is good when `\(|x|\)` is small. -- To handle large values of `\(|x|\)`, integration by parts leads to `$$\frac{i^{n+1}}{n!} \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s = \frac{i^{n}}{(n-1)!} \int_0^x (x-s)^{n-1} \left(\mathrm{e}^{is}-1\right) \mathrm{d}s \,.$$` -- The modulus of the right hand side can be upper-bounded by `\(2|x|^n/n!\)`. --- ### Proof (continued) Applying this Taylor expansion to `\(x=t X\)`, using the pointwise upper bounds and taking expectations leads to `$$\begin{array}{rl} \Big| \widehat{F}(t) - \sum_{k=0}^n \mathbb{E}\frac{(itX)^k}{k!} \Big| & \leq \mathbb{E} \Big[\min\Big( \frac{|tX|^{n+1}}{(n+1)!} ,2 \frac{|tX|^n}{n!}\Big)\Big] \\ & = \frac{|t|^n}{(n+1)!} \mathbb{E} \Big[\min\Big(|t|\, |X|^{n+1} ,2 (n+1) |X|^n \Big)\Big] \, . \end{array}$$` The right hand side is well defined as soon as `\(\mathbb{E}|X|^n < \infty\)`. --- ### Proof (continued) By dominated convergence, `$$\lim_{t \to 0} \mathbb{E} \Big[\min\Big(|t|\, |X|^{n+1} ,2 (n+1) |X|^n \Big)\Big] = 0\,$$` Hence we have established that if `\(\mathbb{E}|X|^n < \infty\)`, `$$\widehat{F}(t) = \sum_{k=0}^n i^k \mathbb{E}X^k \frac{t^k}{k!} + o(|t|^n)$$`
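---
### Moments from derivatives at 0: a numerical check

The relation between moments and derivatives of the characteristic function at `\(0\)` can be checked with finite differences. The sketch below uses `\(X \sim \text{Exp}(1)\)` (so `\(\mathbb{E}X = 1\)`, `\(\mathbb{E}X^2 = 2\)`) and its characteristic function `\(1/(1-it)\)`; the step size `h` is arbitrary.

```r
cf_exp <- function(t) 1 / (1 - 1i * t)   # Exp(1): E[X] = 1, E[X^2] = 2
h <- 1e-4

# first derivative at 0 by a centered difference: should be close to i * E[X] = i
(cf_exp(h) - cf_exp(-h)) / (2 * h)

# second derivative at 0: should be close to i^2 * E[X^2] = -2
(cf_exp(h) - 2 * cf_exp(0) + cf_exp(-h)) / h^2
```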
--- In the other direction, the connection is not as simple: differentiability of the Fourier transform does not imply integrability. But ... .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem If the Fourier transform `\(\widehat{F}\)` of the distribution of `\(X\)` satisfies `$$\lim_{h \downarrow 0} \frac{2 - \widehat{F}(h) - \widehat{F}(-h)}{h^2} = \sigma^2 < \infty$$` then `\(\mathbb{E}X^2 = \sigma^2\)` ] --- ### Proof `$$2 - \widehat{F}(h) - \widehat{F}(-h) = 2\mathbb{E}\Big[1 - \cos(hX)\Big]$$` -- Using the Taylor formula with integral remainder for `\(\cos\)` at `\(0\)`: `$$1 - \cos x = \int_0^x \cos(s) (x-s) \mathrm{d}s = x^2 \int_0^1 \cos(sx) (1-s) \mathrm{d}s$$` --
`\(\int_0^1 \cos(sx) (1-s) \mathrm{d}s\geq 0\)` for all `\(x \in \mathbb{R}\)`. `$$\begin{array}{rl} \frac{2\mathbb{E}\Big[1 - \cos(hX)\Big]}{h^2} & = 2 \frac{\mathbb{E}\Big[ h^2 X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big]}{h^2} \\ & = 2 \mathbb{E}\Big[ X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big] \, . \end{array}$$` --- ### Proof (continued) By Fatou's Lemma: `$$\begin{array}{rl}\sigma^2 & = \lim_{h \downarrow 0} 2\mathbb{E}\Big[ X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big]\\ & \geq 2\mathbb{E}\Big[\liminf_{h \downarrow 0} X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s \Big]\end{array}$$` -- But for all `\(x \in \mathbb{R}\)`, by dominated convergence: `$$\liminf_{h \downarrow 0} x^ 2 \int_0^1 \cos(shx) (1-s) \mathrm{d}s = \frac{x^2}{2}$$` Hence `$$\sigma^2 \geq \mathbb{E} X^2$$` The proof is completed by invoking the [Theorem connecting integrability and smoothness of Fourier transform](#differentiabilitytehorem)
--- name:secquantiles template: inter-slide ## Quantile functions --- So far we have seen several characterizations of probability distributions: - Cumulative Distribution Functions - Probability Generating Functions - Characteristic functions -- The last characterization is praised for its behavior with respect to sums of independent random variables. For _univariate distributions_, a companion to the cumulative distribution function is the _quantile function_ ??? The quantile function plays a significant role in simulations, statistics and risk theory. --- The quantile transformation has many applications. It can be used to show stochastic domination properties. ### Example The quantile function of a discrete distribution is a step function that jumps at the cumulative probability of every possible outcome. If a probability distribution is a mixture of a discrete distribution and a continuous distribution, the quantile function jumps at the cumulative probability of every possible outcome of the discrete component. ??? In Figures \@ref(fig:quantiles)-\@ref(fig:quantiles4), we illustrate quantile functions for discrete (binomial) distributions and for distributions that are neither discrete nor continuous. --- name: quantiles1 ### Quantile functions of binomial distributions .pull-left[ <div class="figure"> <img src="cm-8-characterizing-distributions_files/figure-html/quantiles-1.png" alt="Binomial quantiles" width="504" /> <p class="caption">Binomial quantiles</p> </div> ] .pull-right[ Quantile functions of Binomial distributions with parameters `\(n=10\)` and `\(p \in \{.2, .5\}\)` The two quantile functions have the same range: `\(\{0, 1, \ldots, 10\}\)` ] --- name: quantiles2 ### Quantile functions of clipped Gaussians .pull-left[ <img src="cm-8-characterizing-distributions_files/figure-html/quantiles2-1.png" width="504" /> Quantile functions of `$$\max(X, \tau) \text{ where } X \sim \mathcal{N}(0,1)$$` for `\(\tau \in \{0, -1\}\)` ] .pull-right[ The quantile function of `\(\max(X,\tau)\)` is `$$\mathbb{I}_{(0,\Phi(\tau))}(p) \times \tau + \Phi^{\leftarrow}(p) \times \mathbb{I}_{(\Phi(\tau), 1)}(p)$$` or `$$\Phi^{\leftarrow}(p \vee \Phi(\tau))$$` The two distributions are neither absolutely continuous nor discrete ] ??? Let `\(\Phi^\leftarrow\)` denote the quantile function of `\(\mathcal{N}(0,1)\)` --- name: quantiles3 ### Cumulative distribution functions of clipped Gaussians .pull-left[ <div class="figure"> <img src="cm-8-characterizing-distributions_files/figure-html/quantiles3-1.png" alt="Clipped Gaussians CDF" width="504" /> <p class="caption">Clipped Gaussians CDF</p> </div> ] .pull-right[ Cumulative distribution functions for clipped Gaussian distributions ] --- name: quantiles4 .pull-left[ <div class="figure"> <img src="cm-8-characterizing-distributions_files/figure-html/quantiles4-1.png" alt="(ref:quantiles4)" width="504" /> <p class="caption">(ref:quantiles4)</p> </div> ] .pull-right[ Representation of `\(F \circ F^{\leftarrow}\)` for the clipped Gaussian distributions The function `\(F \circ F^{\leftarrow}\)` always lies above the line `\(y=x\)` (dotted line) as prescribed in [Proposition](#fundapropquantiles) Plateaux that lie strictly above the dotted line are in correspondence with jumps of the quantile function ] --- ###
A cumulative distribution function `\(F\)` is - non-negative - `\([0,1]\)`-valued - non-decreasing - right-continuous, with a left limit at every point.
The cumulative distribution function of a diffuse probability measure is continuous at any point. --- The quantile function `\(F^{\leftarrow}\)` is defined as an _extended inverse_ of the cumulative distribution function `\(F\)`. .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Definition The quantile function `\(F^{\leftarrow}\)` of random variable `\(X\)` distributed according to `\(P\)` (with cumulative distribution function `\(F\)`) is defined as `$$\begin{array}{rl} F^{\leftarrow} (p) & = \inf \Big\{ x : P\{X \leq x\} \geq p \Big\} \\ & = \inf \Big\{ x : F (x) \geq p \Big\} \qquad \text{for } p \in (0,1). \end{array}$$` ] ---
The quantile function is non-decreasing and left-continuous --- name: fundapropquantiles The interplay between the quantile and cumulative distribution functions is summarized in the next proposition. ### Proposition If `\(F\)` and `\(F^\leftarrow\)` are the CDF and the quantile function of (the distribution of) `\(X\)`, the following statements hold for `\(p \in] 0, 1 [\)`: 1. `\(p \leq F (x)\)` iff `\(F^\leftarrow (p) \leq x\)` 1. `\(F \circ F^\leftarrow (p) \geq p\)` 1. `\(F^\leftarrow \circ F (x) \leq x\)` 1. If `\(F ⊴ \text{Lebesgue}\)`, then `\(F \circ F^\leftarrow (p) = p\)` --- ### Proof According to the definition of `\(F^\leftarrow\)`, if `\(F (x) \geq p\)` then `\(F^\leftarrow (p) \leq x\)` -- To prove the converse, it suffices to check that `\(F \circ F^\leftarrow (p) \geq p\)` -- Indeed, if `\(x \geq F^\leftarrow (p),\)` as `\(F\)` is non-decreasing, `\(F (x) \geq F \circ F^\leftarrow (p)\)` Let `\(y = F^\leftarrow (p)\)`; by definition of the infimum, `\(\exists (z_n)_{n \in \mathbb{N}}\)` with `\(z_n \searrow y\)` such that `\(F (z_n) \geq p\)` But, as `\(F\)` is right-continuous, `\(\lim_n F(z_n) = F(\lim_n z_n) = F(y)\)` Hence `\(F (y) \geq p\)` We just proved 1. and 2.
--- ### Proof (continued) 3.) is an immediate consequence of 1.). Let `\(p = F (x) .\)` Hence `\(p \leq F (x),\)` according to 1.) This is equivalent to `\(F^\leftarrow (p) \leq x,\)` that is `\(F^\leftarrow \circ F (x) \leq x.\)` 4.) For every `\(p\)` in `\(] 0, 1 [,\)` `\(\{ x : p = F (x)\}\)` is non-empty (intermediate value theorem, as `\(F\)` is continuous here). Let `\(y = \inf \{ x : p = F (x)\} =F^\leftarrow (p)\)`. According to 1.), `\(F (y) \geq p\)`. Now, if `\((z_n)_{n \in \mathbb{N}} \nearrow y\)`, for every `\(n\)`, `\(F (z_n) < p,\)` and, by continuity, `\(F (y) = F (\lim_n \uparrow z_n) = \lim_n F (z_n) \leq p.\)` Hence `\(F (y) = p,\)` that is `\(F \circ F^\leftarrow (p) = p\)`
--- ### Proposition (Quantile transformation) If `\(U\)` is uniformly distributed on `\((0,1)\)`, and `\(F\)` is a cumulative distribution function over `\(\mathbb{R}\)`, `\(F^{\leftarrow}(U)\)` has cumulative distribution function `\(F\)`. The quantile transformation works whatever the continuity properties of `\(F\)` --- ### Proof By point 1. of the [Proposition](#fundapropquantiles) above: `$$\begin{array}{rl} P\Big\{ F^\leftarrow(U) \leq x \Big\} & = P\Big\{ U \leq F(x) \Big\} \\ & = F(x) \, . \end{array}$$`
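---
### The quantile transformation in practice

The quantile transformation is the basis of inverse-CDF sampling. A minimal R sketch (the target distribution `\(\text{Exp}(1)\)`, whose quantile function is `\(p \mapsto -\log(1-p)\)`, is an arbitrary choice):

```r
set.seed(11)

F_inv <- function(p) -log(1 - p)   # quantile function of Exp(1)

u <- runif(1e5)                    # U uniform on (0, 1)
x <- F_inv(u)                      # F_inv(U) should be Exp(1)-distributed

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(simulated = quantile(x, probs), exact = qexp(probs))
```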
--- name: qqplots ### QQ-plots .pull-left[ <div class="figure"> <img src="cm-8-characterizing-distributions_files/figure-html/qqplot-1.png" alt="(ref:qqplot)" width="504" /> <p class="caption">(ref:qqplot)</p> </div> QQ-plots are a popular device for comparing two probability distributions on the real line, defined by their cumulative distribution functions `\(F\)` and `\(G\)`. The QQ-plot consists in plotting the function `\(G^{\leftarrow} \circ F\)`. ] .pull-right[ If the two distributions are the same, the line `$$y = G^{\leftarrow} \circ F(x)$$` should lie below `\(y=x\)` (according to the [Proposition](#fundapropquantiles)). The two lines should coincide if `\(F=G\)` is continuous and increasing. Here: - `\(F = \Phi\)` is the cumulative distribution function of `\(\mathcal{N}(0,1)\)`. - `\(G\)` is the empirical distribution of a sample of size `\(500\)` from `\(\mathcal{N}(0,1)\)`. ] --- Let us conclude this section with an important observation concerning the behavior of `\(F(X)\)` when `\(X \sim P\)` with cumulative distribution function `\(F\)`. .bg-light-gray.b--dark-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Corollary If `\(X \sim P\)` with continuous cumulative distribution function `\(F\)`, then `\(F (X)\)` and `\(1 - F (X)\)` are uniformly distributed on `\([0, 1]\)`. ]
Exercise: prove the Corollary. --- exclude: true class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: 112% # The End --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End