name: layout-general
layout: true
class: left, middle

<style>
.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: red;
}
</style>
---
class: middle, center, inverse
background-size: 4%
background-position: 97% 3%

# Product distributions

### 2021-01-08

#### [Probabilités Master I MIDS](http://stephane-v-boucheron.fr/courses/probability/)

### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
class: inverse, center, middle

## Motivation

## <svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm80 256c-60.6 0-134.5-38.3-143.8-93.3-2-11.8 9.3-21.6 20.7-17.9C155.1 330.5 200 336 248 336s92.9-5.5 123.1-15.2c11.3-3.7 22.6 6.1 20.7 17.9-9.3 55-83.2 93.3-143.8 93.3z"/></svg>

---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M496 384H64V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-32c0-8.84-7.16-16-16-16zM464 96H345.94c-21.38 0-32.09 25.85-16.97 40.97l32.4 32.4L288 242.75l-73.37-73.37c-12.5-12.5-32.76-12.5-45.25 0l-68.69 68.69c-6.25 6.25-6.25 16.38 0 22.63l22.62 22.62c6.25 6.25 16.38 6.25 22.63 0L192 237.25l73.37 73.37c12.5 12.5 32.76 12.5 45.25 0l96-96 32.4 32.4c15.12 15.12 40.97 4.41 40.97-16.97V112c.01-8.84-7.15-16-15.99-16z"/></svg> Description of _random walks_ over `\(\mathbb{Z}^d\)`:

> at each step, we choose a random neighbour of the current position and move to that neighbour.

--

An elementary move is an element of `\(\{0, \pm 1\}^d\)` where exactly one component is non-zero

--

Picking an elementary move uniformly at random is easy

--

Picking finitely many independent moves is easy too

--

It is enough to have an .red[infinite] supply of independent move-valued random variables
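---
### A simulation sketch

Granting that infinite supply for a moment, finitely many steps of the walk are easy to simulate. A minimal sketch, assuming `numpy` is available (the function name and parameters are illustrative):

```python
import numpy as np

def random_walk(n_steps, d, seed=0):
    """n_steps of the nearest-neighbour random walk on Z^d, started at 0."""
    rng = np.random.default_rng(seed)
    moves = np.zeros((n_steps, d), dtype=int)
    axis = rng.integers(0, d, size=n_steps)   # which coordinate moves at each step
    sign = rng.choice([-1, 1], size=n_steps)  # in which direction
    moves[np.arange(n_steps), axis] = sign    # exactly one non-zero component per move
    return moves.cumsum(axis=0)               # successive positions of the walk

path = random_walk(n_steps=1000, d=2)
```

Each row of `moves` is uniform over the `\(2d\)` elementary moves, and distinct rows are independent: this is precisely what product distributions legitimate.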
viewBox="0 0 512 512"><path d="M501.1 395.7L384 278.6c-23.1-23.1-57.6-27.6-85.4-13.9L192 158.1V96L64 0 0 64l96 128h62.1l106.6 106.6c-13.6 27.8-9.2 62.3 13.9 85.4l117.1 117.1c14.6 14.6 38.2 14.6 52.7 0l52.7-52.7c14.5-14.6 14.5-38.2 0-52.7zM331.7 225c28.3 0 54.9 11 74.9 31l19.4 19.4c15.8-6.9 30.8-16.5 43.8-29.5 37.1-37.1 49.7-89.3 37.9-136.7-2.2-9-13.5-12.1-20.1-5.5l-74.4 74.4-67.9-11.3L334 98.9l74.4-74.4c6.6-6.6 3.4-17.9-5.7-20.2-47.4-11.7-99.6.9-136.6 37.9-28.5 28.5-41.9 66.1-41.2 103.6l82.1 82.1c8.1-1.9 16.5-2.9 24.7-2.9zm-103.9 82l-56.7-56.7L18.7 402.8c-25 25-25 65.5 0 90.5s65.5 25 90.5 0l123.6-123.6c-7.6-19.9-9.9-41.6-5-62.7zM64 472c-13.2 0-24-10.8-24-24 0-13.3 10.7-24 24-24s24 10.7 24 24c0 13.2-10.7 24-24 24z"/></svg> We have to define a convenient `\(\sigma\)`-algebra `\(\mathcal{H}\)` of subsets of `\(\mathcal{X} \times \mathcal{Y}\)`: the _product `\(\sigma\)`-algebra_ --- .content-box-gray[ ### Definition: Product sigma-algebra Let `\((\mathcal{X}, \mathcal{F})\)` and `\((\mathcal{Y}, \mathcal{G})\)` be two measurable spaces. The product `\(\sigma\)`-algebra `\(\mathcal{F} \otimes \mathcal{G}\)` is the `\(\sigma\)`-algebra of subsets of `\(2^{\mathcal{X} \times \mathcal{Y}}\)` that is generated by the so-called _rectangles_: `$$\Big\{ A \times B : A \in \mathcal{F}, B \in \mathcal{G}\Big\} \, .$$` ] -- <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 192 512"><path d="M176 432c0 44.112-35.888 80-80 80s-80-35.888-80-80 35.888-80 80-80 80 35.888 80 80zM25.26 25.199l13.6 272C39.499 309.972 50.041 320 62.83 320h66.34c12.789 0 23.331-10.028 23.97-22.801l13.6-272C167.425 11.49 156.496 0 142.77 0H49.23C35.504 0 24.575 11.49 25.26 25.199z"/></svg> The product `\(\sigma\)`-algebra makes the functions `\(X\)` and `\(Y\)` (sometimes called _coordinate projections_) measurable. --- #### Example If `\(\mathcal{F} = \mathcal{G} =\mathcal{B}(\mathbb{R})\)` `$$\mathcal{F} \otimes \mathcal{G} = \sigma\left(A \times B : A, B \in \mathcal{B}(\mathbb{R})\right)$$` -- `$$\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}\left(\mathbb{R}^2 \right)$$` --- ### More generally If `\(\mathcal{F} = \sigma(\mathcal{A})\)` (resp. `\(\mathcal{G} = \sigma(\mathcal{B})\)` ) with `\(\mathcal{A}\)` (resp. `\(\mathcal{B}\)` ) a `\(\pi\)` -class Then `$$\mathcal{F} \otimes \mathcal{G} = \sigma\left(\mathcal{A}\right) \otimes \sigma\left(\mathcal{B}\right) = \sigma\left(\mathcal{A} \times \mathcal{B}\right)$$` --- ### Recall 💉 .content-box-gray[ ### Definition: A measure `\(\mu\)` on `\((\Omega, \mathcal{F})\)` is `\(\sigma\)`-finite iff there exists `\((A_n)_n\)` with - `\(\Omega \subseteq \cup_n A_n\)` - `\(\mu(A_n) < \infty\)` for each `\(n\)`. 
]

--

- Finite measures (this encompasses probability measures) are `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>
- Lebesgue measure is `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>
- The counting measure on `\(\mathbb{R}\)` is __not__ `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;fill:gray;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm170.2 218.2C315.8 367.4 282.9 352 248 352s-67.8 15.4-90.2 42.2c-13.5 16.3-38.1-4.2-24.6-20.5C161.7 339.6 203.6 320 248 320s86.3 19.6 114.7 53.8c13.6 16.2-11 36.7-24.5 20.4z"/></svg>

---
.content-box-gray[

### Product-measure Theorem

Let `\((\mathcal{X}, \mathcal{F}, \mu)\)` and `\((\mathcal{Y}, \mathcal{G}, \nu)\)` be two measured spaces where `\(\mu,\nu\)` are `\(\sigma\)`-finite.

Then there exists a .red[unique] `\(\sigma\)`-finite measure `\(\alpha\)` on `\(\mathcal{X} \times \mathcal{Y}\)` endowed with the product `\(\sigma\)`-algebra `\(\mathcal{F} \otimes \mathcal{G} = \sigma(\mathcal{F} \times \mathcal{G})\)` that satisfies

`$$\alpha (A \times B) = \mu(A) \times \nu(B)\qquad \forall A \in \mathcal{F}, B \in \mathcal{G} \, .$$`

... (to be continued)

]

---
.content-box-gray[

### Theorem (continued)

Moreover, for all `\(E \in \mathcal{F} \otimes \mathcal{G}\)`,

1. for each `\(x \in \mathcal{X}\)`, `\(y \mapsto \mathbb{I}_E(x,y)\)` is `\(\mathcal{G}\)`-measurable;
1. `\(x \mapsto \int_{\mathcal{Y}} \mathbb{I}_E(x,y) \, \mathrm{d} \nu(y)\)` is `\(\mathcal{F}\)`-measurable;
1. for each `\(y \in \mathcal{Y}\)`, `\(x \mapsto \mathbb{I}_E(x,y)\)` is `\(\mathcal{F}\)`-measurable;
1. `\(y \mapsto \int_{\mathcal{X}} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x)\)` is `\(\mathcal{G}\)`-measurable,

and the following holds:

`$$\int_{\mathcal{X}\times \mathcal{Y}} \mathbb{I}_E \, \mathrm{d}\alpha = \int_{\mathcal{X}} \Big(\int_{\mathcal{Y}} \mathbb{I}_E(x,y) \, \mathrm{d} \nu(y)\Big) \, \mathrm{d}\mu(x) = \int_{\mathcal{Y}} \Big( \int_{\mathcal{X}} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x)\Big) \,\mathrm{d} \nu(y)$$`

where the three integrals are simultaneously finite or infinite.

]

.blue[
The measure `\(α\)` is called a _product measure_, denoted by `\(\mu \otimes \nu\)`.
]

???
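---
### A finite sanity check

On finite spaces the theorem can be checked by direct computation: the product measure is the outer product of the weight vectors, and iterated summation in either order returns `\(\alpha(E)\)`. A minimal numerical sketch, assuming `numpy` (the weights and the event are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.2, 0.5, 0.3])         # measure on X = {0, 1, 2}
nu = np.array([0.4, 0.6])              # measure on Y = {0, 1}
alpha = np.outer(mu, nu)               # product measure on X x Y

A, B = [0, 2], [1]                     # a rectangle A x B
assert np.isclose(alpha[np.ix_(A, B)].sum(), mu[A].sum() * nu[B].sum())

E = rng.random(alpha.shape) < 0.5      # an arbitrary "event" E in X x Y
x_first = (E * nu).sum(axis=1) @ mu    # integrate over Y first, then X
y_first = (E.T * mu).sum(axis=1) @ nu  # integrate over X first, then Y
assert np.isclose(x_first, y_first) and np.isclose(x_first, alpha[E].sum())
```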
---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg>

### Assuming that both `\(μ\)` and `\(ν\)` are .red[ `\(σ\)` -finite] is essential!

--

Choose

- `\(\mu\)` as the counting measure on `\([0,1]\)` and
- `\(\nu\)` as the Lebesgue measure on `\([0,1]\)`.

Consider the diagonal `\(E = \{(x,x) : x \in [0,1]\}\)`.

--

The set `\(E\)` belongs to `\(\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}(\mathbb{R}^2)\)`

`$$μ ⊗ ν(E) = ???$$`

---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg>

.content-box-red[

Interchanging the order of integration leads to different results:

`\begin{align*} 1 & = \int_{[0,1]} \Big(\underbrace{\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\mu(x)}_{=1}\Big) \, \mathrm{d}\nu(y) \\ 0 & = \int_{[0,1]} \Big(\underbrace{\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\nu(y)}_{=0}\Big) \, \mathrm{d}\mu(x) \end{align*}`

]

---
#### The product-measure theorem contains three statements:

- existence of a measure over `\((\mathcal{X} \times \mathcal{Y}, \mathcal{F} \otimes \mathcal{G})\)` that satisfies the product property over rectangles
- uniqueness of this measure
- the possibility of computing the measure of `\(E \in \mathcal{F} \otimes \mathcal{G}\)` by iterated integration in arbitrary order.

---
- The first statement is proved using an extension theorem

--

- The second statement follows from a monotone class argument (rectangles form a generating `\(\pi\)`-class)
  - the case where both `\(\mu\)` and `\(\nu\)` are finite measures is settled first
  - if either `\(\mu\)` or `\(\nu\)` is just `\(\sigma\)`-finite, consider restrictions to rectangles with finite measure, and proceed by approximation.

--

- The third statement trivially holds for rectangles.
---
class: center, middle, inverse

## Tonelli-Fubini theorem

---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 192 512"><path d="M176 432c0 44.112-35.888 80-80 80s-80-35.888-80-80 35.888-80 80-80 80 35.888 80 80zM25.26 25.199l13.6 272C39.499 309.972 50.041 320 62.83 320h66.34c12.789 0 23.331-10.028 23.97-22.801l13.6-272C167.425 11.49 156.496 0 142.77 0H49.23C35.504 0 24.575 11.49 25.26 25.199z"/></svg> We consider product measures that are built from `\(\sigma\)`-finite measures

--

> The Tonelli-Fubini Theorem shows that (under mild conditions) integration with respect to a product measure reduces to iterated integration over the component measures

---
.content-box-gray[

### Theorem Tonelli-Fubini

Let `\(( \mathcal{X}, \mathcal{A})\)` and `\((\mathcal{Y}, \mathcal{B})\)` be two measurable spaces, `\(\mu\)` and `\(\nu\)` two `\(\sigma\)`-finite measures on these spaces, `\(\mu \otimes \nu\)` the product measure, and `\(f\)` an `\(\mathcal{A} \otimes \mathcal{B}\)`-measurable real function such that `\(\int |f| \, \mathrm{d} \mu \otimes \nu < \infty\)`.

The following properties are satisfied:

i. `\(\forall x \in \mathcal{X}, \hspace{1em} y \mapsto f (x, y)\)` is `\(\mathcal{B}\)`-measurable.
i. The function `\(x \mapsto \int_{\mathcal{Y}} f (x, y) \mathrm{d}\nu(y)\)` is `\(\mathcal{A}\)`-measurable, finite `\(\mu\)`-almost everywhere, and

`$$\int_{\mathcal{X} \times \mathcal{Y}} f \mathrm{d} \mu \otimes \nu = \int_{\mathcal{X}} \left[ \int_{\mathcal{Y}} f (x, y) \mathrm{d} \nu (y) \right] \mathrm{d} \mu (x)$$`

]

---
#### Proof

---
#### A simple consequence of the Tonelli-Fubini Theorem.

.content-box-gray[

### Proposition "IPP formula"

Let `\(X\)` be a non-negative real-valued random variable, then

`$$\mathbb{E}X = \int_0^\infty P\{ X > t \} \mathrm{d}t$$`

]

---
#### Proof

`\begin{align*} \mathbb{E}X & = \int_{\Omega} X(\omega) \, \mathrm{d}P(\omega) \\ & = \int_{\Omega} \Big( \int_{[0,\infty)} \mathbb{I}_{X(\omega)> t} \mathrm{d}t \Big)\, \mathrm{d}P(\omega) \\ & = \int_{[0,\infty)} \Big( \int_{\Omega} \mathbb{I}_{X(\omega)> t} \, \mathrm{d}P(\omega) \Big) \mathrm{d}t \\ & = \int_{[0,\infty)} \Big( P\{ \omega : X(\omega) > t \} \Big) \mathrm{d}t \end{align*}`

---
class: center, middle, inverse

## Independence and product distributions

---
### Two random variables

Let the two random variables `\(X, Y\)` map `\((\Omega, \mathcal{F})\)` to `\((\mathcal{X}, \mathcal{G})\)` and `\((\mathcal{Y}, \mathcal{H})\)`.

Equip `\((\Omega, \mathcal{F})\)` with probability distribution `\(P\)`.

Let `\(Q_X = P \circ X^{-1}\)` and `\(Q_Y = P \circ Y^{-1}\)` be the two image distributions (called the marginal distributions).

Let `\(Q\)` be the joint distribution of `\((X,Y)\)` under `\(P\)`, that is, the probability distribution over `\(\mathcal{X} \times \mathcal{Y}\)` that is uniquely defined by

`$$Q( A \times B) = P\Big\{ \omega: X(\omega) \in A, Y(\omega) \in B \Big\}$$`

--

.content-box-gray[

Then

`$$X \perp\!\!\!\perp Y \text{ under } P \Longleftrightarrow Q = Q_X \otimes Q_Y \, ,$$`

In words, `\(X\)` and `\(Y\)` are independent iff their joint distribution is the product of their marginal distributions.

]
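---
### An empirical look at the characterization

The characterization can be eyeballed numerically: when `\(X \perp\!\!\!\perp Y\)`, the empirical joint distribution is close to the outer product of the empirical marginals. A minimal sketch, assuming `numpy` (the two distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 3, size=n)           # X uniform on {0, 1, 2}
y = (rng.random(n) < 0.3).astype(int)    # Y Bernoulli(0.3), independent of X

joint = np.zeros((3, 2))
np.add.at(joint, (x, y), 1.0 / n)        # empirical joint distribution Q

q_x, q_y = joint.sum(axis=1), joint.sum(axis=0)   # empirical marginals
print(np.abs(joint - np.outer(q_x, q_y)).max())   # small: Q close to Q_X x Q_Y
```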
---
### 💉 Independence of finitely many `\(\sigma\)`-algebras

Let `\((\Omega, \mathcal{F}, P)\)` be a probability space.

Let `\(\mathcal{G}_1, \ldots, \mathcal{G}_n\)` be a collection of sub-`\(\sigma\)`-algebras

.content-box-gray[

### Definition

This collection is independent with respect to `\(P\)` if `\(∀ A_1 ∈ \mathcal{G}_1, …, A_n ∈ \mathcal{G}_n\)`

`$$P (A_1 ∩ … ∩ A_n) = P(A_1) \times \ldots \times P(A_n)$$`

]

---
### Independence of countably many `\(\sigma\)`-algebras

In many applications, independence between two `\(\sigma\)`-algebras or a finite collection of `\(\sigma\)`-algebras is not enough.

This is the case when deriving or using laws of large numbers. We have to deal with a _countable collection of independent random variables_.

In words, we have to work with a countable collection of `\(\sigma\)`-algebras, and we need a notion of independence for such a collection.

---
.content-box-gray[

### Definition

Let `\((\Omega, \mathcal{F}, P)\)` be a probability space. Let `\(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\)` be a countable collection of sub-`\(\sigma\)`-algebras.

The collection `\(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\)` is said to be independent under `\(P\)` if every finite sub-collection is independent under `\(P\)`.

]

---
### Example

Consider the uniform probability distribution over `\([0,1]\)` and define `\(X_1, X_2, \ldots\)` by

`$$X_n(\omega) = \operatorname{sign}\Big(\sin\big(2^{n+1} \pi \omega \big)\Big)$$`

then `\(X_1, \ldots, X_n, \ldots\)` form a countable independent collection of random variables.

---
class: center, middle, inverse

## Infinite product measures

---
.content-box-gray[

### Definition Cylindrical `\(\sigma\)`-algebra

Let `\((\Omega_n, \mathcal{F}_n)_n\)` be a countable collection of measurable spaces, the cylinder `\(\sigma\)`-algebra is the `\(\sigma\)`-algebra of subsets of `\(\prod_{n=1}^\infty \Omega_n\)` that is generated by subsets of the form:

`$$\prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \qquad\text{with } A_n \in \mathcal{F}_n \text{ for } n \leq m$$`

where `\(m\)` is any integer. These subsets are called _finite-dimensional rectangles_ or _cylinders_.

]

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M572.52 241.4C518.29 135.59 410.93 64 288 64S57.68 135.64 3.48 241.41a32.35 32.35 0 0 0 0 29.19C57.71 376.41 165.07 448 288 448s230.32-71.64 284.52-177.41a32.35 32.35 0 0 0 0-29.19zM288 400a144 144 0 1 1 144-144 143.93 143.93 0 0 1-144 144zm0-240a95.31 95.31 0 0 0-25.31 3.79 47.85 47.85 0 0 1-66.9 66.9A95.78 95.78 0 1 0 288 160z"/></svg>: cylinders form a `\(\pi\)`-class

---
If each `\((\Omega_n, \mathcal{F}_n)\)` is endowed with a probability distribution, assigning a probability to cylinders looks straightforward:

`$$\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n) \times \prod_{n=m+1}^\infty P_n(\Omega_n) = \prod_{n=1}^m P_n(A_n)$$`

The question is:

> does `\(\mathbb{P}\)` extend to the cylinder `\(\sigma\)`-algebra? If an extension exists, is it unique?

The answer is yes! 🍾

---
.content-box-gray[

### Kolmogorov's extension theorem

Let `\((\Omega_n, \mathcal{F}_n, P_n)_n\)` be a countable collection of probability spaces. Then there exists a unique probability distribution `\(\mathbb{P}\)` on the cylindrical `\(\sigma\)`-algebra that satisfies:

`$$\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n)$$`

for every finite sequence `\(A_1, \ldots, A_m\)` in `\(\mathcal{F}_1 \times \ldots \times \mathcal{F}_m\)`.
]

---
exclude: true
class: center, middle, inverse

## Infinite independent collection of events

---
exclude: true
class: center, middle, inverse

## Infinite independent collection of `\(\sigma\)`-algebras

---
class: center, middle, inverse

## Second Borel-Cantelli Lemma

---
.content-box-gray[

### Lemma

Let `\(A_1, …, A_n, …\)` be a countable independent collection of events under `\((Ω, \mathcal{F}, P)\)`.

If `\(∑_n P(A_n) = ∞\)` then `\(P \left(\cap_n \cup_{m\geq n} A_m\right) = 1\)`

]

This is a partial converse to the first Borel-Cantelli Lemma

---
#### Proof

`$$P\left(\overline{\cap_n \cup_{m\geq n} A_m}\right) = P\left(\cup_n \overline{\cup_{m\geq n} A_m}\right) = P\left(\cup_n \cap_{m\geq n} \overline{A_m}\right)$$`

Fix `\(n\)`. By monotone continuity of `\(P\)`:

`$$P\left(\cap_{m\geq n} \overline{A_m}\right) = P \left(\lim_{k \uparrow \infty} \cap_{n \leq m \leq k} \overline{A_m}\right) = \lim_{k \uparrow \infty} P \left(\cap_{n \leq m \leq k} \overline{A_m}\right)$$`

By independence and `\(1 - x \leq \mathrm{e}^{-x}\)`:

`$$P \left(\cap_{n \leq m \leq k} \overline{A_m}\right) = \prod_{m=n}^k P\left( \overline{A_m} \right) = \prod_{m=n}^k \left( 1 - P\left( {A_m} \right)\right) \leq \mathrm{e}^{- ∑_{m=n}^k P(A_m)}$$`

`$$\lim_{k ↑ ∞} \mathrm{e}^{- ∑_{m=n}^k P(A_m)} = 0$$`

--

Hence:

`$$P\left(\cap_{m\geq n} \overline{A_m}\right) = 0$$`

`$$P\left(\overline{\cap_n \cup_{m\geq n} A_m}\right) \leq ∑_n P\left(\cap_{m\geq n} \overline{A_m}\right) =0$$`

---
class: center, middle, inverse

## Absolutely continuous distributions

---
### Densities and absolute continuity

Beyond discrete distributions, the simplest probability distributions are defined by a density function with respect to a (`\(\sigma\)`-finite) measure

This encompasses the distributions of the so-called _continuous random variables_.

--

.content-box-gray[

### Definition Absolute continuity

Let `\(\mu, \nu\)` be two `\(\sigma\)`-additive measures on measurable space `\((\Omega, \mathcal{F})\)`, `\(\mu\)` is said to be _absolutely continuous_ with respect to `\(\nu\)` (denoted by `\(\mu \trianglelefteq \nu\)`) iff

`\(\forall A \in \mathcal{F}\)` `\(\nu(A)=0 \Rightarrow \mu(A)=0\)`.

]

--

If `\(\mu, \nu\)` are two probability distributions, and `\(\mu \trianglelefteq \nu\)`, then any event which is impossible under `\(\nu\)` is also impossible under `\(\mu\)`.

---
###

- Is the counting measure on `\(\mathbb{R}\)` absolutely continuous with respect to Lebesgue measure?
- Is the converse true?
- Check that absolute continuity is a transitive relation.
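---
### Discrete warm-up for the next theorem

On a finite or countable space, the forthcoming Radon-Nikodym theorem is elementary: `\(\mu \trianglelefteq \nu\)` can be checked pointwise, and pointwise division yields a density. A minimal sketch, assuming `numpy` (the weight vectors are arbitrary):

```python
import numpy as np

nu = np.array([0.0, 0.25, 0.25, 0.5])   # reference measure on {0, 1, 2, 3}
mu = np.array([0.0, 0.10, 0.50, 0.4])   # mu vanishes wherever nu does

assert all(m == 0 for m, n in zip(mu, nu) if n == 0)         # mu << nu

f = np.divide(mu, nu, out=np.zeros_like(mu), where=nu > 0)   # a version of dmu/dnu

A = [1, 3]                                                   # any event A
assert np.isclose(mu[A].sum(), (f[A] * nu[A]).sum())         # mu(A) = int_A f dnu
```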
---
### Radon-Nikodym Theorem

Let `\(\mu, \nu\)` be two `\(\sigma\)`-additive measures on measurable space `\((\Omega, \mathcal{F})\)`

Assume `\(\nu\)` is `\(\sigma\)`-finite

If `\(\mu \trianglelefteq \nu\)`, then there exists a measurable function `\(f\)` from `\(\Omega\)` to `\([0, \infty)\)` such that

`$$\forall A \in \mathcal{F} \qquad \mu(A) = \int_A f(\omega) \mathrm{d}\nu(\omega) = \int_{\Omega} \mathbb{I}_A f \mathrm{d}\nu$$`

---
- The function `\(f\)` is called _a version_ of the density of `\(\mu\)` with respect to `\(\nu\)`
- The density is also called the Radon-Nikodym derivative of `\(\mu\)` with respect to `\(\nu\)`
- The density is sometimes denoted by `\(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\)`

---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg> The sigma-finiteness assumption is crucial.

If we choose `\(\mu\)` as Lebesgue measure and `\(\nu\)` as the counting measure, then `\(\nu\)` is not `\(\sigma\)`-finite; since `\(\nu(A)=0\)` forces `\(A=\emptyset\)`, we still have `\(\mu \trianglelefteq \nu\)`.

Nevertheless, Lebesgue measure has no density with respect to the counting measure.

---
.content-box-gray[

### Chain rule

If `\(\rho \trianglelefteq \mu \trianglelefteq \nu\)`, `\(f\)` is a density of `\(\rho\)` with respect to `\(\mu\)` while `\(g\)` is a density of `\(\mu\)` with respect to `\(\nu\)`, then `\(fg\)` is a density of `\(\rho\)` with respect to `\(\nu\)`

]

---
### Exponential distribution

The exponential distribution shows up in several areas of probability and statistics

- In _reliability theory_, its _memoryless property_ makes it a borderline case.
- In the theory of _point processes_, the exponential distribution is connected with _Poisson Point Processes_
- It is also important in _extreme value theory_

---
### Definition

The exponential distribution with intensity parameter `\(\lambda>0\)` is defined by its density with respect to Lebesgue measure on `\([0,\infty)\)`

`$$x \mapsto \lambda \mathrm{e}^{-\lambda x}$$`

The reciprocal of the intensity parameter is called the scale parameter.

---
- Geometric and exponential distributions are connected: if `\(X\)` is exponentially distributed, then `\(\lceil X\rceil\)` is geometrically distributed. For `\(k\geq 1\)`:

`$$P \Big\{ \lceil X \rceil \geq k \Big\} = P \Big\{ X > k - 1 \Big\} = \mathrm{e}^{- \lambda (k-1)}$$`

- Check that `\(x \mapsto \lambda \mathrm{e}^{-\lambda x}\)` is a probability density over `\([0, \infty)\)`.
- Compute the tail function and the cumulative distribution function of the exponential distribution with parameter `\(\lambda\)`.

---
- Let `\(X_1, \ldots, X_n\)` be i.i.d. exponentially distributed. Characterize the distribution of `\(\min(X_1, \ldots, X_n)\)`.
- If `\(X\)` is exponentially distributed with scale parameter `\(\sigma\)`, what is the distribution of `\(a X\)`?

---
exclude: true

### Exponential densities with different parameters: scales `\(1, 2, 1/2\)` or equivalently intensities `\(1, 1/2, 2\)`.

Expectation equals scale, variance equals squared scale.

![(ref:witgetexponential)](cm-7-product-distributions_files/figure-html/witgetexponential-1.png)

---
class: inverse, middle, center

## Gamma distributions

---
- Sums of independent exponentially distributed random variables are not exponentially distributed <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm170.2 218.2C315.8 367.4 282.9 352 248 352s-67.8 15.4-90.2 42.2c-13.5 16.3-38.1-4.2-24.6-20.5C161.7 339.6 203.6 320 248 320s86.3 19.6 114.7 53.8c13.6 16.2-11 36.7-24.5 20.4z"/></svg>
- The family of _Gamma distributions_ encompasses the family of exponential distributions
- The family of _Gamma distributions_ with the same intensity parameter is stable under addition <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>

---
💉 Euler's Gamma function:

`$$\Gamma(t) = \int_0^\infty x^{t-1}\mathrm{e}^{-x} \mathrm{d}x \qquad \text{for } t>0$$`

### Definition

The Gamma distribution with _shape_ parameter `\(p>0\)` and _intensity_ parameter `\(\lambda>0\)` is defined by its density with respect to Lebesgue measure on `\([0,\infty)\)`:

`$$x \mapsto \lambda^p \frac{x^{p-1}}{\Gamma(p)} \mathrm{e}^{-\lambda x}$$`

The reciprocal of the intensity parameter is called the _scale_ parameter.

---
###

- Check that `\(x \mapsto \lambda^p \frac{x^{p-1}}{\Gamma(p)} \mathrm{e}^{-\lambda x}\)` is a probability density over `\([0, \infty)\)`.
- If `\(X\)` is Gamma distributed with shape parameter `\(p\)` and scale parameter `\(\sigma\)`, what is the distribution of `\(a X\)`?

---
### Gamma densities with different parameters: scales `\(1, 1, 1/3, 1, 2\)` and shapes `\(1, 2, 3, 5, 5/2\)`.

Expectation equals shape times scale.

Variance equals shape times squared scale.

---
![(ref:witgetgamma)](cm-7-product-distributions_files/figure-html/witgetgamma-1.png)

---
### Univariate Gaussian distributions

Gaussian distributions play a central role in Probability theory, Statistics, Information theory, and Analysis

.content-box-gray[

### Definition

The Gaussian or normal distribution with mean `\(\mu \in \mathbb{R}\)` and variance `\(\sigma^2, \sigma>0\)` has density

`$$x \mapsto \frac{1}{\sqrt{2 \pi} \sigma} \mathrm{e}^{- \frac{(x-\mu)^2}{2 \sigma^2}} \qquad\text{for } x \in \mathbb{R}$$`

The standard Gaussian density is defined by `\(\mu=0, \sigma=1\)`.

]
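---
### Numerical sanity checks

Before the exercises on the next slide, two quick numerical checks, assuming `numpy` and `scipy` are available (an illustrative sketch, not a proof):

```python
import numpy as np
from scipy import integrate

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

mass, _ = integrate.quad(phi, -np.inf, np.inf)
print(mass)                      # close to 1.0: phi is a probability density

mu, sigma = 2.0, 3.0
rng = np.random.default_rng(2)
x = mu + sigma * rng.standard_normal(1_000_000)          # mu + sigma * X
print(x.mean(), x.std())         # close to mu and sigma: a location-scale family
```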
---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M519.442 288.651c-41.519 0-59.5 31.593-82.058 31.593C377.409 320.244 432 144 432 144s-196.288 80-196.288-3.297c0-35.827 36.288-46.25 36.288-85.985C272 19.216 243.885 0 210.539 0c-34.654 0-66.366 18.891-66.366 56.346 0 41.364 31.711 59.277 31.711 81.75C175.885 207.719 0 166.758 0 166.758v333.237s178.635 41.047 178.635-28.662c0-22.473-40-40.107-40-81.471 0-37.456 29.25-56.346 63.577-56.346 33.673 0 61.788 19.216 61.788 54.717 0 39.735-36.288 50.158-36.288 85.985 0 60.803 129.675 25.73 181.23 25.73 0 0-34.725-120.101 25.827-120.101 35.962 0 46.423 36.152 86.308 36.152C556.712 416 576 387.99 576 354.443c0-34.199-18.962-65.792-56.558-65.792z"/></svg>

- Check that `\(x \mapsto \frac{\mathrm{e}^{-x^2/2}}{\sqrt{2\pi}}\)` is a probability density over `\(\mathbb{R}\)`.
- If `\(X\)` is distributed according to a standard Gaussian density, what is the distribution of `\(\mu + \sigma X\)`?
- If `\(X\)` is distributed according to a standard Gaussian density, show that

`$$\Pr \{ X > t \} \leq \frac{1}{t} \frac{\mathrm{e}^{-t^2/2}}{\sqrt{2\pi}} \qquad\text{for } t>0\,.$$`

---
### Gaussian density parameters

- The _location parameter_ `\(\mu\)` coincides with the mean and the median.
- The _scale parameter_ `\(σ\)` is the standard deviation
- The Inter-Quartile-Range (IQR) is proportional to the standard deviation.
- If `\(\Phi^{\leftarrow}\)` denotes the quantile function of `\(\mathcal{N}(0,1)\)` then the interquartile range of `\(\mathcal{N}(\mu, \sigma^2)\)` is `\(\sigma \Big(\Phi^{\leftarrow}(3/4) - \Phi^{\leftarrow}(1/4)\Big)=2 \sigma \Phi^{\leftarrow}(3/4)\)`.

---
![(ref:witgetgauss)](cm-7-product-distributions_files/figure-html/witgetgauss-1.png)

---
class: middle, center, inverse

## Computing the density of an image probability distribution

---
### Problem

Assume

- we know a density `\(f\)` of the distribution of some vector-valued random variable `\(X\)`, and that `\(f\)` is positive on some open set `\(U\)`.
- we have a _smooth_ function `\(\phi\)` that maps `\(U\)` to `\(V\)` <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zM88 224c0-24.3 13.7-45.2 33.6-56-.7 2.6-1.6 5.2-1.6 8 0 17.7 14.3 32 32 32s32-14.3 32-32c0-2.8-.9-5.4-1.6-8 19.9 10.8 33.6 31.7 33.6 56 0 35.3-28.7 64-64 64s-64-28.7-64-64zm224 176H184c-21.2 0-21.2-32 0-32h128c21.2 0 21.2 32 0 32zm32-112c-35.3 0-64-28.7-64-64 0-24.3 13.7-45.2 33.6-56-.7 2.6-1.6 5.2-1.6 8 0 17.7 14.3 32 32 32s32-14.3 32-32c0-2.8-.9-5.4-1.6-8 19.9 10.8 33.6 31.7 33.6 56 0 35.3-28.7 64-64 64z"/></svg>

Does the distribution of `\(Y= \phi(X)\)` have a density? If yes, can we (easily) compute that density?

---
💉 Recall the _change of variable formula_ in elementary calculus.

If `\(\phi\)` is monotone increasing and differentiable from open `\(A \subseteq \mathbb{R}\)` to `\(B\)` and `\(f\)` is Riemann integrable over `\(B\)`, then

`$$\int_B f(y) \, \mathrm{d}y = \int_A f(\phi(x)) \, \phi^{\prime}(x) \, \mathrm{d}x$$`

---
### A multi-dimensional generalization of this elementary formula.
This extension is then used to establish an off-the-shelf formula for computing the density of an image distribution

--

Let us start with a uni-dimensional warm-up <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 320 512"><path d="M208 96c26.5 0 48-21.5 48-48S234.5 0 208 0s-48 21.5-48 48 21.5 48 48 48zm94.5 149.1l-23.3-11.8-9.7-29.4c-14.7-44.6-55.7-75.8-102.2-75.9-36-.1-55.9 10.1-93.3 25.2-21.6 8.7-39.3 25.2-49.7 46.2L17.6 213c-7.8 15.8-1.5 35 14.2 42.9 15.6 7.9 34.6 1.5 42.5-14.3L81 228c3.5-7 9.3-12.5 16.5-15.4l26.8-10.8-15.2 60.7c-5.2 20.8.4 42.9 14.9 58.8l59.9 65.4c7.2 7.9 12.3 17.4 14.9 27.7l18.3 73.3c4.3 17.1 21.7 27.6 38.8 23.3 17.1-4.3 27.6-21.7 23.3-38.8l-22.2-89c-2.6-10.3-7.7-19.9-14.9-27.7l-45.5-49.7 17.2-68.7 5.5 16.5c5.3 16.1 16.7 29.4 31.7 37l23.3 11.8c15.6 7.9 34.6 1.5 42.5-14.3 7.7-15.7 1.4-35.1-14.3-43zM73.6 385.8c-3.2 8.1-8 15.4-14.2 21.5l-50 50.1c-12.5 12.5-12.5 32.8 0 45.3s32.7 12.5 45.2 0l59.4-59.4c6.1-6.1 10.9-13.4 14.2-21.5l13.5-33.8c-55.3-60.3-38.7-41.8-47.4-53.7l-20.7 51.5z"/></svg>

When starting from the uniform distribution on `\([0,1]\)` and applying a monotone differentiable transformation, the density of the image measure is easily computed.

- Let `\(\phi\)` be differentiable and increasing on `\([0,1]\)`, and let `\(P\)` be the uniform distribution on `\([0,1]\)`. Check that `\(P \circ \phi^{-1}\)` has density `\(\frac{1}{\phi'\circ \phi^\leftarrow}\)` on `\(\phi([0,1])\)`.

---
.content-box-gray[

### Proposition

If the real-valued random variable `\(X\)` is distributed according to `\(P\)` with density `\(f\)`, and `\(\phi\)` is monotone increasing and differentiable over `\(\operatorname{supp}(P)\)`, then the probability distribution of `\(Y = \phi(X)\)` has density

`$$g = \frac{f \circ \phi^{\leftarrow}}{\phi^{\prime}\circ \phi^{\leftarrow}}$$`

over `\(\phi\big(\operatorname{supp}(P)\big)\)`.

]

---
### Proof

By the fundamental theorem of calculus, the density `\(f\)` is a.e. the derivative of the cumulative distribution function `\(F\)` of `\(P\)`.
The cumulative distribution function of `\(Y=\phi(X)\)` satisfies:

`\begin{align*} P \Big\{ Y \leq y \Big\} & = P \Big\{ \phi(X) \leq y \Big\} \\ & = P \Big\{ X \leq \phi^{\leftarrow} (y) \Big\} \\ & = F \circ \phi^{\leftarrow}(y) \end{align*}`

Almost everywhere, `\(F \circ \phi^{\leftarrow}\)` is differentiable, and has derivative `\(\frac{f \circ \phi^{\leftarrow}}{\phi' \circ \phi^{\leftarrow}}\)` in `\(\phi(\text{supp}(P))\)`, `\(0\)` elsewhere, and

`$$P \Big\{ Y \leq y \Big\} = \int_{(-\infty, y] \cap \phi(\text{supp}(P))} \frac{f \circ \phi^{\leftarrow}(u)}{\phi' \circ \phi^{\leftarrow}(u)} \mathrm{d}u$$`

---
.content-box-gray[

### Corollary

If the distribution of the real-valued random variable `\(X\)` has density `\(f\)` then the distribution of `\(\sigma X + \mu\)` has density `\(\frac{1}{\sigma}f\Big(\frac{\cdot -\mu}{\sigma}\Big)\)`

]

---
- In univariate calculus, it is easy to establish that if a function is continuous and increasing over an open set, it is invertible and its inverse is continuous and increasing
- If the function is differentiable with positive derivative, its inverse is also differentiable
- The differential and the differential of the inverse are related in a transparent way

---
The Global Inversion Theorem extends the preceding observation to the multivariate setting <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 416 512"><path d="M272 96c26.51 0 48-21.49 48-48S298.51 0 272 0s-48 21.49-48 48 21.49 48 48 48zM113.69 317.47l-14.8 34.52H32c-17.67 0-32 14.33-32 32s14.33 32 32 32h77.45c19.25 0 36.58-11.44 44.11-29.09l8.79-20.52-10.67-6.3c-17.32-10.23-30.06-25.37-37.99-42.61zM384 223.99h-44.03l-26.06-53.25c-12.5-25.55-35.45-44.23-61.78-50.94l-71.08-21.14c-28.3-6.8-57.77-.55-80.84 17.14l-39.67 30.41c-14.03 10.75-16.69 30.83-5.92 44.86s30.84 16.66 44.86 5.92l39.69-30.41c7.67-5.89 17.44-8 25.27-6.14l14.7 4.37-37.46 87.39c-12.62 29.48-1.31 64.01 26.3 80.31l84.98 50.17-27.47 87.73c-5.28 16.86 4.11 34.81 20.97 40.09 3.19 1 6.41 1.48 9.58 1.48 13.61 0 26.23-8.77 30.52-22.45l31.64-101.06c5.91-20.77-2.89-43.08-21.64-54.39l-61.24-36.14 31.31-78.28 20.27 41.43c8 16.34 24.92 26.89 43.11 26.89H384c17.67 0 32-14.33 32-32s-14.33-31.99-32-31.99z"/></svg>

.content-box-gray[

### Global Inversion Theorem

Let `\(U\)` and `\(V\)` be two non-empty open subsets of `\(\mathbb{R}^d\)`. Let `\(\phi\)` be a continuous bijection from `\(U\)` to `\(V\)`.

Assume furthermore that `\(\phi\)` is continuously differentiable, and that `\(D\phi_x\)` is non-singular at every `\(x \in U\)`.

Then, the inverse function `\(\phi^{\leftarrow}\)` is also continuously differentiable on `\(V\)` and at every `\(y \in V\)`:

`$$D\phi^{\leftarrow}_y = \Big(D\phi_{\phi^{\leftarrow}(y)} \Big)^{-1}$$`

]

--

The Jacobian determinant of `\(\phi\)` is the determinant of the matrix that represents the differential. It is denoted by `\(J_\phi\)`. Recall that:

`$$J_{\phi^{\leftarrow}}(y) = \Big(J_{\phi}(\phi^{\leftarrow}(y)) \Big)^{-1}$$`

---
The multidimensional version of the change of variable formula is stated under the same assumptions as the Global Inversion Theorem. We state the next theorem without proof.

.content-box-gray[

### Geometric change of variable formula

Let `\(U\)` and `\(V\)` be two non-empty open subsets of `\(\mathbb{R}^d\)`. Let `\(\phi\)` be a continuous bijection from `\(U\)` to `\(V\)`. Assume furthermore that `\(\phi\)` is continuously differentiable, and that `\(D\phi_x\)` is non-singular at every `\(x \in U\)`. Let `\(\ell\)` denote the Lebesgue measure on `\(\mathbb{R}^d\)`.

For any non-negative Borel-measurable function `\(f\)`:

`$$\int_U f(x) \mathrm{d}\ell(x) = \int_V f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y) \Big| \mathrm{d}\ell(y)$$`

]
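---
### Sanity check: polar coordinates

A numerical illustration of the formula with the polar map `\(\phi(r,\theta) = (r\cos\theta, r\sin\theta)\)`, whose Jacobian determinant is `\(r\)`. A sketch assuming `numpy` and `scipy`; the integrand is an arbitrary choice:

```python
import numpy as np
from scipy import integrate

f = lambda x, y: np.exp(-(x**2 + y**2))

# Integral of f over the unit disc, in cartesian coordinates
lhs, _ = integrate.dblquad(
    lambda y, x: f(x, y),
    -1, 1,
    lambda x: -np.sqrt(1 - x**2), lambda x: np.sqrt(1 - x**2),
)

# Same integral after the change of variables, with |J_phi(r, theta)| = r
rhs, _ = integrate.dblquad(
    lambda theta, r: f(r * np.cos(theta), r * np.sin(theta)) * r,
    0, 1,
    lambda r: 0, lambda r: 2 * np.pi,
)

print(lhs, rhs, np.pi * (1 - np.exp(-1)))   # the three values agree
```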
--

Moving from cartesian coordinates to polar/spherical coordinates is easy thanks to a non-trivial application of the Geometric change of variable formula

---
The Image density formula is a corollary of the geometric change of variable formula.

.content-box-gray[

### Image density formula

Let `\(P\)` have density `\(f\)` over open `\(U \subseteq \mathbb{R}^d\)`. Let `\(\phi\)` be bijective from `\(U\)` to `\(\phi(U)\)` and `\(\phi\)` be continuously differentiable over `\(U\)` with non-singular differential.

The density `\(g\)` of the image distribution `\(P \circ \phi^{-1}\)` over `\(\phi(U)\)` is given by

`$$g(y) = f\big(\phi^\leftarrow(y)\big) \times \big|J_{\phi^\leftarrow}(y)\big| = f\big(\phi^\leftarrow(y)\big) \times \Big|J_{\phi}(\phi^\leftarrow(y))\Big|^{-1}$$`

]

---
The proof of the Image density formula from the Geometric change of variable formula is a routine application of the transfer formula.

Let `\(B\)` be a Borelian subset of `\(\phi(U)\)`. By the transfer formula:

`\begin{align*} P\Big\{ Y \in B \Big\} & = P\Big\{ \phi(X) \in B \Big\} \\ & = \int_U \mathbb{I}_B(\phi(x)) f(x) \mathrm{d}\ell(x) \,. \end{align*}`

--

Now, we invoke the Geometric change of variable formula:

`\begin{align*} \int_U \mathbb{I}_B(\phi(x)) f(x) \mathrm{d}\ell(x) & = \int_{\phi(U)} \mathbb{I}_B(\phi(\phi^\leftarrow(y))) f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y)\Big| \mathrm{d}\ell(y) \\ & = \int_{\phi(U)} \mathbb{I}_B(y) f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y)\Big| \mathrm{d}\ell(y) \, . \end{align*}`

This suffices to conclude that `\(f\circ \phi^\leftarrow \Big|J_{\phi^\leftarrow}\Big|\)` is a version of the density of `\(P \circ \phi^{-1}\)` with respect to Lebesgue measure over `\(\phi(U)\)`.

---
## Application: Gamma-Beta calculus

The image density formula is applied to show a remarkable connection between Gamma and Beta distributions.

.content-box-gray[

### Proposition

Let `\(X, Y\)` be independent random variables distributed according to `\(\Gamma(p, \lambda)\)` and `\(\Gamma(q, \lambda)\)` (the intensity parameters are _equal_).

Let `\(U = X+Y\)` and `\(V= X/(X+Y)\)`.

- `\(U \perp \!\!\! \perp V\)`
- `\(U \sim \Gamma(p+q, \lambda)\)`
- `\(V \sim \operatorname{Beta}(p, q)\)`.

]

---
### Proof

The mapping `\(f: (0, \infty)^2 \to (0, \infty) \times (0,1)\)` defined by

`$$f(x,y) = \Big(x+y, \frac{x}{x+y} \Big)$$`

is one-to-one with inverse `\(f^{\leftarrow}(u,v) = \Big(uv,u(1-v)\Big)\)`.

--

The Jacobian matrix of `\(f^{\leftarrow}\)` at `\((u,v)\)` is

`$$\begin{pmatrix} v & u \\ (1-v) & -u \end{pmatrix}$$`

with determinant `\(-uv -u +uv=-u\)`.

---
The joint image density `\(g(u,v)\)` at `\((u,v) \in (0,\infty) \times (0,1)\)` is

`\begin{align*} g(u,v) & = \lambda^{p+q}\frac{(uv)^{p-1}}{\Gamma(p)} \frac{(u(1-v))^{q-1}}{\Gamma(q)} \mathrm{e}^{-\lambda (uv + u(1-v))} u \\ & = \Big(\lambda^{p+q} \frac{u^{p+q-1}}{\Gamma(p+q)} \mathrm{e}^{-\lambda u}\Big) \times \Big(\frac{\Gamma(p+q)}{\Gamma(q)\Gamma(p)} v^{p-1} (1-v)^{q-1}\Big) \,. \end{align*}`

---
The factorization of the joint density proves that `\(U \perp \!\!\! \perp V\)`

We recognize that the density of (the distribution of) `\(U\)` is the Gamma density with shape parameter `\(p+q\)`, intensity parameter `\(\lambda\)`.

The density of the distribution of `\(V\)` is the Beta density with parameters `\(p\)` and `\(q\)`.
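---
### Checking the Gamma-Beta calculus by simulation

A simulation sketch of the proposition, assuming `numpy` and `scipy` are available (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

p, q, lam = 2.0, 3.0, 1.5
rng = np.random.default_rng(3)
n = 100_000

x = rng.gamma(shape=p, scale=1 / lam, size=n)   # X ~ Gamma(p, lam)
y = rng.gamma(shape=q, scale=1 / lam, size=n)   # Y ~ Gamma(q, lam)
u, v = x + y, x / (x + y)

# U should be Gamma(p + q, lam) and V should be Beta(p, q)
print(stats.kstest(u, stats.gamma(a=p + q, scale=1 / lam).cdf).pvalue)
print(stats.kstest(v, stats.beta(p, q).cdf).pvalue)
print(np.corrcoef(u, v)[0, 1])                  # near 0, consistent with independence
```

Large Kolmogorov-Smirnov p-values are consistent with the announced distributions of `\(U\)` and `\(V\)`.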
---
###

Assume `\(X_1, X_2, \ldots, X_n\)` form an independent family with each `\(X_i\)` distributed according to `\(\Gamma(p_i, \lambda)\)`.

Determine the joint distribution of

`$$\sum_{i=1}^n X_i, \frac{X_1}{\sum_{i=1}^n X_i}, \frac{X_2}{\sum_{i=1}^n X_i}, \ldots, \frac{X_{n-1}}{\sum_{i=1}^n X_i}$$`

---
exclude: true
class: middle, center, inverse

# <svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 640 512"><path d="M192 384h192c53 0 96-43 96-96h32c70.6 0 128-57.4 128-128S582.6 32 512 32H120c-13.3 0-24 10.7-24 24v232c0 53 43 96 96 96zM512 96c35.3 0 64 28.7 64 64s-28.7 64-64 64h-32V96h32zm47.7 384H48.3c-47.6 0-61-64-36-64h583.3c25 0 11.8 64-35.9 64z"/></svg>

---
class: middle, center, inverse
background-image: url('./img/pexels-cottonbro-3171837.jpg')
background-size: 112%

# The End