name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- template: inter-slide # Conditioning ### 2021-11-25 #### [Probabilités Master I MIDS](http://stephane-v-boucheron.fr/courses/probability/) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- template: inter-slide ##
### [Motivation](#motivcondexp) ### [Definition and elementary properties](#defcondexp) ### [Construction for `\(X \in \mathcal{L}_2\)`](#predictpoint) ### [Construction for `\(X \in \mathcal{L}_1\)`](#condexpl1) ### [Conditional probabilities](#condProbDistrib) ### [Conditional densities](#jointdensity) ### [Regular conditional probability kernels](#regconprob) --- name: motivcondexp class: inverse, center, middle ## Motivation: Defining conditional expectation ##
--- - `\((\Omega,\mathcal{F},P)\)` is a probability space, and `\(\mathcal{G} \subseteq \mathcal{F}\)` a sub- `\(\sigma\)` -algebra. -
The sub- `\(\sigma\)` -algebra `\(\mathcal{G}\)` need not be _atomic_ -- -
We cannot define conditional probabilities by conditioning with respect to atomic events generating `\(\mathcal{G}\)` -- -
Our objective is to define conditional expectations with respect to a general sub- `\(\sigma\)` -algebra `\(\mathcal{G}\)`, while retaining the nice properties surveyed in the atomic context --- ### Example `\(X \sim \mathcal{N}(\mu, \Sigma)\)` with `\(\mu \in \mathbb{R}^k\)` and covariance `\(\Sigma\)` a positive semi-definite matrix -- - Conditioning on `\(\sigma\big(\Vert X\Vert\big)\)` -- - Conditioning on `\(\sigma\big(X_1\big)\)` -- - Conditioning on `\(\sigma\big(\langle v, X \rangle\big)\)` -- Events like `\(\{\Vert X\Vert = y\}\)`, `\(\{\langle v, X \rangle = y \}\)` (generally) have probability `\(0\)` for all `\(y\)` --- name: defcondexp The general _definition_ of conditional expectation starts from what was considered a _property_ when conditioning with respect to atomic `\(\sigma\)` -algebras -- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Definition: Conditional expectation - `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` - `\(\mathcal{G}\)` : a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`, then a random variable `\(Y\)` is a _version of the conditional expectation_ of `\(X\)` with respect to `\(\mathcal{G}\)` iff i. `\(Y\)` is `\(\mathcal{G}\)`-measurable. ii. For every event `\(B\)` in `\(\mathcal{G}\)`: `$$\mathbb{E} \left[\mathbb{I}_B X \right] = \mathbb{E} \left[ \mathbb{I}_B Y \right]$$` ] --- Leaving aside the question of the _existence_ of a version of the conditional expectation of `\(X,\)` we first check that if there exist different versions, they differ only up to a negligible event -- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G}\)` be a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`, then if `\(Y'\)` and `\(Y\)` are two versions of the conditional expectation of `\(X\)` with respect to `\(\mathcal{G}\)`: `$$P \left\{ Y = Y' \right\} = 1$$` ] --- ### Proof As `\(Y\)` and `\(Y'\)` are `\(\mathcal{G}\)`-measurable, the event `$$B = \left\{ \omega~:~Y(\omega) >Y'(\omega)\right\}$$` belongs to `\(\mathcal{G}.\)` -- `$$\mathbb{E}\left[\mathbb{I}_B\, X\right]= \mathbb{E}\left[\mathbb{I}_B \, Y\right] = \mathbb{E}\left[\mathbb{I}_B \, Y'\right]$$` Thus `\(\mathbb{E}\left[\mathbb{I}_B (Y-Y') \right] = 0\)` -- As the random variable `\(\mathbb{I}_B \times (Y-Y')\)` is non-negative with zero expectation, it is null with probability `\(1\)` Thus `\(P(B) = P \{Y>Y'\}=0\)` We can proceed in a similar way for the event `\(\{Y<Y'\}\)`
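---
### Illustration: probing the defining identity by simulation

The defining identity can be probed numerically in the Gaussian example above. The sketch below is an illustration added here (not part of the construction); it uses the standard fact that for a centred Gaussian pair `\((X_1, X_2)\)` with unit variances and correlation `\(\rho\)`, the random variable `\(\rho X_1\)` is a version of `\(\mathbb{E}[X_2 \mid \sigma(X_1)]\)`. The event `\(B = \{X_1 > 0\}\)` and all numerical values are arbitrary choices.

```r
## Monte Carlo check of E[1_B X2] = E[1_B E[X2 | X1]] for a simulated Gaussian pair;
## rho * x1 plays the role of a version of E[X2 | sigma(X1)].
set.seed(42)
n   <- 1e6
rho <- 0.6
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
y   <- rho * x1            # a version of E[X2 | sigma(X1)]
B   <- (x1 > 0)            # an event from sigma(X1)
mean(x2 * B)               # estimates E[ 1_B X2 ]
mean(y  * B)               # estimates E[ 1_B E[X2 | X1] ]; agrees up to Monte Carlo error
```

Replacing `\(B\)` by any other event built from `\(X_1\)` alone leaves the agreement intact.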
--- Still postponing the _existence_ question, let us now check a few properties that versions of the conditional expectation of `\(X\)` (if they exist) should satisfy --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Linearity of Conditional Expectation) - `\(X_1, X_2 \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)`, - `\(\mathcal{G}\)` a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`, - `\(a_1, a_2 \in \mathbb{R}\)` then if `\(Y_1\)`, `\(Y_2\)`, and `\(Z\)` are respectively versions of `\(\mathbb{E} [X_1\mid \mathcal{G}], \mathbb{E} [X_2\mid \mathcal{G}]\)` and `\(\mathbb{E} [a_1 X_1 + a_2 X_2\mid \mathcal{G}]\)` with respect to `\(\mathcal{G}\)` `$$P\{a_1 Y_1 + a_2 Y_2 = Z\} = 1$$` ] --- ### Proof Let `\(B\)` be the event of `\(\mathcal{G}\)` defined by `$$\{ a_1 Y_1 + a_2 Y_2 > Z\}$$` We get `$$\begin{array}{rcl} \mathbb{E} [\mathbb{I}_B Z] & = & \mathbb{E} [\mathbb{I}_B (a_1 X_1 + a_2 X_2)] \\ & = & a_1 \mathbb{E} [\mathbb{I}_B X_1 ]+a_2 \mathbb{E} [\mathbb{I}_B X_2] \\ & = & a_1 \mathbb{E} [\mathbb{I}_B Y_1 ]+a_2 \mathbb{E} [\mathbb{I}_B Y_2] \\ & = & \mathbb{E} [\mathbb{I}_B (a_1 Y_1 + a_2 Y_2)]\end{array}$$` and thus `$$\mathbb{E}[\mathbb{I}_B (Z-(a_1 Y_1 + a_2 Y_2))]= 0$$` We conclude as in the preceding proof that `\(P\{B\}=0.\)` The proof is completed by handling in a similar way the event `\(\{ a_1 Y_1 + a_2 Y_2 < Z\}\)`
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Monotonicity of Conditional Expectation) If - `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` - `\(\mathcal{G}\)` is a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)` - `\(Z\)` is a version of the conditional expectation of `\(X\)` with respect to `\(\mathcal{G}\)` - `\(X\)` is `\(P\)`-a.s. non-negative, then `$$P\{Z \geq 0\} =1$$` The proof reproduces the argument used to establish that different versions of the conditional expectation are almost surely equal. ] --- ### Proof For `\(n \in \mathbb{N}\)`, let `\(B_n\)` denote the event (from `\(\mathcal{G}\)`) defined by `$$B_n = \left\{ \mathbb{E} \left[ X \mid \mathcal{G} \right] < - \frac{1}{n} \right\}$$` To prove the proposition, it is enough to check `$$P \left\{ \cup_n B_n \right\} = 0$$` As `\(P \left\{ \cup_n B_n \right\} = \lim_n P\{B_n \}\)`, it suffices to check `\(P \left\{ B_n\right\} = 0.\)` For all `\(n\)`, `$$\begin{array}{rl} 0 & \leq \mathbb{E}\big[\mathbb{I}_{B_n} X\big] \\ & = \mathbb{E} \left[ \mathbb{I}_{B_n} \mathbb{E} \left[ X \mid \mathcal{G} \right] \right] \\ & \leq - \frac{P\{B_n \}}{n} \, . \end{array}$$` Hence, for all `\(n\)`, `\(P\{B_n \}= 0\)`.
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Corollary If `\((X_n)_{n \in \mathbb{N}}\)` is a sequence of random variables from `\(\mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` satisfying `\(X_{n + 1} \geq X_n\)` `\(P\)`-a.s., then there exists a `\(P\)`-a.s. non-decreasing sequence of versions of conditional expectations with respect to `\(\mathcal{G}\)` `$$\forall n \in \mathbb{N},\quad\mathbb{E} \left[ X_{n + 1} \mid \mathcal{G} \right] \geq \mathbb{E} \left[ X_n \mid \mathcal{G} \right]$$` ] ---
Let `\(\mathcal{E}\)` be a `\(\pi\)` -system generating `\(\mathcal{G}\)` and containing `\(\Omega\)`. Check that, up to almost-sure equality, `\(\mathbb{E} \left[ X \mid \mathcal{G} \right]\)` is the unique element `\(Z\)` of `\(\mathcal{L}_1 \left( \Omega, \mathcal{G}, P\right)\)` which satisfies `$$\forall B \in \mathcal{E}, \quad \mathbb{E} \left[ \mathbb{I}_B X \right] = \mathbb{E} \left[ \mathbb{I}_B Z \right]$$` --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Tower Property) Let `\((\Omega, \mathcal{F}, P)\)` be a probability space, and `$$\mathcal{G} \subseteq \mathcal{H} \subseteq \mathcal{F}$$` be two nested sub- `\(\sigma\)` -algebras. Then for every `\(X \in \mathcal{L}_1(\Omega, \mathcal{F}, P)\)`: `$$\begin{array}{rl}\mathbb{E} \Big[ \mathbb{E}\left[ X \mid \mathcal{G} \right] \mid \mathcal{H} \Big] & = \mathbb{E} \left[ X \mid \mathcal{G} \right] \\ \mathbb{E} \Big[ \mathbb{E} \, \left[ X \mid \mathcal{H} \right] \mid \mathcal{G} \Big] & = \mathbb{E} \left[ X \mid \mathcal{G} \right]\end{array}$$` ] ??? The smallest `\(\sigma\)`-algebra takes it all --- ### Proof The first equality is trivial: > any `\(\mathcal{G}\)`-measurable random variable is also `\(\mathcal{H}\)`-measurable. To check the second equality: for every `\(B \in \mathcal{G}\)`, `$$\begin{array}{rl} \mathbb{E} \left[ \mathbb{I}_B \mathbb{E} \left[ \mathbb{E} \left[ X \mid \mathcal{H} \right] \mid \mathcal{G} \right] \right] & = \mathbb{E} \left[\mathbb{E}\left[ \mathbb{I}_B \mathbb{E} \left[ X \mid \mathcal{H} \right] \mid \mathcal{G}\right]\right] \quad \text{as } B \in \mathcal{G} \\ & = \mathbb{E} \left[ \mathbb{I}_B \mathbb{E} \left[ X \mid \mathcal{H} \right] \right] \quad \text{averaging out}\\ & = \mathbb{E} \left[ \mathbb{I}_B X \right] \quad \text{as } B \in \mathcal{G} \subseteq \mathcal{H} \end{array}$$`
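---
### Illustration: the tower property on a finite space

The tower property can be visualised on a finite space with nested atomic `\(\sigma\)`-algebras, where conditional expectations are plain within-atom averages. The sketch below is only an illustration; the space `\(\{1,\dots,12\}\)` with the uniform probability and the two partitions are arbitrary choices.

```r
## Tower property on a toy space: G (quadruples) is coarser than H (pairs).
set.seed(1)
X <- rnorm(12)            # a random variable on Omega = {1, ..., 12}, uniform probability
H <- rep(1:6, each = 2)   # atoms of H: {1,2}, {3,4}, ..., {11,12}
G <- rep(1:3, each = 4)   # atoms of G: {1,...,4}, ..., {9,...,12}; sigma(G) is coarser than sigma(H)
E_X_H <- ave(X, H)        # E[X | H]: within-atom averages
E_X_G <- ave(X, G)        # E[X | G]
all.equal(ave(E_X_H, G), E_X_G)   # E[ E[X | H] | G ] = E[X | G]
all.equal(ave(E_X_G, H), E_X_G)   # E[ E[X | G] | H ] = E[X | G]
```

Both checks return `TRUE` (up to floating-point tolerance): the smaller `\(\sigma\)`-algebra always wins.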
--- class: inverse, center, middle name: predictpoint ## Conditional expectation in `\(\mathcal{L}_2(\Omega, \mathcal{F}, P)\)` --- If we focus on square-integrable random variables, building versions of conditional expectation turns out to be easy
When the conditioning sub- `\(\sigma\)` -algebra `\(\mathcal{G}\)` is atomic, the conditional expectation `\(\mathbb{E}[X \mid \mathcal{G}]\)` defines an optimal predictor of `\(X\)` with respect to quadratic error amongst all `\(\mathcal{G}\)`-measurable random variables. This characterization remains valid for square-integrable random variables even when the conditioning sub- `\(\sigma\)` -algebra is no longer atomic
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional expectation for square-integrable random variables) Let `\(X \in \mathcal{L}_2 (\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G}\)` a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`. `$$\exists Y \in \mathcal{L}_2(\Omega, \mathcal{G}, P) \qquad \mathbb{E}(Y-X)^2 = \inf \Big\{ \mathbb{E}(Z-X)^2 : Z \in \mathcal{L}_2(\Omega, \mathcal{G}, P) \Big\}$$` A version `\(Y\)` of the _orthogonal projection_ of `\(X\)` on `\(\mathcal{L}_2(\Omega, \mathcal{G}, P)\)` is also a version of the conditional expectation of `\(X\)` with respect to `\(\mathcal{G}\)`: `$$\forall B \in \mathcal{G}, \quad \mathbb{E} \left[\mathbb{I}_B X \right] = \mathbb{E} \left[ \mathbb{I}_B\, Y \right]$$` ] ---
The theorem contains two statements: i. there exists a minimizer of `\(\mathbb{E}(X-Z)^2\)` in `\(\mathcal{L}_2(\Omega, \mathcal{G}, P)\)`, ii. such a minimizer is a version of the conditional expectation -- Checking the first statement amounts to invoking the right arguments from Hilbert space theory. ---
Basics of Hilbert space theory .bg-light-gray.br3.shadow-5.ph4.mt5[ ### Definition Hilbert space A real vector space `\(E\)` equipped with a norm `\(\|\cdot\|\)` is a Hilbert space iff `\(\langle \cdot, \cdot \rangle\)` defined by `$$\forall x, y \in E, \langle x, y \rangle = \frac{1}{4} \Big(\Vert x+y \Vert^2 - \Vert x-y \Vert^2\Big)$$` is an _inner product_ and `\(E\)` is _complete_ for the topology induced by the norm ] --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem Let `\((\Omega, \mathcal{F}, P)\)` be a probability space, then the set `\(L_2(\Omega, \mathcal{F}, P)\)` of equivalence classes of square-integrable variables, equipped with `\(\Vert X\Vert = (\mathbb{E} X^2)^{1/2}\)` is a Hilbert space ] ---
In this context, `$$\langle X, Y \rangle = \mathbb{E}\left[ XY \right]$$` -- From Hilbert space theory, the essential tool we shall use is the projection Theorem below. --- Our starting point is the next observation .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition - Let `\((\Omega, \mathcal{F}, P)\)` be a probability space, - Let `\(\mathcal{G} \subseteq \mathcal{F}\)` be a sub- `\(\sigma\)` -algebra, then `\(L_2(\Omega, \mathcal{G}, P)\)` is a _closed_, _convex_ subset (subspace) of `\(L_2(\Omega, \mathcal{F}, P)\)`. ] --- We look for the element from `\(L_2(\Omega, \mathcal{G}, P)\)` that is closest (in the `\(L_2\)` sense) to a random variable from `\(L_2(\Omega, \mathcal{F}, P)\)`. The existence and uniqueness of this closest `\(\mathcal{G}\)`-measurable random variable are warranted by the Projection Theorem. --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem: Hilbert space Projection Theorem Let `\(E\)` be a Hilbert space and `\(F\)` a _closed convex_ subset of `\(E\)`. For every `\(x \in E\)`, there _exists a unique_ `\(y \in F\)`, such that `$$\|x - y\|= \inf_{z \in F} \|x - z\|$$` This unique closest point in `\(F\)` is called the _(orthogonal) projection_ of `\(x\)` over `\(F\)` For any `\(z \in F\)`, `$$\langle x-y, z-y\rangle \leq 0$$` If `\(F\)` is a linear subspace of `\(E\)`, the Pythagorean relationship holds: `$$\|x\|^2 = \|y\|^2 + \|x - y\|^2$$` and for any `\(z \in F\)`, `\(\langle x - y, z\rangle =0\)` ] --- ### Proof Let `\(d = \inf_{z \in F} \|x - z\|\)`. Let `\((z_n)_n\)` be a sequence of elements from `\(F\)` such that `$$\lim_n \|x - z_n \|= d$$` According to the parallelogram law, `$$2 \left( \|x - z_n \|^2 +\|x - z_m \|^2 \right) = \|2 x - (z_n + z_m)\|^2 + \|z_n - z_m \|^2 .$$` Since `\(F\)` is convex, `\((z_n + z_m) / 2 \in F\)`, so `$$\|x - (z_n + z_m) / 2\| \geq d$$` Let `\(\epsilon \in (0, 1]\)` and `\(n_0\)` be such that for `\(n \geq n_0\)`, `\(\|x - z_n \| \leq d + \epsilon .\)` For `\(n, m \geq n_0\)` `$$4 (d + \epsilon)^2 \geq 4 d^2 +\|z_n - z_m \|^2$$` and therefore `$$\|z_n - z_m \|^2 \leq 4 (2 d + 1) \epsilon$$` --- ### Proof (continued) Hence, the minimizing sequence `\((z_n)_n\)` has the Cauchy property. As `\(E\)` is complete and `\(F\)` is _closed_, it converges to a limit `\(y \in F\)` and `\(d = \|x - y\|\)`. To verify uniqueness, suppose there exists `\(y' \in F\)` such that `\(\|x - y' \|= d\)`. Now, let us build a new sequence `\((z'_n)_{n \in \mathbb{N}}\)` such that `\(z'_{2 n} = z_n\)` and `\(z'_{2 n + 1} = y'\)`. This `\(F\)`-valued sequence satisfies `\(\lim_n \|z'_n - x\|= d.\)` By the argument above, it admits a limit `\(y^{\prime\prime}\)` in `\(F\)`. The limit `\(y^{\prime\prime}\)` coincides with the limit of any sub-sequence, so it equals `\(y\)` and `\(y'.\)` Fix `\(z \in F \setminus \{y\}\)`, for any `\(u \in (0,1]\)`, let `\(z_u = y + u (z-y)\)`, then `\(z_u \in F\)` and `$$\Vert x - z_u\Vert^2 - \Vert x -y \Vert^2 = -2 u \langle x-y, z-y \rangle + u^2 \Vert z - y \Vert^2$$` As this quantity is non-negative for `\(u \in [0,1]\)`, `\(\langle x-y, z-y \rangle\)` has to be non-positive --- ### Proof (continued) Now suppose that `\(F\)` is a _closed_ subspace of `\(E.\)` If there is `\(y \in F\)` such that `\(\langle x - y, z \rangle = 0\)` for any `\(z\in F\)`, then `\(y\)` is the orthogonal projection of `\(x\)` on `\(F\)` since for all `\(z \in F\)`: `$$\begin{array}{rl} \|x - z\|^2 & = \|x - y\|^2 - 2 \langle x - y, z - y \rangle +\|z - y\|^2\\ & \geq \|x - y\|^2 .
\end{array}$$` Conversely, if `\(y\)` is the orthogonal projection of `\(x\)` on `\(F\)`, for all `\(z \in F\)` and all `\(\lambda \in \mathbb{R}\)`: `$$\begin{array}{rl} \|x - y\|^2 & \leq \|x - (y + \lambda z)\|^2 \\ & = \|x - y\|^2 - 2 \lambda \langle x - y, z \rangle + \lambda^2 \|z\|^2, \end{array}$$` so `\(0 \leq - 2 \lambda \langle x - y, z \rangle + \lambda^2 \|z\|^2\)` For this polynomial in `\(\lambda\)` to be non-negative for every `\(\lambda \in \mathbb{R}\)`, it is necessary that `\(\langle x - y, z \rangle = 0\)`
--- As `\(\mathcal{L}_2 (\Omega, \mathcal{G}, P)\)` is a closed convex subset of `\(\mathcal{L}_2 (\Omega, \mathcal{F}, P)\)`, the existence and uniqueness of the projection on a closed convex part of a Hilbert space gives the following corollary .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Corollary Given `\(X \in \mathcal{L}_2 (\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G}\)` a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`, there exists `\(Y \in \mathcal{L}_2 (\Omega, \mathcal{G}, P)\)` that minimizes `$$\mathbb{E} \left[ \left( X - Z \right)^2 \right] \qquad \text{ for } Z \in \mathcal{L}_2 (\Omega, \mathcal{G}, P)$$` Any other minimizer in `\(\mathcal{L}_2 (\Omega, \mathcal{G}, P)\)` is `\(P\)`-almost surely equal to `\(Y\)` ] --- ### Proof Let `\(Y\)` be a version of the orthogonal projection of `\(X\)` on `\(L_2(\Omega,\mathcal{G},P)\)` and `\(B\)` an element of `\(\mathcal{G}.\)` The inner product of `\(\mathbb{I}_B \in \mathcal{L}_2(\Omega,\mathcal{G},P)\)` and `\(X-Y\)` is `$$\langle X-Y, \mathbb{I}_B \rangle = \mathbb{E}\left[(X-Y)\mathbb{I}_B\right]$$` By the Projection Theorem, `\(\mathbb{E}\left[(X-Y)\mathbb{I}_B\right]=0\)`, that is, `\(\mathbb{E}\left[\mathbb{I}_B X\right] = \mathbb{E}\left[\mathbb{I}_B Y\right]\)` for every `\(B \in \mathcal{G}\)`: the projection `\(Y\)` is a version of the conditional expectation of `\(X\)` with respect to `\(\mathcal{G}\)`.
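---
### Illustration: projection as within-atom averaging

When `\(\mathcal{G}\)` is atomic, the orthogonal projection of the corollary is simply the within-atom average, and it beats every other `\(\mathcal{G}\)`-measurable predictor in quadratic error. A small sketch (illustration only; the label vector `g` encodes the atoms, and the particular distribution is an arbitrary choice):

```r
## Among functions of the atom label g, the within-atom mean minimises the quadratic error.
set.seed(2)
n <- 10000
g <- sample(1:4, n, replace = TRUE)   # atom labels: G = sigma(g)
X <- g + rnorm(n)                     # a square-integrable random variable
proj  <- ave(X, g)                    # within-atom means: the projection on the g-measurable functions
other <- g                            # any other G-measurable predictor
mean((X - proj)^2)                    # quadratic error of the projection
mean((X - other)^2)                   # never smaller than the previous line
```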
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Definition Conditional variance Let `\(X \in \mathcal{L}_2(\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G} \subseteq \mathcal{F}\)` a sub- `\(\sigma\)` -algebra. The *conditional variance* of `\(X\)` with respect to `\(\mathcal{G}\)` is defined by `$$\operatorname{Var} \left[ X \mid \mathcal{G} \right] = \mathbb{E} \left[\left( X - \mathbb{E} [X \mid \mathcal{G}] \right)^2 \mid \mathcal{G}\right]$$` ] The conditional variance is a `\(\mathcal{G}\)` -measurable random variable, just as the conditional expectation. It is the conditional expectation of the squared prediction error incurred when predicting `\(X\)` by `\(\mathbb{E}[X \mid \mathcal{G}]\)`. ??? a Pythagorean theorem for the variance. --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition (Pythagorean identity) Let `\(X \in \mathcal{L}_2(\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G} \subseteq \mathcal{F}\)` a sub- `\(\sigma\)` -algebra. Then `$$\operatorname{Var} [X] = \operatorname{Var} \Big[ \mathbb{E} \left[ X \mid \mathcal{G} \right] \Big] + \mathbb{E} \Big[ \operatorname{Var} \left[ X \mid \mathcal{G} \right] \Big]$$` ] --- ###
- `\(X - \mathbb{E}X = \underbrace{X - \mathbb{E}[X \mid \mathcal{G}]}_{\text{orthogonal to any } \mathcal{G} \text{-measurable}} + \quad \underbrace{\mathbb{E}[X \mid \mathcal{G}] - \mathbb{E}X}_{\mathcal{G} \text{-measurable}}\)` - `\(\operatorname{Var}\left(X\right) = \mathbb{E}\left[\left(X - \mathbb{E}X\right)^2\right]\)` - `\(\mathbb{E}\left[\operatorname{Var}[X \mid\mathcal{G}]\right] = \mathbb{E}\left[\left(X - \mathbb{E}[X \mid \mathcal{G}]\right)^2\right]\)` - `\(\operatorname{Var}\left(\mathbb{E}[X \mid \mathcal{G}]\right) = \mathbb{E}\left[ \left(\mathbb{E}[X \mid \mathcal{G}] - \mathbb{E}X\right)^2 \right]\)` --- ### Proof Recall that `\(\mathbb{E} \left[ \mathbb{E} \left[ X \mid \mathcal{G} \right] \right] = \mathbb{E} \left[ X \right]\)`. `$$\begin{array}{rl} \operatorname{Var} \left[ X \right] & = \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \right] \right)^2 \right]\\ & = \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \mid \mathcal{G} \right] + \mathbb{E} \left[ X \mid \mathcal{G} \right] - \mathbb{E} \left[ X \right] \right)^2 \right]\\ & = \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \mid \mathcal{G} \right] \right)^2 \right]\\ & \qquad + 2 \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \mid \mathcal{G} \right] \right) \left( \mathbb{E} \left[ X \mid \mathcal{G} \right] - \mathbb{E} \left[ X \right] \right) \right]\\ & \qquad + \mathbb{E} \left[ \left( \mathbb{E} \left[ X \mid \mathcal{G} \right] - \mathbb{E} \left[ X \right] \right)^2 \right]\\ & = \mathbb{E} \left[ \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \mid \mathcal{G} \right] \right)^2 \mid \mathcal{G} \right] \right]\\ & \qquad + 2 \mathbb{E} \Big[ \mathbb{E} \left[ \left( X - \mathbb{E} \left[ X \mid \mathcal{G} \right] \right) \mid \mathcal{G} \right] \left( \mathbb{E} \left[ X \mid \mathcal{G} \right] - \mathbb{E} \left[ X \right] \right) \Big]\\ & \qquad + \operatorname{Var} \left[ \mathbb{E} \left[ X \mid \mathcal{G} \right] \right]\\ & = \mathbb{E} \left[ \operatorname{Var} \left[ X \mid \mathcal{G} \right] \right] + \operatorname{Var} \left[ \mathbb{E} \left[ X \mid \mathcal{G} \right] \right] . \end{array}$$`
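---
### Illustration: checking the decomposition by simulation

A Monte Carlo sketch of the identity (illustration only): take `\(X = Z_1 + Z_2\)` with `\(Z_1, Z_2\)` independent centred Gaussians and `\(\mathcal{G} = \sigma(Z_1)\)`, so that `\(\mathbb{E}[X \mid \mathcal{G}] = Z_1\)` and `\(\operatorname{Var}[X \mid \mathcal{G}] = \operatorname{Var}(Z_2)\)`. The variances `\(4\)` and `\(1\)` are arbitrary choices.

```r
## Var X = Var( E[X | G] ) + E( Var[X | G] ) with G = sigma(Z1).
set.seed(3)
n  <- 1e6
z1 <- rnorm(n, sd = 2)    # Var(Z1) = 4
z2 <- rnorm(n, sd = 1)    # Var(Z2) = 1, so Var[X | G] = 1 (a constant)
x  <- z1 + z2
var(x)                    # close to 4 + 1 = 5
var(z1) + 1               # Var( E[X | G] ) + E( Var[X | G] ); also close to 5
```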
--- class: inverse, center, middle name: condexpl1 ## Conditional expectation in `\(\mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` --- name: kolmoespcond To construct the conditional expectation of a random variable, square-integrability is not necessary. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem If `\(Y \in \mathcal{L}_1 (\Omega, \mathcal{F},P)\)`, then there exists an integrable `\(\mathcal{G}\)`-measurable random variable, denoted by `\(\mathbb{E} \left[ Y \mid \mathcal{G} \right]\)` such that `$$\forall B \in \mathcal{G}, \mathbb{E} \left[ \mathbb{I}_B Y \right] =\mathbb{E} \left[ \mathbb{I}_B \mathbb{E} \left[ Y \mid \mathcal{G}\right] \right]$$` ] ---
Let `\(\mathcal{G}'\)` be a `\(\pi\)`-system that contains `\(\Omega\)` and generates `\(\mathcal{G}\)`. If `\(Z\)` is an integrable `\(\mathcal{G}\)`-measurable variable that satisfies `$$\forall B \in \mathcal{G}', \mathbb{E} \left[ \mathbb{I}_B Y \right]= \mathbb{E} \left[ \mathbb{I}_B Z \right]$$` then `\(Z = \mathbb{E} \left[ Y \mid \mathcal{G} \right]\)` `\(P\)`-a.s. --- To establish the theorem, we use the _usual machinery_ of limiting arguments. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If `\((Y_n)_n\)` is a non-decreasing sequence of non-negative square-integrable random variables such that `\(Y_n \uparrow Y\)` a.s. then there exists a `\(\mathcal{G}\)`-measurable random variable `\(Z\)` such that `$$\mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \uparrow Z \qquad \text{a.s.}$$` ] --- ### Proof As `\((Y_n)_n\)` is non-decreasing, `\(\left( \mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \right)_n\)` is an (a.s.) non-decreasing sequence of `\(\mathcal{G}\)`-measurable random variables. It admits a `\(\mathcal{G}\)`-measurable limit (finite or not)
--- We now proceed to the proof of the Theorem ### Proof Without loss of generality, we assume `\(Y \geq 0\)` > if this is not the case, let `\(Y = (Y)_+ - (Y)_-\)` with > `\((Y)_+ = |Y| \mathbb{I}_{Y\geq 0}\)` and `\((Y)_- = |Y| \mathbb{I}_{Y < 0}\)`, handle `\((Y)_+\)` and `\((Y)_-\)` > separately and use the linearity of conditional expectation Let `$$Y_n = Y \, \mathbb{I}_{|Y| \leq n}$$` so that `\(Y_n \nearrow Y\)` everywhere. The random variable `\(Y_n\)` is bounded and thus square-integrable. The random variable `\(\mathbb{E}\left[ Y_n \mid \mathcal{G} \right]\)` is therefore well defined for each `\(n\)`. The sequence `\(\mathbb{E} \left[ Y_n \mid \mathcal{G} \right]\)` is `\(P\)`-a.s. non-decreasing. It converges monotonically towards a `\(\mathcal{G}\)`-measurable random variable `\(Z\)` which takes values in `\(\mathbb{R}_+ \cup \{\infty\}\)`. We need to check that this random variable `\(Z \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)`. --- ### Proof (continued) By monotone convergence: `$$\begin{array}{rl} \mathbb{E} Y & = \mathbb{E}\big[ \lim_n \uparrow Y_n\big] \\ & = \lim_n \uparrow \mathbb{E}\big[ Y_n\big] \\ & = \lim_n \uparrow \mathbb{E} \Big[ \mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \Big] \\ & = \mathbb{E} \Big[ \lim_n \uparrow \mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \Big] \\ & = \mathbb{E} Z\end{array}$$` --- ### Proof (continued) If `\(A \in \mathcal{G}\)`, by monotone convergence, `$$\lim_n \uparrow \mathbb{E} \left[ \mathbb{I}_A Y_n \right] = \mathbb{E} \left[ \mathbb{I}_A Y \right]$$` and so `$$\lim_n \uparrow \mathbb{E} \left[ \mathbb{I}_A \mathbb{E} \left[ Y_n \mid\mathcal{G} \right] \right] = \mathbb{E} \left[ \mathbb{I}_A Y \right]$$` By monotone convergence again: `$$\mathbb{E} \left[ \mathbb{I}_A Z \right] = \lim_n \uparrow \mathbb{E} \left[ \mathbb{I}_A \mathbb{E} \left[ Y_n \mid\mathcal{G}\right] \right] = \mathbb{E} \left[ \mathbb{I}_A Y \right]$$` Hence `\(Z\)` is a version of `\(\mathbb{E}\left[ Y \mid \mathcal{G} \right]\)`
--- class: inverse, center, middle name: propcondexp ## Properties of (general) conditional expectation --- ###
In this Section `\((\Omega, \mathcal{F}, P)\)` is a probability space, `\(\mathcal{G}\)` is a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)`. Random variables `\((X_n)_n, (Y_n)_n, X, Y, Z\)` are meant to be integrable, and a.s. means `\(P\)`-a.s. --- The easiest property is: .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` then `$$\mathbb{E} \left[ X \right] = \mathbb{E} \left[ \mathbb{E} \left[ X \mid \mathcal{G} \right] \right]$$` ]
Check it. --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` and `\(X\)` is `\(\mathcal{G}\)`-measurable then `$$X = \mathbb{E} \left[ X \mid \mathcal{G} \right] \hspace{1em} P \text{-a.s.}$$` ]
Check it. --- ### An alternative characterization of conditional expectation .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let `\(X \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G} \subseteq \mathcal{F}\)` be a sub- `\(\sigma\)` -algebra, then for every `\(Y \in\mathcal{L}_1 (\Omega, \mathcal{G}, P)\)`, such that `\(\mathbb{E} \left[ |XY| \right] < \infty\)` `$$\mathbb{E} \left[ XY \right] = \mathbb{E} \left[ Y \mathbb{E} \left[ X \mid \mathcal{G} \right] \right]$$` ]
Prove it. --- We pocket the next proposition for future and frequent use. We could go ahead with listing many other useful properties of conditional expectation. They are best discovered and established when needed. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If `\(X, Y \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` with `\(XY \in \mathcal{L}_1 (\Omega, \mathcal{F}, P)\)` and `\(Y\)` is `\(\mathcal{G}\)` -measurable then `$$\mathbb{E} \left[ XY \mid \mathcal{G} \right] = Y \mathbb{E} \left[ X \mid \mathcal{G} \right] \hspace{1em} P \text{-a.s.}$$` ] --- ### Proof As `\(Y \mathbb{E} \left[ X \mid \mathcal{G} \right]\)` is `\(\mathcal{G}\)` -measurable, it suffices to check that for every `\(B \in \mathcal{G}\)`, `$$\mathbb{E} \left[ \mathbb{I}_B XY \right] = \mathbb{E} \left[\mathbb{I}_B \left( Y \mathbb{E} \left[ X \mid \mathcal{G} \right]\right) \right]$$` But `$$\begin{array}{rcl} \mathbb{E} \left[ \mathbb{I}_B XY \right] & = & \mathbb{E} \left[ ( \mathbb{I}_B Y) X \right]\\ & = & \mathbb{E} \left[ ( \mathbb{I}_B Y) \mathbb{E} \left[ X \mid \mathcal{G} \right] \right]\\ & = & \mathbb{E} \left[ \mathbb{I}_B \left( Y \mathbb{E} \left[ X \mid \mathcal{G} \right] \right) \right] . \end{array}$$` The middle equality is the previous proposition applied to the `\(\mathcal{G}\)`-measurable variable `\(\mathbb{I}_B Y\)`.
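---
### Illustration: taking out what is known

In the atomic case the identity can be checked exactly: any function of the atom label factors out of the within-atom average. A toy sketch (illustration only; the labels and the choice `\(Y = g^2\)` are arbitrary):

```r
## E[ X Y | G ] = Y E[ X | G ] when Y is a function of the atom label g.
set.seed(4)
n <- 1000
g <- sample(1:5, n, replace = TRUE)       # atoms of G
X <- rnorm(n)
Y <- g^2                                  # a G-measurable random variable
all.equal(ave(X * Y, g), Y * ave(X, g))   # TRUE up to floating point
```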
--- class: inverse, center, middle name: condconvtheorems ## Conditional convergence theorems --- Limit theorems from integration theory (monotone convergence theorem, Fatou's Lemma, Dominated convergence theorem) can be adapted to the conditional expectation setting. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional monotone convergence) Let the sequence `\((X_n)_n\)` of non-negative random variables converge monotonically to `\(X\)` ( `\(X_n \uparrow X\)` a.s.), with `\(X\)` integrable, then for every sequence of versions of conditional expectations: `$$\lim_n \uparrow \mathbb{E} \left[ X_n \mid \mathcal{G} \right] = \mathbb{E} \left[ X \mid \mathcal{G} \right] \text{ a.s.}$$` ] --- ### Proof The sequence `\(X - X_n\)` is non-negative and decreases to `\(0\)` a.s. It suffices to show that `\(\lim_n \downarrow \mathbb{E} \left[ X - X_n \mid \mathcal{G} \right] = 0\)` a.s. Note first that the sequence `\(\mathbb{E} \left[ X - X_n \mid \mathcal{G} \right]\)` converges a.s. toward a non-negative limit. We need to check that this limit is a.s. zero. For `\(A \in \mathcal{G}\)` : `$$\begin{array}{rl} \mathbb{E} \left[ \mathbb{I}_A \lim_n \mathbb{E} \left[ X - X_n \mid \mathcal{G} \right] \right] & = \lim_n \mathbb{E} \left[ \mathbb{I}_A \mathbb{E} \left[ X - X_n \mid \mathcal{G} \right] \right]\\ & \qquad \text{ monotone convergence theorem}\\ & = \lim_n \mathbb{E} \left[ \mathbb{I}_A \left( X - X_n \right) \right]\\ & \qquad \text{ monotone convergence theorem}\\ & = 0 \, . \end{array}$$` Taking `\(A = \Omega\)`: the non-negative limit has zero expectation, hence it is zero `\(P\)`-a.s.
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional Fatou's Lemma) Let `\((X_n)_n\)` be a sequence of non-negative random variables, then `$$\mathbb{E} \left[ \liminf_n X_n \mid \mathcal{G} \right] \leq \liminf_n \mathbb{E} \left[ X_n \mid \mathcal{G} \right] \hspace{1em} \text{a.s.}$$` ] As for the proof of Fatou's Lemma, the argument boils down to monotone convergence arguments. --- ### Proof Let `\(X = \liminf_n X_n\)`; `\(X\)` is a non-negative random variable. Let `\(Y = \liminf_n \mathbb{E} \left[ X_n \mid \mathcal{G} \right]\)`; `\(Y\)` is a `\(\mathcal{G}\)` -measurable random variable.
The theorem compares `\(\mathbb{E} \left[ X \mid \mathcal{G} \right]\)` and `\(Y.\)` Let `\(Z_k = \inf_{n \geq k} X_n\)`. Thus `\(\lim_k \uparrow Z_k = \liminf_n X_n = X\)`. According to the .ttc[conditional monotone convergence theorem] `$$\mathbb{E} \left[ Z_k \mid \mathcal{G} \right] \uparrow_k \mathbb{E} \left[ \liminf_n X_n \mid \mathcal{G} \right]\text{ a.s.}$$` --- ### Proof (continued) For every `\(n \geq k\)`, `\(X_n \geq Z_k\)` a.s. Hence, by the monotonicity of conditional expectation, `$$\forall n \geq k \hspace{1em} \mathbb{E} \left[ Z_k \mid\mathcal{G} \right] \leq \mathbb{E} \left[ X_n \mid \mathcal{G} \right] \text{ a.s.}$$` and these inequalities hold simultaneously, as a countable union of `\(P\)`-negligible events is `\(P\)`-negligible. Hence for every `\(k\)`, `$$\mathbb{E} \left[ Z_k \mid \mathcal{G} \right] \leq \liminf_n \mathbb{E} \left[ X_n \mid \mathcal{G} \right] \hspace{1em}\text{a.s.}$$` This entails `$$\lim_k \uparrow \mathbb{E} \left[ Z_k \mid \mathcal{G} \right] \leq\liminf_n \mathbb{E} \left[ X_n \mid \mathcal{G} \right]\quad \text{ a.s.}$$` The left-hand side is `\(\mathbb{E} \left[ \liminf_n X_n \mid \mathcal{G} \right]\)` a.s., which is the claimed inequality.
--- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional Dominated convergence) Let `\(V \in \mathcal{L}_1(\Omega, \mathcal{F}, P)\)`. Let the sequence `\((X_n)_n\)` satisfy `\(|X_n | \leq V\)` for every `\(n\)` and `\(X_n \rightarrow X\)` a.s., then for any sequence of versions of conditional expectations of `\((X_n)_n\)` and `\(X\)` `$$\mathbb{E} \left[ X_n \mid \mathcal{G} \right] \rightarrow \mathbb{E} \left[ X \mid \mathcal{G} \right] \hspace{1em} \text{a.s.}$$` ] --- ### Proof Let `\(Y_n = \inf_{m \geq n} X_m\)` and `\(Z_n = \sup_{m \geq n} X_m\)`. Then `\(-V \leq Y_n \leq Z_n \leq V\)`, while `\(Y_n \uparrow X\)` and `\(Z_n \downarrow X\)` a.s. By the conditional monotone convergence Theorem (applied to the non-negative sequences `\(Y_n + V\)` and `\(V - Z_n\)`), `\(\mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \uparrow\mathbb{E} [X \mid \mathcal{G}]\)` and `\(\mathbb{E} \left[ Z_n \mid\mathcal{G} \right] \downarrow \mathbb{E} [X \mid \mathcal{G}]\)` a.s. Observe that for every `\(n\)` `$$\mathbb{E} \left[ Y_n \mid \mathcal{G} \right] \leq \mathbb{E}\left[ X_n \mid \mathcal{G} \right] \leq \mathbb{E} \left[ Z_n\mid \mathcal{G} \right]\text{ a.s.}$$` Letting `\(n \to \infty\)`, the squeeze yields `\(\mathbb{E} \left[ X_n \mid \mathcal{G} \right] \rightarrow \mathbb{E} \left[ X \mid \mathcal{G} \right]\)` a.s.
Jensen's inequality also has a conditional version. The proof relies again on the variational representation of convex lower semi-continuous functions and on the monotonicity property of conditional expectation --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional Jensen's inequality) If `\(g\)` is a lower semi-continuous convex function on `\(\mathbb{R}\)`, with `\(\mathbb{E} \left[ | g (X) | \right] < \infty\)` then `$$g \left( \mathbb{E} \left[ X \mid \mathcal{G} \right] \right) \leq \mathbb{E} \left[ g (X) \mid \mathcal{G} \right] \text{ a.s.}$$` ] --- ### Proof A _lower semi-continuous_ convex function is a countable supremum of affine functions: there exists a countable collection `\((a_n, b_n)_n\)` such that for every `\(x\)`, `$$g (x) = \sup_n \left[ a_n x + b_n \right]$$` `$$\begin{array}{rcl} g \left( \mathbb{E} \left[ X \mid \mathcal{G} \right] \right) & = & \sup_n \left[ a_n \mathbb{E} \left[ X \mid \mathcal{G} \right] + b_n \right] \\ & = & \sup_n \left[ \mathbb{E} \left[ a_n X + b_n \mid \mathcal{G} \right] \right]\\ & \leq & \mathbb{E} \left[ \sup_n \left( a_n X + b_n \right) \mid \mathcal{G} \right] P \text{-a.s.}\\ \end{array}$$` and the right-hand side is `\(\mathbb{E} \left[ g(X) \mid \mathcal{G} \right]\)`
??? Recall definition and characterization of lower-semi-continuous functions --- name: condIndependance ### Independence When the conditioning `\(\sigma\)` -algebra `\(\mathcal{G}\)` is atomic, if the conditioned random variable `\(X\)` is independent from the conditioning `\(\sigma\)` -algebra, it is obvious that the conditional expectation is an a.s. constant random variable whose value equals `\(\mathbb{E}X\)`. This remains true in the general framework, but it deserves a proof --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If `\(X \perp\!\!\!\perp \mathcal{G}\)`, then `$$\mathbb{E} \left[ X \mid \mathcal{G} \right] = \mathbb{E} \left[ X \right]\text{ a.s.}$$` ] --- ### Proof Note that `\(\mathbb{E} \left[ X \right]\)` is `\(\mathcal{G}\)`-measurable. Let `\(B \in \mathcal{G}\)`, `$$\begin{array}{rl} \mathbb{E} \left[ \mathbb{I}_B X \right] & = \mathbb{E} \left[ \mathbb{I}_B \right] \mathbb{E} \left[ X \right]\\ & \qquad \text{by independence} \\ & = \mathbb{E} \left[ \mathbb{I}_B \times \mathbb{E} \left[ X \right] \right]\end{array}$$` Hence the constant `\(\mathbb{E} \left[ X \right]\)` is a version of `\(\mathbb{E} \left[ X \mid\mathcal{G} \right]\)`
--- The preceding proposition can be generalized. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition If the sub- `\(\sigma\)` -algebra `\(\mathcal{H}\)` is independent from `\(\sigma (\mathcal{G}, \sigma (X))\)` then `$$\mathbb{E} \left[ X \mid \sigma ( \mathcal{G}, \mathcal{H}) \right] = \mathbb{E} \left[ X \mid \mathcal{G} \right] \hspace{1em} \text{a.s.}$$` ] --- ### Proof Recall that conditional expectation with respect to `\(\sigma(\mathcal{G}, \mathcal{H})\)` can be characterized using a `\(\pi\)`-system containing `\(\Omega\)` and generating `\(\sigma \left( \mathcal{G,H} \right)\)`, for example `\(\{ B \cap C : B \in \mathcal{G}, C \in \mathcal{H} \}\)`. Let `\(B \in \mathcal{G}\)` and `\(C \in \mathcal{H}\)`, `$$\begin{array}{rl} \mathbb{E} \left[ \mathbb{I}_B \mathbb{I}_C \mathbb{E} \left[ X \mid \mathcal{G} \right] \right] & = \mathbb{E} \left[\mathbb{I}_B \mathbb{E} \left[ X \mid \mathcal{G} \right] \right] \times \mathbb{E} \left[ \mathbb{I}_C \right]\\ & \qquad C \text{ is independent from } \sigma ( \mathcal{G}, \sigma (X))\\ & = \mathbb{E} \left[ \mathbb{I}_B X \right] \times \mathbb{E}\left[ \mathbb{I}_C \right]\\ & = \mathbb{E} \left[ \mathbb{I}_C \mathbb{I}_B X \right] \\ & \qquad C \text{ is independent from } \sigma ( \mathcal{G}, \sigma (X))\, . \end{array}$$` This identifies `\(\mathbb{E} \left[ X \mid \mathcal{G} \right]\)` as a version of `\(\mathbb{E} \left[ X \mid \sigma(\mathcal{G}, \mathcal{H}) \right]\)`.
--- class: inverse, center, middle name: condProbDistrib ## Conditional probability distributions --- name: easyregcondprob ###
Easy case: conditioning with respect to a discrete `\(\sigma\)` -algebra Back to the basic setting: Given `\((\Omega,\mathcal{F},P)\)`, `\(\mathcal{G}\subseteq \mathcal{F}\)` denotes an _atomic_ sub- `\(\sigma\)` -algebra generated by a countable partition `\((A_n)_n\)` of `\(\Omega\)`
Either from conditional expectations with respect to `\(\mathcal{G}\)`, or from conditional probabilities knowing the events `\(A_n,\)` we can define a function `\(N : \Omega \times \mathcal{F} \to [0, 1]\)` `$$N(\omega, B) = \mathbb{E}_{P}[\mathbb{I}_B\mid \mathcal{G}](\omega) = P\{B\mid A_n\}\text{ when } \omega \in A_n$$` (assuming each atom `\(A_n\)` has positive probability; on null atoms `\(N(\omega, \cdot)\)` may be chosen as an arbitrary probability measure) The function `\(N\)` has two remarkable properties: i. For every `\(\omega \in \Omega,\)` `\(N(\omega,\cdot)\)` defines a probability on `\((\Omega,\mathcal{F}).\)` i. For every event `\(B\in \mathcal{F},\)` the function `\(N(\cdot,B)\)` is a `\(\mathcal{G}\)`-measurable function.
In this atomic setting, it is intuitive to define conditional expectation starting from conditional probabilities; we could also proceed the other way around and build conditional probabilities starting from conditional expectations. --- name: impediments ### Impediments Now, we attempt to construct conditional probabilities when the conditioning `\(\sigma\)` -algebra is not atomic.
For each `\(B \in \mathcal{F}\)`, we can rely on the existence of a `\(\sigma (X)\)`-measurable random variable which is `\(P\)`-a.s. a version of the conditional expectation of `\(\mathbb{I}_B\)` with respect to `\(X\)`. Indeed, for any _countable_ collection of events `\((B_n)_n\)` from `\(\mathcal{F}\)`, we can take for granted that there exists a collection of random variables which, almost surely, form a _consistent collection of versions_ of the conditional expectations of `\((\mathbb{I}_{B_n})_n\)` with respect to `\(X\)`. If `\((B_n)_n\)` is non-decreasing with limit `\(B\)`, the conditional monotone convergence Theorem guarantees that `$$\lim_n \uparrow \mathbb{E} \left[ \mathbb{I}_{B_n} \mid X \right] = \mathbb{E} \left[ \mathbb{I}_B \mid X \right] \qquad \text{a.s.}$$` ---
It is therefore tempting to define a conditional probability with respect to `\(\sigma(X)\)` as a function `$$\begin{array}{rl} \Omega \times \mathcal{F} & \to [0, 1] \\ (\omega, B) & \mapsto \mathbb{E} \left[ \mathbb{I}_B \mid \sigma(X) \right](\omega) \, . \end{array}$$` ---
However, we cannot guarantee that, `\(P\)`-a.s., this object has the properties of a probability distribution on `\((\Omega, \mathcal{F})\)`. The problem does not arise from the diffuse nature of the distribution of `\(X\)` but from the size of `\(\mathcal{F}\)`. As `\(\mathcal{F}\)` _may not be countable_, it is possible to build an uncountable non-decreasing family of events. Checking the a.s. monotonicity of the corresponding family of conditional probabilities looks beyond our reach (an uncountable union of `\(P\)`-negligible events is not necessarily `\(P\)`-negligible). ---
Nevertheless, the situation is not desperate. In most settings envisioned in an introductory course on Probability, we can take the existence of conditional probabilities for granted. We first review the easy case, where we can define conditional probabilities that even have a density with respect to a reference measure. Later, we shall see that if `\(\Omega\)` is not too large, we can rely on the existence of conditional probabilities. --- name: jointdensity class: inverse, middle, center ## Conditional densities --- If - `\(\Omega = \mathbb{R}^k\)`, `\(\mathcal{F} = \mathcal{B}(\mathbb{R}^k)\)` and - `\(P \ll \text{Lebesgue}\)` (has a density denoted by `\(p\)`), defining conditional densities with respect to coordinate projections is (almost) as simple as conditioning with respect to an atomic `\(\sigma\)` -algebra --- We stick to the case `\(k=2\)` - A generic outcome is denoted by `\(\omega = (x, y)\)` and the coordinate projections define two random variables `\(X(x, y) = x\)` and `\(Y (x, y) = y\)`. - We denote by `\(p_X\)` the _marginal density_ of the distribution of `\(X\)` `$$p_X (x) = \int_{\mathbb{R}} p (x, y) \mathrm{d} y$$` - We agree on `\(D =\{x : p_X (x) > 0\}\)`. This is the _support of the density_ `\(p_X\)`
The _support of the density_ may differ from the _support of the distribution_ `\(P \circ X^{- 1}\)` -- -
Check that `\(p_X\)` is the density of `\(P \circ X^{- 1}\)` --- name: conddensity Having a density allows us to - calculate conditional expectation - define what we call a _conditional probability of `\(Y\)` knowing `\(X\)`_ .tr[
] --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional density) - `\(P \ll \text{Lebesgue}\)` on `\((\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))\)` with density `\(p(\cdot,\cdot)\)` - Let `\(X, Y\)` be the coordinate projections on `\(\mathbb{R}^2\)` - Let `\(p_X\)` be the density of `\(P \circ X^{-1}\)` The function `\(N\)` defined by `$$N(x, y) = \Bigg\{\begin{array}{lr} \frac{p (x, y)}{p_X (x)} & \text{if } p_X (x) > 0\\ 0 & \text{ otherwise} \end{array}\bigg.$$` satisfies the following properties ] to be continued ... --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional density, continued) i. `\(\forall x\)` such that `\(p_X (x) > 0\)`, the set function `\(P_{\cdot \mid X=x}\)` defined by `$$\begin{array}{rl} \mathcal{B}(\mathbb{R}^2) & \to [0, 1]\\ B & \mapsto P_{\cdot \mid X=x} \{B\} = \int_{\mathbb{R}} \mathbb{I}_B(x,y) N (x, y) \mathrm{d} y \end{array}$$` is a probability measure on `\((\mathbb{R}^2, \mathcal{B} ( \mathbb{R}^2))\)`. It is supported by `\(\{x\} \times \mathbb{R}\)`. ii. `\(\forall B \in \mathcal{B} ( \mathbb{R}^2)\)`, the function `$$\omega \mapsto \int_{\mathbb{R}} \mathbb{I}_B(X(\omega),y) N (X(\omega), y) \mathrm{d} y = \mathbb{E}_{P_{\cdot \mid X=X(\omega)}} \mathbb{I}_B$$` is a version of `\(\mathbb{E}\big[\mathbb{I}_B \mid \sigma(X)\big]\)`.
] --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem (Conditional density, continued) iii. `\(\forall B \in \mathcal{B} ( \mathbb{R}^2)\)` `$$P(B) = \int \left( \int \mathbb{I}_B (s,y) N (s,y) \mathrm{d} y \right) p_X (s) \mathrm{d} s = \int P_{\cdot \mid X=s}(B) p_X(s) \mathrm{d} s$$` iv. For any `\(P\)` -integrable function `\(f\)` on `\(\mathbb{R}^2\)`, `$$x \mapsto \int_{\mathbb{R}} f (x, y) N (x, y) \mathrm{d} y$$` is a version of `\(\mathbb{E}[f(X, Y)\mid \sigma(X)]\)` ] --- ###
For each `\(x\)` such that `\(p_X(x)>0\)`, `\(P_{\cdot \mid X=x}\)` is a probability on `\(\mathbb{R}^2\)`. -- This probability measure is supported by `\(\{x\} \times \mathbb{R}\)`. It is the product of `\(\delta_x\)`, the Dirac mass at `\(x\)`, and the probability distribution on `\(\mathbb{R}\)` defined by the density `\(N(x, \cdot)\)`. -- `\(N(x, \cdot)\)` is often called the _conditional density_ of `\(Y\)` given `\(X=x\)`, and the distribution over `\(\mathbb{R}\)` defined by this density is often called the _conditional distribution of `\(Y\)` given `\(X=x\)`_ --
Is `\(N(x,y)\)` a probability density? If yes, with respect to which `\(\sigma\)`-finite measure? --- ### Proof .f6[ Proof of (i). Let us agree on notation: `$$P_x \{B\}= \int_{\mathbb{R}} \mathbb{I}_B(x,y) N (x, y) \mathrm{d} y$$` Immediate: - `\(P_x\)` is `\([0, 1]\)`-valued - `\(P_x (\emptyset) = 0\)` - `\(P_x (\{x\} \times \mathbb{R}) = 1\)`. - Additivity. It remains to check that if `\((B_n)\)` is a non-decreasing sequence from `\(\mathcal{B}\big(\mathbb{R}^2\big)\)` with `\(B_n \uparrow B\)` then `$$\lim_n \uparrow P_x (B_n) = P_x (B)$$` This is an immediate consequence of the _monotone convergence theorem_: for each `\((x',y')\)`, `$$\lim_n \uparrow \mathbb{I}_{B_n} (x', y') N (x', y') = \mathbb{I}_{B} (x', y') N (x', y')$$` ] ??? The proof of the Theorem consists of milking the Tonelli-Fubini Theorem --- ### Proof (continued) Proof of ii). As the function `\((x, y) \mapsto p (x, y) \mathbb{I}_B (x,y)\)` is `\(\mathcal{B}(\mathbb{R}^2)\)`-measurable and integrable, by the Tonelli-Fubini Theorem, `$$x \mapsto \int_{\mathbb{R}} p (x, y) \mathbb{I}_B (x,y) \mathrm{d} y$$` is defined almost everywhere and Borel-measurable -- Proof of iii). This is also an immediate consequence of the Tonelli-Fubini Theorem. -- Proof of iv). It follows from ii.), using the usual _approximation by simple functions_ argument
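---
### A classical closed form

A concrete instance of the kernel `\(N\)` (a standard computation, added here as an illustration): let `\(P\)` be the centred Gaussian distribution on `\(\mathbb{R}^2\)` with unit variances and correlation `\(\rho \in (-1, 1)\)`, so that

`$$p(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right), \qquad p_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$`

Dividing,

`$$N(x, y) = \frac{p(x, y)}{p_X(x)} = \frac{1}{\sqrt{2\pi(1-\rho^2)}} \exp\left(-\frac{(y - \rho x)^2}{2(1-\rho^2)}\right)$$`

so the conditional distribution of `\(Y\)` given `\(X = x\)` is Gaussian with mean `\(\rho x\)` and variance `\(1-\rho^2\)`, in agreement with the Gaussian conditioning formula used earlier.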
--- ###
.pull-left[ <img src="cm-11-conditioning_files/figure-html/unnamed-chunk-1-1.png" width="504" /> ] .pull-right[ Consider the uniform distribution on the triangle of `\(\mathbb{R}^2\)` defined by `\(0 \leq y \leq x \leq 1\)`. Give - the density `\(p(\cdot,\cdot)\)` - the marginal density `\(p_X\)` - the kernel `\(N(\cdot,\cdot)\)` ] --- ###
.pull-left[ <img src="cm-11-conditioning_files/figure-html/unnamed-chunk-2-1.png" width="504" /> ] .pull-right[ Consider the uniform distribution on the triangle of `\(\mathbb{R}^2\)` defined by `\(0 \leq y \leq x \leq 1\)` - `\(p(x,y)=2 \times \mathbb{I}_{0\leq y\leq x\leq 1}\)` - `\(p_X(x)=2x \times \mathbb{I}_{0\leq x\leq 1}\)` - `\(N(x, y)=\mathbb{I}_{0\leq y\leq x} \times \frac{1}{x}\)` ] --- name: regconprob class: middle, inverse, center ## Regular conditional probabilities, kernels --- We will outline some results that allow us to work within a more general framework. We introduce two new notions. .bg-light-gray.br3.shadow-5.ph4.mt5[ ### Definition Conditional probability kernel Let `\((\Omega, \mathcal{F})\)` be a measurable space, and `\(\mathcal{G}\)` a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}.\)` We call _conditional probability kernel with respect to_ `\(\mathcal{G}\)` a function `\(N : \Omega \times \mathcal{F} \rightarrow \mathbb{R}_+\)` that satisfies: i. For any `\(\omega \in \Omega\)`, `\(N (\omega, \cdot)\)` defines a probability on `\((\Omega, \mathcal{F})\)`. ii. For any `\(A \in \mathcal{F}\)`, `\(N (\cdot, A)\)` is `\(\mathcal{G}\)`-measurable ] --- If the measurable space is endowed with a probability distribution `\(P\)`, we are interested in conditional probability kernels with respect to `\(\mathcal{G}\)` that are compliant with `\(P\)`. We call them _regular conditional probability kernels_. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Definition (Regular conditional probability) Let `\((\Omega, \mathcal{F}, P)\)` be a probability space and `\(\mathcal{G} \subseteq \mathcal{F}\)` a sub- `\(\sigma\)` -algebra. A kernel `\(N : \Omega \times \mathcal{F} \to \mathbb{R}_+\)` is a _regular conditional probability_ with respect to `\(\mathcal{G}\)` iff i. For any `\(B \in \mathcal{F}\)`, `\(\omega \mapsto N (\omega, B)\)` is a version of the conditional expectation of `\(\mathbb{I}_B\)` knowing `\(\mathcal{G}\)` ( `\(N (\cdot, B)\)` is therefore `\(\mathcal{G}\)` -measurable ): `$$N(\cdot, B) = \mathbb{E}[\mathbb{I}_B \mid \mathcal{G}]\quad P\text{-a.s.}$$` ii. For `\(P\)` -almost all `\(\omega \in \Omega\)`, `\(B \mapsto N(\omega, B)\)` defines a probability on `\((\Omega, \mathcal{F})\)`. ] --- A regular conditional probability (whenever it exists) is defined from versions of conditional expectations. Conversely, a regular conditional probability provides us with a way to compute conditional expectations. --- .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem - `\((\Omega, \mathcal{F}, P)\)` - `\(\mathcal{G} \subseteq \mathcal{F}\)`, a sub- `\(\sigma\)` -algebra of `\(\mathcal{F}\)` - `\(N\)` : a probability kernel on `\((\Omega,\mathcal{F})\)` w.r.t. `\(\mathcal{G}\)` The following properties are equivalent a. `\(N(\cdot,\cdot)\)` defines a _regular conditional probability kernel_ w.r.t. `\(\mathcal{G}\)` for `\((\Omega, \mathcal{F}, P)\)` b. For any `\(P\)`-integrable function `\(f\)` on `\((\Omega, \mathcal{F})\)`, `\(P\)`-almost surely: `$$\mathbb{E} \left[ f \mid \mathcal{G} \right](\omega) = \mathbb{E}_{N(\omega,\cdot)}[f]$$` c. For any `\(P\)`-integrable random variable `\(X\)` on `\((\Omega, \mathcal{F})\)`: `$$\mathbb{E} \left[ X \right] = \mathbb{E}\left[ \mathbb{E}_{N(\omega,\cdot)}[X]\right]$$` ] --- ###
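A simulation sketch of the triangle example (an aside, for illustration only), using the standard fact that if `\(U, V\)` are independent uniforms on `\([0,1]\)` then `\((\max(U,V), \min(U,V))\)` is uniform on the triangle; the kernel `\(N(x,\cdot)\)` given above is the uniform density on `\([0, x]\)`, so `\(\mathbb{E}[Y \mid X = x] = x/2\)`.

```r
## Simulate the uniform distribution on {0 <= y <= x <= 1} and probe p_X and N(x, .).
set.seed(5)
n <- 1e5
u <- runif(n); v <- runif(n)
x <- pmax(u, v); y <- pmin(u, v)   # (max, min) of two uniforms is uniform on the triangle
mean(x <= 0.5)                     # P(X <= 1/2) = integral of 2s over [0, 1/2] = 1/4
mean(y[abs(x - 0.7) < 0.01])       # approximately E[Y | X = 0.7] = 0.35
```

--- ###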
The proof of `\(a) \Rightarrow b)\)` relies on the usual machinery: approximation of non-negative integrable functions by an increasing sequence of simple functions, and monotone convergence of expectations and conditional expectations. `\(b) \Rightarrow c)\)` is trivial. `\(c) \Rightarrow a)\)` is more interesting. --- ### Existence of regular conditional probability distributions when `\(\Omega =\mathbb{R}\)` We shall check the existence of conditional probabilities in at least one non-trivial case. .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Theorem Let - `\(P\)` be a probability on `\(( \mathbb{R}, \mathcal{B} (\mathbb{R}))\)` - `\(\mathcal{G} \subseteq \mathcal{B} (\mathbb{R})\)` be a sub- `\(\sigma\)` -algebra, then there exists a regular conditional probability kernel with respect to `\(\mathcal{G}.\)` ] --- ###
We take advantage of the fact that `\(\mathcal{B} (\mathbb{R})\)` is countably generated --- ### Proof Let `\(\mathcal{C}\)` be the set formed by half-lines with rational endpoint, the empty set, and `\(\mathbb{R}\)`: `$$\mathcal{C} = \big\{ (-\infty, q]: q \in \mathbb{Q} \big\} \cup \{\emptyset, \mathbb{R} \}$$` This countable collection of half-lines is a `\(\pi\)` -system that generates `\(\mathcal{B}(\mathbb{R})\)`. For `\(q < q' \in \mathbb{Q},\)` we can choose versions `\(Y_q\)` and `\(Y_{q'}\)` of the conditional expectations of `\(\mathbb{I}_{(-\infty, q]}\)` and `\(\mathbb{I}_{(- \infty, q']}\)` such that `$$Y_q \leq Y_{q'} \qquad P\text{-a.s.}$$` Observe that `\(Y_{q'} - Y_q\)` is also a version of the conditional expectation of `\(\mathbb{I}_{(q, q']}\)`. --- ### Proof (continued) A countable union of `\(P\)`-negligible events is `\(P\)`-negligible, so, as `\(\mathbb{Q}^2\)` is countable, we can choose versions `\(\left( Y_q \right)_{q \in\mathbb{Q}}\)` of the conditional expectations of `\(\mathbb{I}_{(-\infty, q]}\)` such that `$$P\text{-a.s.} \qquad\forall q,q' \in \mathbb{Q}, \quad q < q' \Rightarrow Y_q \leq Y_{q'}$$` Let `\(\Omega_0\)` be the `\(P\)`-almost sure event on which all these inequalities hold. --- ### Proof (continued) For each `\(x \in \mathbb{R}\)`, we can define `\(Z_x\)` for each `\(\omega \in \Omega\)` by `$$Z_x (\omega) = \inf \left\{ Y_q (\omega) : q \in \mathbb{Q}, x < q \right\}$$` On `\(\Omega_0\)`, the function `\(x \mapsto Z_x (\omega)\)` is non-decreasing, it has a limit on the left at each point and it is right-continuous. Shrinking `\(\Omega_0\)` if necessary (using monotone convergence along `\(q \to \pm \infty\)` in `\(\mathbb{Q}\)`), the function `\(x \mapsto Z_x(\omega)\)` tends to `\(0\)` when `\(x\)` tends to `\(- \infty\)`, to `\(1\)` when `\(x\)` tends towards `\(+ \infty\)`. On `\(\Omega_0\)`, `\(x\mapsto Z_x (\omega)\)` is a cumulative distribution function; it defines a unique probability measure `\(\nu (\omega, \cdot)\)` on `\(\mathbb{R}\)`. In addition, for each `\(x\)`, `\(Z_x\)` is defined as a countable infimum of `\(\mathcal{G}\)`-measurable random variables, hence `\(Z_x\)` is `\(\mathcal{G}\)`-measurable. --- ### Proof (continued) It remains to check that for every `\(B \in \mathcal{F}\)`, `\(\omega\mapsto \nu (\omega, B)\)` for `\(\omega \in \Omega_0\)`, `\(0\)` elsewhere, defines a version of the conditional expectation of `\(\mathbb{I}_B\)` with respect to `\(\mathcal{G}\)`. This property is satisfied for `\(B \in \mathcal{C}\)`. Let us call `\(\mathcal{D}\)` the set of all the events for which `\(\omega \mapsto \nu (\omega, B)\)` (on `\(\Omega_0\)`, `\(0\)` elsewhere) defines a version of the conditional expectation of `\(\mathbb{I}_B\)` with respect to `\(\mathcal{G}\)`. We shall show that `\(\mathcal{D}\)` is a `\(\lambda\)`-system, that is i. `\(\mathcal{D}\)` contains `\(\emptyset\)` and `\(\mathbb{R} = \Omega\)` i. If `\(B, B'\)` belong to `\(\mathcal{D},\)` and `\(B \subseteq B'\)` then `\(B' \setminus B \in \mathcal{D}\)` i. If `\((B_n)_n\)` is a non-decreasing sequence of events from `\(\mathcal{D}\)` with limit `\(B\)`, then `\(B \in \mathcal{D}\)` --- ### Proof (continued) Clause i.) is guaranteed by construction. Clause ii.)
If `\(B \subseteq B'\)` both belong to `\(\mathcal{D}\)`, then, by linearity of conditional expectation, for any version `\(\mathbb{E}\left[ \mathbb{I}_{B' \setminus B} \mid \mathcal{G} \right]\)` of the conditional expectation of `\(\mathbb{I}_{B' \setminus B}\)` with respect to `\(\mathcal{G}\)`, on an almost-sure event `\(\Omega_1 \subseteq\Omega_0\)`: `$$\begin{array}{rl}\mathbb{E} \left[ \mathbb{I}_{B' \setminus B} \mid \mathcal{G} \right] & = \mathbb{E} \left[ \mathbb{I}_{B'} - \mathbb{I}_B \mid \mathcal{G} \right] \\ & = \mathbb{E} \left[ \mathbb{I}_{B'} \mid \mathcal{G} \right] - \mathbb{E} \left[ \mathbb{I}_B \mid \mathcal{G} \right] \\ & = \nu (\omega, B') - \nu (\omega, B) \\ & = \nu (\omega, B' \setminus B)\end{array}$$` so `\(B' \setminus B \in \mathcal{D}\)`
--- ###
Working harder would allow us to show that the existence of regular conditional probabilities is guaranteed as soon as `\(\Omega\)` can be endowed with a complete and separable metric space structure and that the `\(\sigma\)` -algebra `\(\mathcal{F}\)` is the Borelian `\(\sigma\)` -algebra induced by this metric. --- Defining a probability distribution from a marginal distribution and a kernel .bg-light-gray.b--light-gray.ba.bw1.br3.shadow-5.ph4.mt5[ ### Proposition Let - `\(X\)`: a random variable on `\((\Omega, \mathcal{F})\)` - `\(N\)`: a conditional probability kernel with respect to `\(\sigma(X)\)` - `\(P_X\)`: be a probability measure on `\((\Omega \sigma(X))\)` Then - there exists a unique probability measure `\(P\)` on `\((\Omega, \mathcal{F})\)` such that `\(P_X = P \circ X^{- 1}\)` - `\(N\)` is a regular conditional probability kernel with respect to `\(\sigma(X)\)` `$$\forall B \in \mathcal{F}, \quad P(B) = \int_{X(\Omega)} N(x, B) \mathrm{d}P_x(x)$$` ] --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End