name: inter-slide
class: left, middle, inverse

{{ content }}

---
name: layout-general
layout: true
class: left, middle

<style>
.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: red;
}
</style>
---
template: inter-slide

# Probability III: Discrete conditioning

### 2021-09-16

#### [Probability Master I MIDS](http://stephane-v-boucheron.fr/courses/probability)

#### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
template: inter-slide

##
### [With respect to an event](#condevent)

### [With respect to an algebra](#conddiscretealgebra)

### [Conditional expectation and prediction](#condpred)

---
template: inter-slide
name: motivation

## Motivation

???

Conditioning is central to probabilistic reasoning. In this lesson, we investigate discrete conditioning.

In this setting, the definition of conditional probability is not an issue. The definition of conditional expectation can be deceptively simple. Nevertheless, the discrete setting lends itself to intuitive definitions and manipulations.

The simplest notion we meet is conditional probability with respect to a specific event with positive probability (Section \@ref(condevent)). Conditional probability offers an intuitive interpretation of independence.

In Section \@ref(bayesformula) we state, check and discuss Bayes formula.

In Section \@ref(conddiscretealgebra), we define conditional expectation with respect to an atomic `\(\sigma\)`-algebra. This covers conditional expectation with respect to a discrete random variable.

In Section \@ref(condpred), we characterize conditional expectation as an optimal predictor. This characterization is very helpful when defining conditional expectation in the general setting.

---
template: inter-slide
name: condevent

## Conditioning with respect to an event

---
### Definition

Let `\(P\)` be a probability distribution on `\((\Omega, \mathcal{F})\)`

Let `\(A \in \mathcal{F}\)` be such that `\(P\{A\} > 0\)`

Let `\(B\)` be another event ( `\(B \in \mathcal{F}\)` )

The _probability of `\(B\)` given `\(A\)`_ is defined as:

`$$P\{B \mid A\}= \frac{P\{A \cap B\}}{P\{A\}}$$`

---
### Example

If

- `\(X\)` is a standard Gaussian random variable on `\((\Omega, \mathcal{F})\)`, and
- event `\(A\)` is defined by `\(\{ \omega : X(\omega) \geq t\}\)` for some `\(t\geq 0\)`,

we may condition on event `\(A\)` and define `\(P\{B \mid A\}\)` for `\(B = \{ \omega : |X(\omega)|\geq 2t\}\)`

We get

`$$P\{ B \mid A \} = \frac{P\{ X \geq 2 t\}}{P\{ X \geq t\}}$$`

---
### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `\(P\)` be a probability distribution on `\((\Omega, \mathcal{F})\)`

Let `\(A \in \mathcal{F}\)` be such that `\(P\{A\} > 0\)`

then `\(P\{\cdot \mid A\}\)` ( `\(P\)` given `\(A\)` ) defines a probability distribution over `\((\Omega, \mathcal{F})\)`

]

???

We may check the proposition by considering once again the definition of probability distributions.

---
### Proof

`\(P(\cdot \mid A)\)` maps `\(\mathcal{F}\)` to `\([0,1]\)`.

We have `\(P(\Omega \mid A)= P(A \cap \Omega)/ P(A) = P(A) /P(A)= 1\)`

Let `\((B_n)_n\)` be a monotone increasing sequence of events, then

`$$\begin{array}{rl} P (\cup_n B_n \mid A) & = \frac{P((\cup_n B_n) \cap A)}{P(A)} \\ & = \frac{P(\cup_n (B_n \cap A))}{P(A)} \\ & = \frac{\lim_n P(B_n \cap A)}{P(A)} \\ & = \lim_n P(B_n \mid A) \, . \end{array}$$`

Together with finite additivity (which is immediate from the definition), this continuity along monotone sequences yields `\(\sigma\)`-additivity.
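---
### Sanity check (simulation)

A quick Monte-Carlo check of the Gaussian example above: a sketch in Python with `numpy`/`scipy`, not part of the original development (the threshold `t` and the sample size are arbitrary choices).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
t = 1.0
X = rng.standard_normal(10_000_000)   # standard Gaussian sample

A = X >= t                 # conditioning event {X >= t}
B = np.abs(X) >= 2 * t     # event of interest {|X| >= 2t}

# empirical conditional probability P{B | A} = P{A ∩ B} / P{A}
cond_emp = np.mean(A & B) / np.mean(A)
# closed form from the slide: P{X >= 2t} / P{X >= t}
cond_th = norm.sf(2 * t) / norm.sf(t)

print(cond_emp, cond_th)   # both ≈ 0.1434 for t = 1
```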
---

We may consider the distribution of random variables on `\((\Omega, \mathcal{F})\)` under `\(P\{\cdot \mid A\}\)`.

We compute the expectation of `\(X\)` under `\(P\{ \cdot \mid A\}\)`:

`$$\mathbb{E}_{P\{\cdot \mid A\}} X = \frac{\mathbb{E} [\mathbb{I}_A\, X]}{P\{A\}}$$`

This is often denoted by `\(\mathbb{E}[X \mid A]\)`; we will try to avoid this possibly misleading notation.

---
### Example

Assume `\(X\)` is standard normally distributed.

One may investigate the distribution of `\(X^2\)` conditionally on event `\(A = \{ \omega : X(\omega)\geq t\}\)`

For `\(t>1\)`, we have

`$$\begin{array}{rl} \mathbb{E}_{P\{\cdot \mid X \geq t\}} X^2 & = \frac{\int_t^\infty x^2 \phi(x) \mathrm{d}x}{\int_t^\infty \phi(x) \mathrm{d}x}\\ & \leq \frac{t^2}{1-1/t} + 1 \, . \end{array}$$`

where the upper bound is obtained by repeated integration by parts.

The distribution of `\(X\)` given `\(A\)` is not Gaussian.

Given `\(A\)`, `\(X\)` is concentrated in a neighborhood of `\(t\)`, and becomes more and more concentrated as `\(t\)` goes to infinity.

---

Knowing the probability distribution given event `\(A\)` enables us to investigate independence of events with respect to `\(A\)`

The next elementary proposition is worth keeping in mind.

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

If `\(A\)` and `\(B \in \mathcal{F}\)` satisfy `\(P\{A\}> 0\)`, then `\(A\)` and `\(B\)` are independent under `\(P\)` iff

`$$P\{B\mid A\}= P\{B\}$$`

]

---
template: inter-slide
name: bayesformula

## Bayes formula

---

Bayes formula is sometimes used in probabilistic causation theory. This is a difficult matter.

Causality is a subtle notion and we will refrain from making causal interpretations.

---
### Proposition: Bayes formula

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `\(P\)` be a probability distribution on `\((\Omega, \mathcal{F})\)`,

Let `\((A_i)_{i \in \mathcal{I} \subseteq \mathbb{N}}\)` be a collection of pairwise disjoint events, with non-zero probability, such that `\(\cup_{i \in \mathcal{I}} A_i = \Omega\)` ( `\((A_i)_i\)` form a complete system of events)

Let `\(B\)` be an event with non-zero probability, then for all `\(i \in \mathcal{I}\)`

`$$P\{A_i \mid B\} = \frac{P\{A_i \} \times P\{B \mid A_i \}}{\sum_{j \in \mathcal{I}} P\{A_j \} \times P\{B \mid A_j \}}$$`

]

---
### Proof

By definition,

`$$P\{A_i \mid B\}= P\{A_i \cap B\}/ P\{B\}= P\{A_i \} \times P\{B \mid A_i \}/ P\{B\}$$`

Moreover,

`$$\begin{array}{rl} P \{ B\} & = P\{B \cap (\cup_{j \in \mathcal{I}} A_j)\}\\ & = P\{\cup_{j \in \mathcal{I}} (B \cap A_j)\} \\ & = \sum_{j \in \mathcal{I}} P\{B \cap A_j \} \\ & = \sum_{j \in \mathcal{I}} P\{A_j \} \times P\{B \mid A_j \}. \end{array}$$`
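---
### Bayes formula: a numerical illustration

A toy numerical illustration of Bayes formula, sketched in Python with `numpy`; the priors and likelihoods below are arbitrary made-up numbers, not data from the course.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])          # P{A_i}: a complete system of events
likelihood = np.array([0.10, 0.40, 0.80])  # P{B | A_i}

# denominator of Bayes formula: P{B} = sum_j P{A_j} * P{B | A_j}
p_B = np.sum(prior * likelihood)

# posterior probabilities P{A_i | B}
posterior = prior * likelihood / p_B

print(p_B)              # 0.33
print(posterior)        # [0.1515..., 0.3636..., 0.4848...]
print(posterior.sum())  # 1.0 : the posterior is again a probability vector
```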
---

In the preceding proposition,

- `\(P\{A_i \}\)` is called the _prior probability_ of `\(A_i\)` and
- `\(P\{A_i \mid B\}\)` is called the _posterior probability_ of `\(A_i\)`

---
template: inter-slide
name: conddiscretealgebra

## Conditional expectation with respect to a discrete `\(\sigma\)`-algebra

???

While the general notion of conditional expectation requires some abstraction, we can introduce conditioning with respect to a discrete `\(\sigma\)`-algebra starting from the elementary notion of conditional probability with respect to an event with positive probability.

---
### Definition

Let `\(\Omega\)` be a universe, `\(\mathcal{F}\)` a `\(\sigma\)`-algebra of events on `\(\Omega\)`, `\(P\)` a probability distribution on `\((\Omega, \mathcal{F})\)`.

Let `\((A_i)_{i \in \mathcal{I} \subseteq \mathbb{N}}\)` be pairwise disjoint events, with non-zero probability, such that `\(\cup_i A_i = \Omega .\)`

Let `\(\mathcal{G}\)` be the atomic `\(\sigma\)`-algebra generated by `\((A_i)_{i \in \mathcal{I}}\)`.

Let `\(X\)` be an integrable real-valued random variable on `\((\Omega, \mathcal{F})\)`. The _conditional expectation of `\(X\)` with respect to `\(\mathcal{G}\)`_ is the random variable defined as

`$$\mathbb{E} [X \mid \mathcal{G}] = \sum_{i \in \mathcal{I}} \mathbb{E}_{P_{\{\cdot |A_i \}}} [X] \times \mathbf{1}_{A_i}$$`

---

While `\(\mathbb{E}_{P_{\{\cdot |A_i \}}} [X]\)` is a real number, `\(\mathbb{E} [X \mid \mathcal{G}]\)` is a `\(\mathcal{G}\)`-measurable function from `\(\Omega\)` to `\(\mathbb{R}\)`:

`$$\mathbb{E} [X \mid \mathcal{G}](\omega) = \sum_{i \in \mathcal{I}} \mathbb{E}_{P_{\{\cdot |A_i \}}} [X] \times \mathbf{1}_{A_i}(\omega) \qquad \forall \omega \in \Omega$$`

These two kinds of objects should not be confused.

We will refrain from using the notation `\(\mathbb{E}[X \mid A_i]\)` since it may be confusing: `\(\mathbb{E}[X \mid A_i]\)` might denote either

- `\(\mathbb{E}_{P_{\{\cdot | A_i \}}} [X]\)` or
- `\(\mathbb{E} [X \mid \sigma(A_i)]\)` where `\(\sigma(A_i)\)` is the sigma-algebra generated by `\(A_i\)`: `\(\{ A_i, A_i^c, \Omega, \emptyset\}\)`

---
name: prp:espercond

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `\(P\)` be a probability distribution on `\((\Omega, \mathcal{F})\)`.

Let `\((A_i)_{i \in \mathcal{I} \subseteq \mathbb{N}}\)` be a collection of pairwise disjoint events, with non-zero probability, satisfying `\(\cup_{i \in \mathcal{I}} A_i = \Omega\)`

Let `\(\mathcal{G}=\sigma\Big((A_i)_{i \in \mathcal{I}}\Big)\)` denote the sigma-algebra generated by `\((A_i)_{i \in \mathcal{I}}\)`.

The random variable `\(X\)` is assumed to be `\(P\)`-integrable.

The conditional expectation `\(\mathbb{E} [X \mid \mathcal{G}]\)` is a `\(\mathcal{G}\)`-measurable random variable that satisfies the following property:

`$$\mathbb{E} \left[ YX \right] = \mathbb{E} \left[ Y \mathbb{E} [X \mid \mathcal{G}] \right] \qquad \text{for every bounded } \mathcal{G}\text{-measurable } Y.$$`

If two `\(\mathcal{G}\)`-measurable random variables `\(Z, T\)` satisfy

`$$\mathbb{E} \left[ YX \right] = \mathbb{E} \left[ Y Z \right] = \mathbb{E}[YT] \qquad \text{for every bounded } \mathcal{G}\text{-measurable } Y,$$`

then `\(Z=T\)` almost surely

]

---
### Proof

We need to check two points:

i. `\(\mathbb{E} \left[ X \mid \mathcal{G} \right]\)` satisfies Property in [Proposition](#prp:espercond)

i. if `\(Z\)` satisfies Property in [Proposition](#prp:espercond), then `\(Z = \mathbb{E}\left[ X \mid \mathcal{G} \right]\)` `\(P\)`-almost-surely.

Checking i.)
If `\(Y\)` is `\(\mathcal{G}\)`-measurable and bounded, then `\(Y = \sum_{i \in \mathcal{I}} \lambda_i \mathbf{1}_{A_i}\)` for some bounded real-valued sequence `\((\lambda_i)_{i \in \mathcal{I}}\)`.

---
### Proof (continued)

Then

`$$\begin{array}{rl} \mathbb{E} [Y \mathbb{E} \left[ X \mid \mathcal{G} \right]] & = \mathbb{E} \left[ \left( \sum_{i \in \mathcal{I}} \lambda_i \mathbf{1}_{A_i} \right) \left( \sum_{j \in \mathcal{I}} \mathbf{1}_{A_j} \frac{\mathbb{E} [ \mathbf{1}_{A_j} X]}{P\{A_j \}} \right) \right]\\ & = \mathbb{E} \left[ \sum_{i \in \mathcal{I}} \lambda_i \mathbf{1}_{A_i} \frac{\mathbb{E} [ \mathbf{1}_{A_i} X]}{P\{A_i \}} \right] \quad \text{the } A_i \text{ are pairwise disjoint}\\ & = \sum_{i \in \mathcal{I}} \lambda_i \mathbb{E} [ \mathbf{1}_{A_i} X] \frac{\mathbb{E} \left[ \mathbf{1}_{A_i} \right]}{P\{A_i \}} \quad \text{linearity of expectation}\\ & = \sum_{i \in \mathcal{I}} \lambda_i \mathbb{E} [ \mathbf{1}_{A_i} X]\\ & = \mathbb{E} \left[ \left( \sum_{i \in \mathcal{I}} \lambda_i \mathbf{1}_{A_i} \right) X \right]\\ & = \mathbb{E} \left[ YX \right] . \end{array}$$`

---
### Proof (continued)

Checking ii.)

Assume `\(Z\)` satisfies Property in [Proposition](#prp:espercond)

Take `\(Y = \mathbf{1}_{A_i}\)` for some index `\(i \in \mathcal{I}\)`.

As `\(Z\)` is `\(\mathcal{G}\)`-measurable, there exists a real-valued sequence `\(\left( \mu_j \right)_{j \in \mathcal{I}}\)` such that `\(Z = \sum_{j \in \mathcal{I}} \mu_j \mathbf{1}_{A_j}\)`

Thus, relying on the fact that the events `\(A_j\)` are pairwise disjoint:

`$$\mathbb{E} \left[ ZY \right] = \mathbb{E} \left[ \sum_{j \in \mathcal{I}} \mu_j \mathbf{1}_{A_j} \mathbf{1}_{A_i} \right] = \mu_i P\{A_i \}$$`

According to Property in [Proposition](#prp:espercond), we have:

`$$\mathbb{E} [ZY] = \mathbb{E} [XY] = \mathbb{E} [X \mathbf{1}_{A_i}]$$`

Finally, for all `\(i \in \mathcal{I}\)`, `\(\mu_i = \mathbb{E} [X \mathbf{1}_{A_i}] / P\{A_i \}.\)`

We conclude that `\(Z = \mathbb{E} [X \mid \mathcal{G}]\)`.
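---
### Conditional expectation in practice (simulation)

The definition and the characterizing property can be checked numerically. Below is a sketch in Python with `numpy` (the choice of `W` and `X` is arbitrary): `\(\mathbb{E}[X \mid \sigma(W)]\)` is computed by averaging `\(X\)` over each atom `\(\{W = i\}\)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

W = rng.integers(0, 6, size=n)       # atoms A_i = {W = i}, i = 0, ..., 5
X = W + rng.standard_normal(n)       # here E[X | sigma(W)] should be W

# E[X | G](omega) = sum_i E_{P{. | A_i}}[X] * 1_{A_i}(omega):
# on each atom, replace X by its average over that atom
cond_exp = np.zeros(n)
for i in range(6):
    atom = (W == i)
    cond_exp[atom] = X[atom].mean()  # ≈ E_{P{. | W = i}}[X] ≈ i

# characterizing property: E[Y X] = E[Y E[X | G]] for Y = 1_{A_i}
Y = (W == 3).astype(float)
print(np.mean(Y * X), np.mean(Y * cond_exp))   # both ≈ 3 * P{W = 3} = 0.5
```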
---
template: inter-slide
name: condpred

## Conditional expectation as prediction

---

The next proposition reveals the role of conditional expectation in prediction/approximation problems.

name: prp:espercondpred

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `\(Y\)` be a square-integrable random variable on `\((\Omega, \mathcal{F}, P)\)` and `\(\mathcal{G}\)` a discrete sub-`\(\sigma\)`-algebra of `\(\mathcal{F}\)`

The conditional expectation of `\(Y\)` with respect to `\(\mathcal{G}\)` minimizes

`$$\mathbb{E}\left[\big(Y - Z\big)^2\right]$$`

amongst all `\(\mathcal{G}\)`-measurable square-integrable random variables `\(Z\)`

]

???

This proposition is a key tool when constructing conditional expectation with respect to general sigma-algebras

---

Recall that a `\(\mathcal{G}\)`-measurable random variable is a function that remains constant on each `\(A_i, i \in \mathcal{I}\)`.

### Proof

If `\(Y\)` is a random variable on `\((\Omega,\mathcal{F}),\)` and if we want to predict `\(Y\)` as well as possible by a `\(\mathcal{G}\)`-measurable random variable, we are looking for a sequence of coefficients `\((b_i)_{i\in \mathcal{I}}\)` that minimizes:

`$$\begin{array}{rl} \mathbb{E}_P \left[ \Big( Y - \sum_{i\in \mathcal{I}} b_i \mathbf{I}_{A_i} \Big)^2 \right] & = \mathbb{E}_P \left[ \Big( \sum_{i\in \mathcal{I}} (Y-b_i) \mathbf{I}_{A_i} \Big)^2 \right] \\ & = \sum_{i\in \mathcal{I}} \mathbb{E}_P \left[ \left( Y-b_i\right)^2 \mathbf{I}_{A_i} \right] \\ & = \sum_{i\in \mathcal{I}} P\{A_i\}\, \mathbb{E}_{P\{\cdot \mid A_i\}} \left[ \left( Y-b_i\right)^2 \right] \end{array}$$`

(the second equality uses the fact that the events `\(A_i\)` are pairwise disjoint)

Thus, for each `\(i\)`, `\(b_i\)` must coincide with the expectation of `\(Y\)` under `\(P\{\cdot \mid A_i\},\)` since `\(b \mapsto \mathbb{E}_{P\{\cdot \mid A_i\}}\left[(Y-b)^2\right]\)` is minimized at `\(b = \mathbb{E}_{P\{\cdot \mid A_i\}}[Y]\)`.

The best prediction of `\(Y\)`, in the sense of the quadratic error, among the `\(\mathcal{G}\)`-measurable functions is the conditional expectation of `\(Y\)` with respect to `\(\mathcal{G}\)`.

???

The properties identified by propositions \@ref(prp:espercond) and \@ref(prp:espercondpred) serve as a definition for the conditional expectation with respect to a general `\(\sigma\)`-algebra.
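---
### Prediction: a numerical check

A small numerical check of the proposition, sketched in Python with `numpy` (the target `Y` is an arbitrary choice): per-atom averages achieve the smallest quadratic error among `\(\mathcal{G}\)`-measurable predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

W = rng.integers(0, 4, size=n)          # atoms A_i = {W = i}
Y = W ** 2 + rng.standard_normal(n)     # square-integrable target

# conditional expectation of Y given sigma(W): per-atom means
means = np.array([Y[W == i].mean() for i in range(4)])

def risk(b):
    """Quadratic risk of the G-measurable predictor sum_i b_i 1_{A_i}."""
    return np.mean((Y - np.asarray(b)[W]) ** 2)

print(risk(means))        # ≈ 1, the residual noise variance
print(risk(means + 0.3))  # ≈ 1.09 : perturbing the per-atom means does worse
print(risk(means - 0.5))  # ≈ 1.25
```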
---
template: inter-slide
name: easypropcondexp

## Properties of conditional expectation

---

We state without proof a number of useful properties of conditional expectation with respect to discrete `\(\sigma\)`-algebras.

We shall prove them in full generality later.

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

If `\(X \leq Y\)`, `\(P\)`-a.s., then

`$$\mathbb{E}[X \mid \mathcal{G}] \leq \mathbb{E}[Y \mid \mathcal{G}] \qquad P\text{-a.s.}$$`

]

---
### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

`$$\mathbb{E}[ aX + bY \mid \mathcal{G}] = a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}]$$`

]

---
### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

If `\(\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}\)`

`$$\mathbb{E}\left[\mathbb{E}\left[ X \mid \mathcal{G}\right] \mid \mathcal{H} \right] = \mathbb{E}\left[ X \mid \mathcal{H} \right]$$`

`$$\mathbb{E}\left[\mathbb{E}\left[ X \mid \mathcal{H}\right] \mid \mathcal{G} \right] = \mathbb{E}\left[ X \mid \mathcal{H} \right]$$`

]

---
### Exercise
Prove the proposition.

---
### Corollary

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

`$$\mathbb{E} X = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]]$$`

]

---
template: inter-slide
name: gw1

## Application: Galton-Watson processes I

---

The size of generation `\(k\geq 0\)` is defined recursively by

`$$Z_0 = 1, \qquad Z_{k+1} = \sum_{i=1}^{Z_k} X^k_{i} \, ,$$`

where the offspring counts `\((X^k_i)_{k \geq 0, i \geq 1}\)` are independent, identically distributed, integrable `\(\mathbb{N}\)`-valued random variables (the offspring distribution).

The `\(\sigma\)`-algebra `\(\sigma(Z_k)\)` is discrete/atomic: it is generated by the pairwise disjoint events `\(\Big\{ Z_k = a\Big\}\)` for `\(a \in \mathbb{N}\)`.

---
### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

`$$\mathbb{E}\Big[Z_{k+1} \mid \sigma(Z_k)\Big] = \mathbb{E}X^0_1 \times Z_k$$`

]

---
### Proof

On the event `\(\Big\{ Z_k = a\Big\}\)`, we can determine the conditional distribution of `\(Z_{k+1}\)`.

`$$\begin{array}{rl} \Big\{ Z_{k+1} = b \wedge Z_k = a\Big\} & = \Big\{ \sum_{i=1}^a X^k_i = b \wedge Z_k = a\Big\} \\ & = \Big\{ \sum_{i=1}^a X^k_i = b\Big\} \cap \Big\{ Z_k = a \Big\} \end{array}$$`

As `\(\sum_{i=1}^a X^k_i\)` and `\(Z_k\)` are independent random variables, we have

`$$\begin{array}{rl} P \Big\{ Z_{k+1} = b \mid Z_k = a\Big\} = P \Big\{ \sum_{i=1}^a X^k_i = b \mid Z_k = a\Big\} = P \Big\{ \sum_{i=1}^a X^k_i = b \Big\} \end{array}$$`

---
### Proof (continued)

Note that the right-hand side does not depend on `\(Z_k\)`.

On the event `\(\{Z_k = a\}\)`, `\(Z_{k+1}\)` is distributed like the sum of `\(a\)` independent copies of `\(X^0_1\)`:

`$$\begin{array}{rl} \mathbb{E} \Big[ Z_{k+1}\mid \sigma(Z_k)\Big] & = \sum_{a=0}^\infty \mathbb{E}_{P\{\cdot \mid Z_k=a\}}\Big[Z_{k+1}\Big] \times \mathbb{I}_{Z_k=a} \\ & = \sum_{a=0}^\infty \mathbb{E}_{P\{\cdot \mid Z_k=a\}}\Big[ \sum_{i=1}^a X^k_i \Big] \times \mathbb{I}_{Z_k=a} \\ & = \sum_{a=0}^\infty \mathbb{E}\Big[ \sum_{i=1}^a X^k_i \Big] \times \mathbb{I}_{Z_k=a} \\ & = \sum_{a=0}^\infty \sum_{i=1}^a \mathbb{E}\Big[ X^k_i \Big] \times \mathbb{I}_{Z_k=a} \\ & = \sum_{a=0}^\infty a \mathbb{E} X^0_1 \times \mathbb{I}_{Z_k=a} \\ & = \mathbb{E} X^0_1 \times Z_k \, . \end{array}$$`
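---
### Simulation check

The proposition can be checked by simulation. Below is a sketch in Python with `numpy`, taking a Poisson offspring distribution with mean `m` (an arbitrary choice, not part of the course material):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 0.9                                   # offspring mean, Poisson offspring law
n_paths, k = 200_000, 3

# simulate Z_0, ..., Z_k for many independent Galton-Watson paths
Z_k = np.ones(n_paths, dtype=np.int64)    # Z_0 = 1
for _ in range(k):
    Z_k = np.array([rng.poisson(m, size=z).sum() for z in Z_k])

# one more generation: Z_{k+1} given Z_k
Z_next = np.array([rng.poisson(m, size=z).sum() for z in Z_k])

# E[Z_{k+1} | Z_k = a] should be close to m * a
for a in (1, 2, 3):
    on_atom = Z_k == a
    print(a, Z_next[on_atom].mean(), m * a)
```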
---

Taking expectations on both sides (recall `\(\mathbb{E} X = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]]\)`) and inducting on `\(k\)`, an immediate corollary is:

`$$\mathbb{E}Z_k = (\mathbb{E} X^0_1)^k \qquad\text{for all } k\geq 0\,.$$`

The sequence of expected sizes of generations forms a geometric sequence.

A Galton-Watson process is said to be _sub-critical_ if the expectation of the offspring distribution is smaller than `\(1\)`.

---
### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The extinction probability of a sub-critical branching process is equal to `\(1\)`.

]

---
### Proof

Denote by `\(E_k\)` the event `\(\{ Z_k = 0\}\)`.

Observe that the sequence `\((E_k)_k\)` is increasing: `\(Z_k = 0\)` implies `\(Z_{k+1} = 0\)`.

Denote by `\(E_{\infty} = \cup_{k=0}^\infty E_k\)` the extinction event.

By Markov's inequality,

`$$P \{ E_k^c \} = P\{ Z_k \geq 1 \} \leq \mathbb{E} Z_k$$`

Since `\(\mathbb{E} Z_k = (\mathbb{E} X^0_1)^k \to 0\)` in the sub-critical case, `\(P\{E_k^c\} \downarrow 0\)` and `\(P\{ E_k\} \uparrow 1\)`

By continuity of `\(P\)` along increasing sequences of events, `\(P(E_\infty) = 1\)`.
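---
### Extinction by simulation

A simulation sketch in Python with `numpy`, again with a Poisson offspring law of mean `m < 1` (an arbitrary choice). It also anticipates the expected total progeny computed on the next slide:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 0.7                                   # sub-critical offspring mean
n_paths, horizon = 100_000, 100

extinct = 0
total_progeny = np.zeros(n_paths)
for p in range(n_paths):
    z, total = 1, 1                       # Z_0 = 1 is counted in the total progeny
    for _ in range(horizon):
        if z == 0:
            break
        z = rng.poisson(m, size=z).sum()  # draw Z_{k+1} given Z_k = z
        total += z
    extinct += (z == 0)
    total_progeny[p] = total

print(extinct / n_paths)                  # ≈ 1 : extinction is almost sure
print(total_progeny.mean(), 1 / (1 - m))  # ≈ 10/3 = 1 / (1 - E X^0_1)
```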
---

The expected size of the total progeny of a sub-critical branching process is equal to

`$$\sum_{k=0}^\infty \mathbb{E} Z_k = \sum_{k=0}^\infty (\mathbb{E} X^0_1)^k = \frac{1}{1 - \mathbb{E} X^0_1}$$`

(expectation and infinite sum can be exchanged by monotone convergence)

---
### Remark
Working with discrete conditioning allows us to derive non-trivial statements about the Galton-Watson process without knowing much about the offspring distribution beyond the fact that its expectation is smaller than `\(1\)`.

We still know nothing about the details of the distribution of `\(Z_k\)`, let alone the distribution of `\(\sum_{k=0}^\infty Z_k\)`.

---
class: middle, center, inverse
background-image: url('./img/pexels-cottonbro-3171837.jpg')
background-size: 112%

# The End