Probability IV: A Modicum of Measure Theory

name: inter-slide
class: left, middle, inverse

---
name: layout-general
layout: true
class: left, middle

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: red;
}
</style>

<div>
<style type="text/css">.xaringan-extra-logo {
width: 110px;
height: 128px;
z-index: 0;
background-image: url(./img/Universite_Paris_logo_horizontal.jpg);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:1em;right:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('a')
          logo.classList = 'xaringan-extra-logo'
          logo.href = 'http://master.math.univ-paris-diderot.fr/annee/m1-mi/'
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

---
template: inter-slide

# Probability IV: A modicum of Measure Theory

### 2021-09-24

#### [Probability Master I MIDS](http://stephane-v-boucheron.fr/courses/probability)

#### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
class: left, middle, inverse

.pull-left[

### [Motivation](#motivation)

### [Universe, powerset and `$\sigma$`-algebras](#tribe)

### [Probability distributions](#distributions)

### [Measurable functions](#measurablefunctions)

]

.pull-right[

### [Lebesgue Measure](#lebesque)

### [Monotone class theorem](#monotoneclass)

### [Distributions on real line](#distributionsrealline)

### [General random variables](#generalrv)

]

---
template: inter-slide
name: motivation

## Motivation

---

# A modicum of measure theory

## Roadmap

.f6[

Performing stochastic modeling in a comfortable way requires
consistent foundations and notation.

In this chapter, we set the stage for further development.

Probability theory started as the interaction between combinatorics and games of chance (XVIIth century).

At that time, the set of outcomes was finite, and it was legitimate to think that
any set of outcomes had a well-defined probability.

When mathematicians started to perform stochastic modeling in different branches of
sciences (astronomy, thermodynamics, genetics, ...), they had to handle uncountable sets of outcomes.

Designing a sound definition of what a probability distribution is, took time.

Progress in _integration_ and _measure theory_ during the XIXth century and
the early decades of the XXth century led to the modern, measure-theoretical
foundation of probability theory.
]

---
template: inter-slide
name: tribe

## Universe, powerset and `$\sigma$`-algebras

---

### Definition: universe

A _universe_ is a set (of possible _outcomes_) we decide to call a universe. The universe is often denoted by `$\Omega$`. Generic elements of `$\Omega$`  (outcomes) are denoted by `$\omega$`.

### Example <svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M592 192H473.26c12.69 29.59 7.12 65.2-17 89.32L320 417.58V464c0 26.51 21.49 48 48 48h224c26.51 0 48-21.49 48-48V240c0-26.51-21.49-48-48-48zM480 376c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24zm-46.37-186.7L258.7 14.37c-19.16-19.16-50.23-19.16-69.39 0L14.37 189.3c-19.16 19.16-19.16 50.23 0 69.39L189.3 433.63c19.16 19.16 50.23 19.16 69.39 0L433.63 258.7c19.16-19.17 19.16-50.24 0-69.4zM96 248c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24zm128 128c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24zm0-128c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24zm0-128c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24zm128 128c-13.25 0-24-10.75-24-24 0-13.26 10.75-24 24-24s24 10.74 24 24c0 13.25-10.75 24-24 24z"/></svg>

If we think of throwing a dice as a random phenomenon, the set of outcomes is the set of labels on the faces `$\Omega = \{1, 2, 3, 4, 5, 6\}$`.  If we are throwing two dices, the set of outcomes is made of
couples of labels `$\Omega' = \{(1,1), (1, 2), (1, 3), \ldots, (6,6)\} =\Omega^2$`.

---

### Example:  universal hashing

In the idealized hashing problem, the universe is  the set of functions from `$1, \ldots, n$` to `$1, \ldots, m$`. The size of the universe is `$m^n$`.

---

A universe may or may not be finite or countable. If the universe is countable, all its subsets may be  called _events_.

Events are assigned probabilities.

If the universe is countable, it is possible to assign a probability  to each of its subsets

When the universe  is not countable (for example `$\mathbb{R}$`), assigning a probability
to all subsets is not possible without running into paradoxes

We have to  restrict  the  collection of subsets in order to assign probabilities to the collection members in a consistent way

---

### `$\sigma$`-algebras

In the sequel `$2^{\Omega}$` denotes the collection of all subsets of `$\Omega$` (the powerset of `$\Omega$`).

A sensible collection of events has to be a `$\sigma$`-algebra

### Definition

Given a set `$\Omega$`, a collection `$\mathcal{G}$` of subsets of `$\Omega$` ( `$\mathcal{G} \subseteq 2^{\Omega}$`  ) is called a  `$\sigma$`-algebra (a sigma  algebra)

iff

- `$\mathcal{G}$` is closed under `$countable$` union
- `$\emptyset \in \mathcal{G}$`
- `$\mathcal{G}$` is closed under complementation ( `$A \in \mathcal{G} \Rightarrow A^c = \Omega \setminus A \in \mathcal{G}$` )

---

The next proposition shows that `$\sigma$`-algebras are stable under countable set-theoretical operations.

This is consequence of

### De Morgan's laws:

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

`$$(A \cup B)^c  = A^c \cap B^c \qquad \text{and} \qquad  (A \cap B)^c  = A^c \cup B^c$$`

]

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

A `$\sigma$`-algebra of subsets is closed under countable intersections.

]

---

### Proof

For `$A \subseteq \Omega$`, let `$A^c = \Omega \setminus A$`.
Let `$A_1, \ldots, A_n, \ldots$` belong to `$\sigma$`-algebra `$\mathcal{G}$` of subsets
of `$\Omega$`. For each `$n$`, `$A_n^c \in \mathcal{G}$`, by definition of `$\sigma$`-algebra,
`\begin{array}
\cap_n A_n
 & = \Big(\big(\cap_n A_n\big)^c\Big)^c \\
 & = \Big(\cup_n A_n^c\Big)^c  \qquad \text{De Morgan} \, .
\end{array}`
By definition of a `$\sigma$`-algebra, `$\cup_n A_n^c \in \mathcal{G}$`, and
for the same reason, `$\Big(\cup_n A_n^c \Big)^c \in \mathcal{G}$`.

---

The next proposition allows us to talk about the _smallest_ `$\sigma$`-algebra
containing a collection of subsets

This leads to the notion of _generated_ `$\sigma$`-algebra

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The intersection of two `$\sigma$`-algebras of subsets of `$\Omega$`
is a `$\sigma$`-algebra of subsets of `$\Omega$`.

]

---

### Proof

Let `$\mathcal{G}$` and `$\mathcal{G}'$` be two `$\sigma$`-algebras of subsets
of `$\Omega$`.

The intersection of the two `$\sigma$`-algebras is

`$$\Big\{ A : A\subseteq \Omega, A \in \mathcal{G}, A \in \mathcal{G}' \Big\}$$`

---

Because of this property, the notion of a `$\sigma$`-algebra generated by a collection of subsets is well-founded.

---

### Lemma

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Given a collection `$\mathcal{C}$` of subsets of `$\Omega$`, there exists a _unique smallest `$\sigma$`-algebra_ containing all subsets
in `$\mathcal{C}$`, it is called the `$\sigma$`-algebra generated by `$\mathcal{H}$` and denoted by `$\sigma(\mathcal{C})$`.

]

---

### Example

Consider we are throwing a dice, `$\Omega = \{1, \ldots, 6\}$`, let

`$$\mathcal{H} = \Big\{\{1, 3, 5\}\Big\}\, .$$`

This is a collection made of one event (the outcome  is odd).

The algebra generated by `$\mathcal{H}$` is

`$$\sigma(\mathcal{H}) = \Big\{ \{1, 3, 5\}, \{2, 4, 6\}, \emptyset, \Omega \Big\} \, .$$`

---

Two kinds of `$\sigma$`-algebras play a prominent role in a basic probability course:

1. the powerset of countable or finite sets.

2. the Borel `$\sigma$`-algebras of  topological spaces.

---

### Definition: Borel sigma-algebra

The Borel `$\sigma$`-algebra over `$\mathbb{R}$` is the `$\sigma$`-algebra generated
by _open_ sets.

When `$\mathbb{R}$` (resp. `$\mathbb{R}^k$`) is endowed with its usual topology, the Borel `$\sigma$`-algebra is denoted by `$\mathcal{B}(\mathbb{R})$` (resp. `$\mathcal{B}(\mathbb{R}^k)$`).

---

This collection
is defined by the following list of properties:

1. `$\emptyset, E \in \mathcal{E}$`
1. A (_possibly uncountable_) union of elements of `$\mathcal{E}$` (open sets)
belongs to `$\mathcal{E}$` (is an open set)
1. A finite intersection of open sets is an open set.

---

In the usual topology on `$\mathbb{R}$`, a set `$A$` is open if for any `$x \in A$`, there
exists some `$r>0$` such that `$]x-r, x+r[ \subseteq A$`. Any interval of the form `$]a,b[$`
is open (these are the so-called open intervals).

This topology can be generalized to any finite dimension `$\mathbb{R}^d$`.

---

---

We are now ready to set the stage of stochastic modeling. 
The playground always consists of a measurable space.

### Definition Measurable space

A universe `$\Omega$` endowed with a `$\sigma$`-algebra of subsets `$\mathcal{F}$`
is called a measurable space. It is denoted by `$(\Omega, \mathcal{F})$`.

### Example

- If `$\Omega$` is a countable or finite set, then `$(\Omega, 2^\Omega)$` is
a measurable space.

- If `$\Omega=\mathbb{R}$`, then `$(\mathbb{R}, \mathcal{B}({\mathbb{R}}))$` is a measurable space.

---

So far, we have not talked about probability theory, but,
we are now equipped to define probability distributions and to manipulate them

---
template: inter-slide
name: distributions

## Probability distributions

---

A _probability distribution_ maps a `$\sigma$`-algebra to `$[0,1]$`. It is
an instance of a more general concept called a _measure_. 
We state or recall
important concept of measure theory.

Attempts to measure all subsets of `$\mathbb{R}$` lead to paradoxes of little
practical use.

Measure theory starts by recognizing the desirable properties any useful measure should possess, then measure theory builds objects satisfying these properties on as large as possible
`$\sigma$`-algebras of events, for example on Borel `$\sigma$`-algebras.

This motivates the definition of `$\sigma$`-additivity.

---

### Definition: `$\sigma$`-additivity

Given  `$\Omega$` and `$\mathcal{A} \subseteq 2^\Omega$`, a function `$\mu$` mapping
`$\mathcal{A}$` to `$[0,\infty)$` is said to be `$\sigma$`-_additive_  on `$\mathcal{A}$`

for any countable collection of _pairwise disjoint_ subsets `$(A_n)_{n \in \mathbb{N}} \in \mathcal{A}$`,
with `$\cup_n A_n \in \mathcal{A}$`

we have

`$$\mu (\cup_{n \in \mathbb{N}} A_n) = \sum_{n \in \mathbb{N}} \mu(A_n)$$`

---

Note that if `$\mathcal{F}$` is a `$\sigma$`-algebra, `$\Big(\cup_{n \in \mathbb{N}} A_n\Big) \in \mathcal{F}$`

`$\sigma$`-additivity fits well  with `$\sigma$`-algebras, but it makes sense to
define `$\sigma$`-additivity with respect to more general collections of subsets

### Definition: Positive measure

Given a measurable space `$(\Omega, \mathcal{F})$`, a `$\sigma$`-additive function `$\mu$` mapping
`$\mathcal{F}$` to `$[0,\infty)$` and satisfying `$\mu(\emptyset)=0$`
is called a  _positive measure_ over `$(\Omega, \mathcal{F})$`.

The tuple `$(\Omega, \mathcal{F}, \mu)$` is called a measure space

---

Let `$\Omega = \{0,1\}^*$` the set of infinite sequences of `$0$` and `$1$` (indexed
  from `$1$`).

Let `$\mathcal{F}_n \subseteq 2^\Omega$` be the `$\sigma$`-algebra generated by
events of the following form: `$\{ \omega : \omega \in \Omega, \omega_i = 1\}$`
for `$1\leq i\leq n$`.

- Define a `$\sigma$`-additive function on `$(\Omega, \mathcal{F}_n)$`.
- <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M127.1 146.5c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C111.8 5.9 105 0 96.8 0H80.4C70.6 0 63 8.5 64.1 18c3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zm112 0c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C223.8 5.9 217 0 208.8 0h-16.4c-9.8 0-17.5 8.5-16.3 18 3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zM400 192H32c-17.7 0-32 14.3-32 32v192c0 53 43 96 96 96h192c53 0 96-43 96-96h16c61.8 0 112-50.2 112-112s-50.2-112-112-112zm0 160h-16v-96h16c26.5 0 48 21.5 48 48s-21.5 48-48 48z"/></svg> What is the `$\sigma$`-algebra generated by `$\cup_{n\geq 1} \mathcal{F}_n$`?
- <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M127.1 146.5c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C111.8 5.9 105 0 96.8 0H80.4C70.6 0 63 8.5 64.1 18c3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zm112 0c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C223.8 5.9 217 0 208.8 0h-16.4c-9.8 0-17.5 8.5-16.3 18 3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zM400 192H32c-17.7 0-32 14.3-32 32v192c0 53 43 96 96 96h192c53 0 96-43 96-96h16c61.8 0 112-50.2 112-112s-50.2-112-112-112zm0 160h-16v-96h16c26.5 0 48 21.5 48 48s-21.5 48-48 48z"/></svg> <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M127.1 146.5c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C111.8 5.9 105 0 96.8 0H80.4C70.6 0 63 8.5 64.1 18c3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zm112 0c1.3 7.7 8 13.5 16 13.5h16.5c9.8 0 17.6-8.5 16.3-18-3.8-28.2-16.4-54.2-36.6-74.7-14.4-14.7-23.6-33.3-26.4-53.5C223.8 5.9 217 0 208.8 0h-16.4c-9.8 0-17.5 8.5-16.3 18 3.9 31.9 18 61.3 40.6 84.4 12 12.2 19.7 27.5 22.4 44.1zM400 192H32c-17.7 0-32 14.3-32 32v192c0 53 43 96 96 96h192c53 0 96-43 96-96h16c61.8 0 112-50.2 112-112s-50.2-112-112-112zm0 160h-16v-96h16c26.5 0 48 21.5 48 48s-21.5 48-48 48z"/></svg> Can you define a `$\sigma$`-additive function on `$(\Omega, \sigma(\cup_{n\geq 1} \mathcal{F}_n))$`?

---

For example, the _counting measure_ `$\mu$` on `$\mathbb{N}$`  satisfies

`$$\mu(A) = |A|$$`

for all `$A \subseteq \mathbb{N}$`,

so we have `$\mu(\mathbb{N})=\infty$`

---

### Definition: Probability distribution

Given a measurable space `$(\Omega, \mathcal{F})$`, a function `$\mu$` mapping
`$\mathcal{F}$` to `$[0,\infty)$` is a probability distribution over `$(\Omega, \mathcal{F})$`

1. `$\mu$` is a positive measure on `$(\Omega, \mathcal{F})$`  and

2. `$\mu(\Omega)=1$`.

---

If you think `$(\mathbb{R}, 2^{\mathbb{R}})$` is a measurable space, define a `$\sigma$`-additive measure on it

Try even to define a probability measure

---

The notion of `$\sigma$`-additivity is stronger than finite additivity

Assuming the _Axiom of Choice_ (as usual when working in Analysis or Probability), there exists a function `$\mu$`
that map  `$2^{\mathbb{N}}$` to `$[0,1]$`, that is  additive ( `$\mu(A \cup B)= \mu(A) + \mu(B)$` for all `$A, B, A\cap B=\emptyset$` ),
zero on all finite subsets of `$\mathbb{N}$` and such that `$\mu(\mathbb{N})=1$`.

Such a function is not `$\sigma$`-additive.

---
template: inter-slide
name: lebesque

## Lebesgue measure

---

We take the existence of the Lebesgue measure for granted. This is the content of the next theorem.

### Theorem:  Existence of the Lebesgue measure

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

There exists a unique `$\sigma$`-additive measure `$\ell$`
on `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$` such that

`$$\ell((a, b)) = b - a$$`

for all finite `$a < b$`.

]

Lebesgue measure existence Theorem is typical of statements of measure theory

It defines a complex object (a measure) by its trace on a simple collection of sets (intervals)

---

The proof of Lebesgue measure existence Theorem  can be cut in several meaningful pieces

- First define a length function on intervals

- Show that this function can be extended to an additive function on finite unions, finite intersections and complements of intervals.

- Then check that the extension is in fact `$\sigma$`-additive on the closure
of  intervals under finite set-theoretical operations (which is not a `$\sigma$`-algebra).

Once this additive extension is constructed, use Carathéodory s extension theorem below
to prove that the length function can be extended to a `$\sigma$`-additive function
on the `$\sigma$`-algebra generated by intervals (the Borel `$\sigma$`-algebra).

Then it remains to check that the extension is unique. This can be done by a generating
set argument, for example the _monotone class Lemma_.

---

### Carathéodory  extension theorem

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `$\mathcal{A} \subseteq 2^{\Omega}$`.

Assume

- `$\mathcal{A}$` contains `$\emptyset, \Omega$`, and
- `$\mathcal{A}$` is closed  under finite unions, and complementation.

Assume `$\rho : \mathcal{A} \to [0, \infty]$`
is `$\sigma$`-additive on `$\mathcal{A}$`.

Then

there exists a measure `$\mu$`  on `$\sigma(\mathcal{A})$` such that `$\mu(A)=\rho(A)$`
for all `$A \in \mathcal{A}$`.

]

---

If we denote Lebesgue measure by `$\ell$`, the uniform probability distribution
over `$[a,b]$` assign probability

`$$P(A) = \frac{\ell(A)}{b-a} = \frac{\ell(A)}{\ell([a,b])}$$`

to any `$A \in \mathcal{B}(\mathbb{R}) \cap [a,b]$`.

---

The uniform distribution over `$([0,1], \mathcal{B}([0,1]))$` looks like
an academic curiosity with no practical utility

This superficial opinion should be dispelled

This can be done using a device called the _quantile transform_

In this sense, the uniform distribution is the mother of all distributions

---

How does a typical outcome look like?

A real number `$\omega \in [0,1]$` has binary expansions:
`$\omega = \sum_{i=1}^\infty b_i 2^{-1}$`
with `$b_i \in \{0,1\}$`.

What is the probability there is a unique binary expansion?

First, check whether this probability is well-defined.

Assuming  the binary expansion is unique, `$\omega$` is said to be _normal_

`$\lim_n \frac{1}{n}\sum_{i=1}^n b_i(\omega) = 1/2$`.

Is the probability of obtaining a normal number well-defined?

If yes, compute it.

---

Check that `$\mathcal{B}(\mathbb{R}) \cap [a,b] =  \Big\{ A \cap [a,b] : A \in \mathcal{B}(\mathbb{R}) \Big\}$`
is the `$\sigma$`-algebra generated  by the trace of the usual topology if `$\mathbb{R}$` on `$[a,b]$`

---

The Lebesgue existence theorem can be extended. Indeed, any sensible
definition of the length of an interval can serve as a starting point

Recall that a real function is CADLAG if it is right-continuous everywhere,
and has left-limits everywhere

The next Theorem can be established in a way that parallels the construction
of Lebesgue's measure

---

### Theorem

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Any non-decreasing CADLAG function `$F$` on `$\mathbb{R}$` defines a `$\sigma$`-additive
measure `$\mu$` on `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$` that satisfies:

`$$\mu((a, b)) = F(b) - F(a)$$`

where `$F(x_-)= \lim_{t \uparrow x} F(t)$`

]

---

We recover Lebesgue measure existence Theorem by taking `$F(x)=x$`.

If we focus on functions `$F$` that satisfy `$\lim_{x \to -\infty} F(x)=0$`
and `$\lim_{x \to \infty} F(x)=1$`, the Theorem  defines
probability distributions through their _cumulative distribution functions_

???

(more on this topic in Section \@ref(distribfun)).

---
template: inter-slide
name: measurablefunctions

## Measurable functions and random variables

???

{#measurablefunctionsrv}

---

So far, we only talked about probability and measure of sets (events)

As stochastic modeling is at the root of quantitative analysis, we introduce
the notion of _measurable function_

This allows us handle numerical functions that map outcomes to `$\mathbb{R}$`
or `$\mathbb{R}^d$`

Not every numerical function is  measurable

To define what we call a measurable function, we need the notion of _inverse image_  or _preimage_

---

### Definition

Let `$f$` be a function from `$\mathcal{X}$` to `$\mathcal{Y}$`, we denote by `$f^{-1}$`
the function that maps `$2^{\mathcal{Y}}$` to `$2^{\mathcal{X}}$` defined by

`$$\begin{array}{rl}
f^{-1}: \qquad 2^{\mathcal{Y}} & \rightarrow  2^{\mathcal{X}} \\
B & \mapsto f^{-1}(B) = \Big\{ x : x \in \mathcal{X}, f(x) \in B \Big\} \, .
\end{array}$$`

The set `$f^{-1}(B)$` is called the _preimage_ or _inverse image_ of `$B$` under `$f$`.

---

Note that
`$f^{-1}$` does not denote the inverse of function `$f$` which may not be injective

In this course,  `$f^{-1}$` is a set function from the powerset of
the codomain of `$f$` to the powerset of the domain of `$f$`

The inverse function if it exists (or the generalized inverse function) is denoted by `$f^\leftarrow$`

The inverse function, when it exists, maps `$f(\mathcal{X})\subseteq \mathcal{Y}$` to
`$\mathcal{X}$`

---

### Example

Recall the idealized hashing setting

Let `$\Omega$` denote the set of functions from `$1, \ldots, n$`  to `$1, \ldots, m$`
(assume `$n \leq m$`). For `$\omega \in \Omega$` ( `$\omega$` is function, but it
is also a `$1, \ldots, m$`-valued sequence of length `$n$` ), let `$f(\omega)$` be the
number of values in `$1, \ldots, m$` that have no occurrence in `$\omega$`
(the number of empty bins in the allocation defined by `$\omega$`).

The function
`$f$` is a numerical function that maps `$\Omega=\{1, \ldots, m\}^n$` .

For `$B \in \mathbb{N}$`, `$f^{-1}(B)$` is the subset of allocations
which have `$k$` empty bins, `$k \in B$`.

---

The preimage operation works well with set-theoretical operations

Elementary properties of measurable functions follow from properties of inverse images

Inverse image preserves set-theoretical operations.

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `$f: E \mapsto F$`, then for `$A, B, A_1, \ldots,A_n , \ldots   \subseteq F$`,

`$$\begin{array}{lr}
f^{-1}(A \cup B) & = f^{-1}(A) \cup f^{-1}(B) \\
f^{-1}(A \cap B) & = f^{-1}(A) \cap f^{-1}(B) \\
f^{-1}(\cup_{n \in \mathbb{N}}A_n ) & = \cup_{n \in \mathbb{N}} f^{-1}(A_n) \\
f^{-1}(\cap_{n \in \mathbb{N}}A_n ) & = \cap_{n \in \mathbb{N}} f^{-1}(A_n) \\
f^{-1}(F \setminus A) & =  f^{-1}(F) \setminus f^{-1}(A)
\end{array}$$`

]

---

### Corollary

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Taking the preimages of elements of a `$\sigma$`-algebra defines a
`$\sigma$`-algebra

]

Let `$f$` map `$\Omega$` to `$\Omega'$`, prove that
`$$\mathcal{H} = \Big\{ f^{-1}(B) : B \in \mathcal{G}\Big\}$$`
is a `$\sigma$`-algebra of subsets of `$f^{-1}(\Omega')$`

---

### Definition: Measurable functions

Let `$(\Omega, \mathcal{F})$` and `$(\Omega', \mathcal{G})$` be two measurable spaces.

A function `$f: \Omega \to \Omega'$` is said to be `$\mathcal{F}/\mathcal{G}$`-measurable

iff

for `$B \in \mathcal{G}$`, `$f^{-1}(B) \in \mathcal{F}$`.

---

---

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The pointwise limit of measurable functions is a measurable function:

- `$(f_n)_n$` is a sequence of measurable functions from `$(\Omega, \mathcal{F})$`
to `$(\mathcal{X}, \mathcal{G})$`, and 
- `$f_n \to f$` pointwise,

then

`$f$` is a measurable function

]

---

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The sum of measurable functions is a measurable function:

if `$f, g$` are  measurable functions from `$(\Omega, \mathcal{F})$`
to `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$`,

then

`$a f + b g$` is a measurable function for all `$a,b \in \mathbb{R}$`.

]

---

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The composition of measurable functions is a measurable function:

- `$f$` is a  measurable function from `$(\Omega, \mathcal{F})$` to `$(\mathcal{X}, \mathcal{G})$`,
and 
- `$g$` is a  measurable function from `$(\mathcal{X}, \mathcal{G})$` to `$(\mathcal{Y}, \mathcal{H})$`,

then

`$g \circ f$` ( `$g \circ f(\omega) = g(f(\omega))$` for all `$\omega$` )
is a measurable function from `$(\Omega, \mathcal{F})$` to `$(\mathcal{Y}, \mathcal{H})$`.

]

---
template: inter-slide
name: monotoneclass

## The Monotone class theorem

---

The monotone class theorem (or lemma) is a powerful example of the generating class arguments
that can be used to prove that two probability measures or, maybe, two `$\sigma$`-finite measures are equal

---

### Definition: `$\pi$`-class

A collection `$\mathcal{G}$` of subsets of `$\Omega$` is said to be a
`$\pi$`-class if:

- `$\Omega \in \mathcal{G}$`

- it is stable/closed by finite intersection
`$$A, B \in \mathcal{G} \Rightarrow A \cap B \in \mathcal{G}$$`

---

### Definition: Monotone class

A collection `$\mathcal{M}$` of subsets of `$\Omega$` is said to be
a _monotone class_ or a  `$\lambda$`-system

it satisfies the following properties:

1. `$\Omega \in \mathcal{M}$`

2. If `$A, B \in \mathcal{M}$`, and `$A \subseteq B$` then `$B \setminus A \in \mathcal{M}$`

3. If `$A_n \in \mathcal{M}$` and `$A_n \subseteq A_{n+1}$` for every `$n \in \mathbb{N}$`
then `$\lim_n A_n = \cup_{n \in \mathbb{N}} A_n \in \mathcal{M}$`.

---

It makes sense to talk about the smallest `$\lambda$`-system containing a collection of sets

---
name: usefulprop

The next easy proposition makes `$\lambda$`-system very useful when we want to check that two probability distributions are equal.

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

The class of sets over which two probability distributions coincide is a `$\lambda$`-system.

]

---

### Proof

Let `$(\Omega, \mathcal{F})$` be a measurable space. Let `$P, Q$` be two probability distributions over `$(\Omega, \mathcal{F})$`. Let `$\mathcal{C} \subseteq \mathcal{F}$` be defined by

`$$\mathcal{C} = \Big\{ A : A \in \mathcal{F},  P(A)=Q(A) \Big\}$$`

By the very definition of measures we have `$P(\Omega)=Q(\Omega)$`, hence `$\Omega \in \mathcal{C}$`.

If `$A \subseteq B$` both belong to `$\mathcal{C}$`, again by the very definition of measures,

`$$P (B \setminus A) = P(B) - P(A) = Q(B) - Q(A) = Q(B \setminus A)$$`

hence, `$B\setminus A \subseteq \mathcal{C}$`.

Let `$A_1 \subseteq A_2 \subseteq A_n \subseteq \ldots$` be a non-decreasing sequence of elements of `$\mathcal{C}$`, again by the very definition of measures,

`$$P(\cup_n A_n) = \lim_n \uparrow P(A_n) = \lim_n \uparrow Q(A_n) = Q(\cup_n A_n)$$`

Hence `$\mathcal{C}$` is closed by monotone limits.

---

---

### Definition: sigma-finite measures

A measure `$\mu$` on `$(\Omega, \mathcal{F})$` is `$\sigma$`-finite

iff

there exists `$(A_n)_n$` with `$\Omega \subseteq \cup_n A_n$` and `$\mu(A_n) < \infty$` for each `$n$`.

---

- Finite measures (this encompasses  probability measures) are `$\sigma$`-finite.

- Lebesgue measure is `$\sigma$`-finite.

- The counting measure on `$\mathbb{R}$` is not `$\sigma$`-finite.

???

In the statement of the monotone class theorem, what happens if we only assume that the two measures are `$\sigma$`-finite?

---

### Theorem (Monotone class lemma)

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

If 
- `$\mathcal{A}$` is a `$\pi$`-systen in `$\Omega$` and 
- `$\mathcal{M}$` a `$\lambda$`-system in `$\Omega$` such that `$\mathcal{A} \subseteq \mathcal{M}$`,

then

the  `$\sigma$`-algebra generated by `$\mathcal{A}$`, `$\sigma(A)$`, is the _smallest_ `$\lambda$`-system larger than `$\mathcal{A}$`:

`$$\sigma(\mathcal{A}) \subseteq \mathcal{M}$$`

]

---

### Proof

Let `$\mathcal{M}$` denote the intersection of all monotone classes that contain
the `$\pi$`-system `$\mathcal{A}$`.

As a `$\sigma$`-algebra is a monotone class (a `$\lambda$`-system), we have `$\mathcal{M} \subseteq \sigma(\mathcal{A})$`, the only point that
has to be checked is `$\sigma(\mathcal{A}) \subseteq \mathcal{M}$`.

It is enough to check
that `$\mathcal{M}$` is indeed a `$\sigma$`-algebra.

---

### Proof (continued)

In order to check that `$\mathcal{M}$` is a `$\sigma$`-algebra, it is enough to check that
it is closed under finite union or equivalently under finite intersection.

For each `$A \in \mathcal{A}$`, let `$\mathcal{M}_A$` be defined by

`$$\mathcal{M}_A = \Big\{ B : B \in \mathcal{M}, A \cap B \in \mathcal{M}\Big\}$$`

Remember that `$\mathcal{A}$` is a `$\pi$`-system, and `$\mathcal{A} \subseteq \mathcal{M}$`,
we have `$\mathcal{A} \subseteq \mathcal{M}_A$`.

To show that `$\mathcal{M} = \mathcal{M}_A$`, it suffices to show that `$\mathcal{M}_A$` is a monotone class.

If `$(B_n)_n$` is an increasing sequence of elements of `$\mathcal{M}_A$`, then

`$$(\cup_n B_n) \cap A = \cup_n \Big(\underbrace{B_n \cap A }_{\in \mathcal{M}}\Big)$$`

the right-hand-side belongs to `$\mathcal{M}$` since `$\mathcal{M}$` is monotone.

Hence
`$\mathcal{M}_A$` is closed by monotone increasing limit.

---

### Proof (continued)

To check closure by complementation, let `$B \subseteq C$` with `$B, C \in \mathcal{M}_A$`. As

`$$A \cap (C \setminus B) = \Big(\underbrace{A \cap C}_{\in \mathcal{M}}\Big) \setminus \Big(\underbrace{A \cap B}_{\in \mathcal{M}}\Big))$$`

the closure of `$\mathcal{M}$` under complementation entails `$A \cap (C \setminus B) \in \mathcal{M}$`
and `$C \setminus B  \in \mathcal{M}_A.$`

Now, let `$\mathcal{M}^{\circ}$` be defined as

`$$\mathcal{M}^\circ = \Big\{  A : A \in \mathcal{M}, \forall B \in \mathcal{M},   A \cap B \in \mathcal{M} \Big\}$$`

We just established that `$\mathcal{A} \subseteq \mathcal{M}^{\circ}$`.
Using the same line of reasoning allows us to check that `$\mathcal{M}^\circ$` is also a monotone class.
This shows that `$\mathcal{M}^\circ= \mathcal{M}$`.

---

Combining [Proposition](#usefulprop) and the Monotone Class Lemma
leads to the next useful corollary.

### Corollary

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

If two probabilities `$P, Q$` on `$(\Omega, \mathcal{F})$`
coincide on a  `$\pi$`-system `$\mathcal{A}$`
that generates `$\mathcal{F}$`:

`$$\mathcal{A} \subseteq \{ A : A \in \mathcal{F}  \text{ and } P(A)=Q(A)\} \qquad\text{and} \qquad \mathcal{F} \subseteq \sigma(\mathcal{A})$$`

then `$P, Q$` coincide on `$\mathcal{F}$`.

]

---
template: inter-slide
name: distributionsrealline

## Probability distributions on the real line

---

A probability distribution is a complex object: it maps a large collection of
sets (a `$\sigma$`-algebra) to `$[0,1]$`.

If we focus on probability distributions over `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$`, they can be characterized by real functions on `$\mathbb{R}$`.

---

### Definition: Distribution function

Given a probability distribution `$P$` on `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$`,
the distribution function
`$F$` of `$P$` maps `$\mathbb{R}$` to `$[0,1]$`, it is defined by

`$$x \mapsto F(x) = P(-\infty, x].$$`

---

A probability distribution defines a unique distribution function. What is perhaps surprising
is that a distribution function defines a unique probability distribution function.

### Proposition

.bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5.f6[

Let `$F$` be a function from `$\mathbb{R}$` to `$[0,1]$`.

The function `$F$` is the distribution function of a probability distribution on
`$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$`,

iff

the following five properties are satisfied:

1. `$F$` is non-decreasing,
1. `$F$` is right-continuous
1. `$\lim_{y \nearrow x} F(y)$` exists at every `$x \in \mathbb{R}$` ( `$F$` has left-limits everywhere )
1. `$\lim_{x \to -\infty}F(x)=0$`
1. `$\lim_{x \to \infty} F(x)= 1.$`

]

---

### Cumulative distribution function of Poisson distributions for different values of the parameter

.fl.w-30[

For parameter `$\mu$`, `$F_{\mu}(x) = \sum_{k \leq x}  \mathrm{e}^{-\mu} \frac{\mu^k}{k!}$`.

Apparently, `$\mu \leq \nu \Rightarrow F_{\mu} \geq F_{\nu}$`.

]

.fl.w-70[

<img src="cm-4-measure-101_files/figure-html/distribpoisson-1.png" width="504" />
]

---
template: inter-slide
name: generalrv

## General random variables

---

A real random variable is neither a variable, nor random.
A real random variable is a measurable function from
some measurable space to the real line endowed with the
Borel `$\sigma$`-algebra. There is nothing random in a random variable.

### Definition

Given a measurable space `$(\Omega, \mathbb{F})$`,
a mapping `$X$` from `$\Omega$` to `$\mathbb{R}$` is a real valued
random variable such that for every `$B \in \mathcal{B}(\mathbb{R})$`
the inverse image of `$B$`:

`$$X^{-1}(B) = \{ \omega : \omega \in \Omega, X(\omega) \in {B}\}$$`

belongs to `$\mathcal{F}$`

---

Once a measurable space is endowed with a probability distribution, is it possible to define the (probability) distribution of a random variable.

---

### Definition

Given `$(\Omega, \mathcal{F}, P)$` and a real valued random variable `$X$`,
the law or probability distribution of `$X$`, denoted by `$P \circ X^{-1}$`, is the probability
distribution on `$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$`  defined by

`$$(P \circ X^{-1})(B) =  P(X^{-1}(B)) \quad \text{for all } B \in \mathcal{B}(\mathbb{R})$$`

---

Random variables may be vector-valued, function-valued, _etc_.
General random variables are defined as measurable functions
between measurable spaces.

### Definition

Given two  measurable spaces `$(\Omega, \mathcal{F})$`, and `$(\Omega', \mathcal{G})$`
a mapping `$X$` from `$\Omega$` to `$\Omega'$` is a `$\mathcal{F}/\mathcal{G}$`-random variable if
for every `$B \in \mathcal{G}$` the inverse image of `$B$`:

`$$X^{-1}(B) = \{ \omega : \omega \in \Omega, X(\omega) \in {B}\}$$`

belongs to `$\mathcal{F}$`.

---
exclude: true
template: inter-slide

## Bibliographic remarks

---
exclude: true

There are many beautiful books on Probability Theory. They are targetted at
different audiences. Some may be more suited to the students of the dual curriculum
[Mathématiques-Informatique ](http://master.math.univ-paris-diderot.fr/annee/m1-mi/).
I found the following ones particularly useful.

[Youssef (2019)](https://www.lpsm.paris/pageperso/youssef/Enseignement_files/M1-proba-2018.pdf)
is a clear and concise collection of class notes designed for a Master I-level Probability course
that is substantially more ambitious than this minimal course.

@MR1932358 delivers a self-contained course on Analysis and Probability. The book can serve
both as an introduction and a reference book. Beyond cautious and transparent proofs, it
contains historical notes that help understand the connections between landmark results.

@MR1873379 introduces measure and integration theory to an audience that has been
exposed to discrete probability theory and that is familiar with probabilistic
reasoning.

---

class: middle, center, inverse

background-image: url('./img/pexels-cottonbro-3171837.jpg')
background-size: 112%

# The End