name: inter-slide
class: left, middle, inverse

{{ content }}

---
name: layout-general
layout: true
class: left, middle

<style>
.remark-slide-number {
  position: inherit;
}
.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}
.remark-slide-number .progress-bar {
  height: 100%;
  background-color: red;
}
</style>
---
template: inter-slide

# Probability II: Discrete distributions

### 2021-09-16

#### [Probability Master I MIDS](http://stephane-v-boucheron.fr/courses/probability)

#### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
template: inter-slide
name: xxx

##
### [Motivation](#motivation)
### [Bernoulli and Binomial](#bernoulli)
### [Poisson](#poisson)
### [Geometric](#geometric)

???

---
template: inter-slide
name: motivation

## Motivation

---

The goal of this lesson is

- getting acquainted with important families of distributions and
- getting familiar with distributional calculus

Probability distributions will be presented through

- distribution functions,
- probability mass functions (discrete distributions)
- ...

---
template: inter-slide
name: bernoulli

## Bernoulli and Binomial

---

### Definition
A Bernoulli distribution is a probability distribution `\(P\)` on `\(\Omega=\{0,1\}\)`

The _success parameter_ of `\(P\)` is `\(P\{1\} \in [0,1]\)`

A Bernoulli distribution is completely defined by its success parameter

---

### Definition (Binomial)

Assume `\(\Omega^{\prime} = \{0,1\}^n\)` (the outcomes of `\(n\)` coin tosses)

A Binomial distribution with parameters `\(n \in \mathbb{N}, p \in [0,1]\)` ( `\(n\)` is _size_ and `\(p\)` is _success_ ) is a probability distribution `\(P\)` on

`$$\Omega = \{0, 1, 2, \ldots, n\}$$`

defined by

`$$P\{k\} = \binom{n}{k} p^k (1-p)^{n-k}$$`

---

The connection between Bernoulli and Binomial distributions is obvious: a Bernoulli distribution is a Binomial distribution with size parameter equal to `\(1\)`.

This connection goes further: the sum of _independent_ Bernoulli random variables with the _same_ success parameter is Binomially distributed

### Proposition

Let `\(X_1, X_2, \ldots, X_n\)` be _independent_, identically distributed Bernoulli random variables with _success_ parameter `\(p \in [0,1]\)`, then

`$$Y = \sum_{i=1}^n X_i$$`

is distributed according to a Binomial distribution with _size_ parameter `\(n\)` and _success_ probability `\(p\)`:

`$$Y \sim \operatorname{Bin}(n,p)$$`

---

### Proof

For `\(k \in \{0, \ldots, n\}\)`

`$$\begin{array}{rl} P\Big\{ \sum_{i=1}^n X_i = k \Big\} & = \sum_{(x_1, \ldots, x_n) \in \{0,1 \}^n} \mathbb{I}_{\sum_{i=1}^n x_i=k} P \Big\{ \wedge_{i=1}^n X_i = x_i\Big\} \\ & = \sum_{(x_1, \ldots, x_n) \in \{0,1 \}^n} \mathbb{I}_{\sum_{i=1}^n x_i=k} \prod_{i=1}^n P \Big\{ X_i = x_i\Big\} \\ & = \sum_{(x_1, \ldots, x_n) \in \{0,1 \}^n} \mathbb{I}_{\sum_{i=1}^n x_i=k} \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i} \\ & = \sum_{(x_1, \ldots, x_n) \in \{0,1 \}^n} \mathbb{I}_{\sum_{i=1}^n x_i=k}\, p^{k} (1-p)^{n-k} \\ & = \binom{n}{k} p^{k} (1-p)^{n-k} \end{array}$$`

The last step uses the fact that exactly `\(\binom{n}{k}\)` binary sequences of length `\(n\)` sum to `\(k\)`
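---

### Checking the proposition by simulation

The proposition can be illustrated numerically. The following `R` sketch (not part of the original slides; the values of `n`, `p` and the number of replicates `N` are arbitrary) compares the empirical distribution of a sum of independent Bernoulli variables with the Binomial probability mass function:

```r
set.seed(42)                           # reproducibility
n <- 10 ; p <- 0.3 ; N <- 1e4          # size, success parameter, replicates

# N samples of Y = X_1 + ... + X_n, the X_i i.i.d. Bernoulli(p)
Y <- colSums(matrix(rbinom(n * N, size = 1, prob = p), nrow = n))

# empirical frequencies vs the Binomial pmf
emp <- as.numeric(table(factor(Y, levels = 0:n))) / N
round(cbind(k = 0:n, empirical = emp, dbinom = dbinom(0:n, n, p)), 3)
```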
---

This observation facilitates the computation of the moments of the Binomial distribution:
- The _expected value/expectation_ of a Bernoulli distribution with parameter `\(p\)` is `\(p\)`

- Its variance is `\(p(1-p)\)`

- By _linearity of expectation_, the expected value of the Binomial distribution with parameters `\(n\)` and `\(p\)` is `\(n \times p\)`

- The variance of a sum of independent random variables is the sum of the variances

- Hence the variance of the Binomial distribution with parameters `\(n\)` and `\(p\)` is `\(n \times p(1-p)\)` (see the numerical sanity check after the next proposition)

---

### Binomial probability mass functions with `\(n=20\)` and different values of `\(p\)`: `\(.5, .7, .2\)`

<img src="cm-2-discrete-distributions_files/figure-html/witbinom-1.png" width="504" style="display: block; margin: auto;" />

More on [wikipedia](https://en.wikipedia.org/wiki/Binomial_distribution).

---

### Binomial distributions with the same success parameter

### Proposition

Let `\(X,Y\)` be

- independent over probability space `\((\Omega, \mathcal{F}, P)\)` and
- distributed according to `\(\text{Bin}(n_1, p)\)` and `\(\text{Bin}(n_2, p)\)`

then `\(X+Y\)` is distributed according to `\(\text{Bin}(n_1+n_2, p)\)`

`$$X \bot\!\!\!\bot Y, \quad X \sim\operatorname{Bin}(n_1, p), \quad Y \sim\operatorname{Bin}(n_2, p) \Rightarrow X+ Y \sim\operatorname{Bin}(n_1+n_2, p)$$`
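---

### Sanity check for the moment formulas

As announced above, a quick numerical check of the mean and variance formulas. A minimal `R` sketch; the values of `n` and `p` are arbitrary:

```r
set.seed(1)
n <- 20 ; p <- 0.2
x <- rbinom(1e5, size = n, prob = p)             # 10^5 draws from Bin(n, p)

c(empirical_mean = mean(x), n_times_p = n * p)   # should be close
c(empirical_var = var(x), n_p_q = n * p * (1 - p))
```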
---

### Exercise

Check the preceding proposition.

---
template: inter-slide
name: poisson

## Poisson

---

The Poisson distribution appears as a limit of Binomial distributions in a variety of circumstances connected to _rare events phenomena_ (illustrated numerically after the proof below)

### Definition

A Poisson distribution with parameter `\(\lambda >0\)` is a probability distribution `\(P\)` on `\(\Omega=\mathbb{N}\)` with

`$$P\{k\} = \mathrm{e}^{-\lambda} \frac{\lambda^k}{k!}$$`

---

### Poisson probability mass functions with different values of the parameter: `\(1, 5, 10\)`

Recall that the parameter of a Poisson distribution equals its expectation and its variance. The probability mass function of a Poisson distribution achieves its maximum (called the mode) close to its expectation.

<img src="cm-2-discrete-distributions_files/figure-html/witgetpoisson-1.png" width="504" />

---

- The expected value of the Poisson distribution with parameter `\(\lambda\)` is `\(\lambda\)`
- The variance of a Poisson distribution is equal to its expected value

`$$\begin{array}{rl} \mathbb{E} X & = \sum_{n=0}^\infty \mathrm{e}^{-\lambda} \frac{\lambda^n}{n!} \times n\\ & = \lambda \times \sum_{n=1}^\infty \mathrm{e}^{-\lambda} \frac{\lambda^{n-1}}{(n-1)!} \\ & = \lambda \, . \end{array}$$`

A similar computation gives `\(\mathbb{E}[X(X-1)] = \lambda^2\)`, hence `\(\operatorname{var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda\)`

---

### Proposition

Let `\(X,Y\)` be independent and Poisson distributed over probability space `\((\Omega, \mathcal{F}, P)\)`, then `\(X+Y\)` is Poisson distributed

---

### Proof

`\(X \sim \operatorname{Po}(\lambda), X \bot\!\!\!\bot Y, Y \sim \operatorname{Po}(\mu)\)`. For each `\(k \in \mathbb{N}\)`:

`$$\begin{array}{rl} \Pr \{ X+Y =k\} & = \Pr \{ \bigvee_{m=0}^k (X =m \wedge Y =k-m) \} \\ & = \sum_{m=0}^k \Pr \{ X =m \wedge Y =k-m \} \\ & = \sum_{m=0}^k \Pr \{ X =m \} \times \Pr\{ Y =k-m \} \\ & = \sum_{m=0}^k \mathrm{e}^{-\lambda} \frac{\lambda^m}{m!} \mathrm{e}^{-\mu} \frac{\mu^{k-m}}{(k-m)!} \\ & = \mathrm{e}^{-\lambda - \mu} \frac{(\lambda+\mu)^k}{k!} \sum_{m=0}^k \frac{k!}{m! (k-m)!}\left(\frac{\lambda}{\lambda+\mu}\right)^m \left(\frac{\mu}{\lambda+\mu}\right)^{k-m} \\ & = \mathrm{e}^{-\lambda - \mu} \frac{(\lambda+\mu)^k}{k!} \sum_{m=0}^k \binom{k}{m}\left(\frac{\lambda}{\lambda+\mu}\right)^m \left(\frac{\mu}{\lambda+\mu}\right)^{k-m} \\ & = \mathrm{e}^{-\lambda - \mu} \frac{(\lambda+\mu)^k}{k!} \left( \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\right)^k \\ & = \mathrm{e}^{-(\lambda + \mu)} \frac{(\lambda+\mu)^k}{k!} \end{array}$$`

The last expression is the pmf of `\(\operatorname{Po}(\lambda + \mu)\)` at `\(k\)`
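---

### Numerical illustrations

Two quick checks in `R` (not from the original slides; the parameter values are arbitrary): the rare-events limit announced at the beginning of this section, and the convolution identity just proven:

```r
lambda <- 4 ; k <- 0:15

# rare events: Bin(n, lambda / n) approaches Po(lambda) for large n
max(abs(dbinom(k, size = 1000, prob = lambda / 1000) - dpois(k, lambda)))
# close to 0

# sum of independent Poissons: convolution of Po(3) and Po(5) vs Po(3 + 5)
conv <- sapply(0:30, function(j) sum(dpois(0:j, 3) * dpois(j:0, 5)))
all.equal(conv, dpois(0:30, 8))   # TRUE
```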
---

### Exercise
Check that the _mode_ (maximum) of a Poisson probability mass function with parameter `\(\lambda\)` is achieved at `\(k= \lfloor \lambda \rfloor\)`

Is it always unique?
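A numerical exploration may help (an `R` sketch added as a hint; the values of `\(\lambda\)` are arbitrary):

```r
# empirical modes for non-integer lambda: 0, 2, 7 = floor(lambda)
sapply(c(0.5, 2.7, 7.3), function(lambda) which.max(dpois(0:30, lambda)) - 1)

# for integer lambda the maximum is attained twice: P{lambda - 1} = P{lambda}
dpois(4:5, lambda = 5)   # two equal values
```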
---

### Exercise

Check that the _median_ of a Poisson distribution with integer parameter `\(\lambda\)` is not smaller than `\(\lambda\)`

---
template: inter-slide
name: geometric

## Geometric

---

A geometric distribution is a probability distribution over `\(\mathbb{N} \setminus \{0\} = \{1, 2, \ldots\}\)`. It depends on a parameter `\(p \in (0,1]\)`.

Assume we are allowed to toss a biased coin infinitely many times. The number of times we have to toss the coin _until_ we get a _head_ is geometrically distributed.

---

Let `\(X\)` be distributed according to a geometric distribution with parameter `\(p\)`.

The geometric probability distribution is easily defined by its _tail function_: in the event `\(X>k\)`, the first `\(k\)` outcomes have to be _tails_,

`$$P \{ X > k \} = (1-p)^k$$`

The probability mass function of the geometric distribution follows:

`$$P \{X = k \} = (1-p)^{k-1} - (1-p)^k = p \times (1-p)^{k-1} \qquad \text{for } k=1, 2, \ldots$$`

On average, we have to toss the coin `\(1/p\)` times until we get a _head_:

`$$\mathbb{E}X = \sum_{k=0}^\infty P \{ X > k \} = \sum_{k=0}^\infty (1-p)^k = \frac{1}{p}$$`
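---

### Checking the geometric formulas in `R`

A minimal sketch (not from the original slides; the value of `p` is arbitrary). Note that `R`'s `dgeom()` and `pgeom()` count the failures _before_ the first success, hence the shift by one (see the remark on the next slide):

```r
p <- 1/3 ; k <- 1:10

# pmf: P{X = k} = p (1 - p)^(k - 1)
all.equal(p * (1 - p)^(k - 1), dgeom(k - 1, prob = p))            # TRUE

# tail function: P{X > k} = (1 - p)^k
all.equal((1 - p)^k, pgeom(k - 1, prob = p, lower.tail = FALSE))  # TRUE

# expectation: on average 1/p tosses until the first head
set.seed(3)
x <- rgeom(1e5, prob = p) + 1   # shift to the "tosses until head" convention
c(empirical_mean = mean(x), one_over_p = 1 / p)
```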
---

### Remark

It is also possible to define geometric random variables as the number of times we have to toss the coin __before__ we get a _head_.

This requires modifying the quantile function, the probability mass function, the expectation, and so on.

This is the convention `R`
uses.

---

### Geometric probability mass functions with different values of the parameter `\(p\)`: `\(1/2, 1/3, 1/5\)`

The probability mass function equals `\(p \times (1-p)^{k-1}\)` at `\(k\geq 1\)`. The mode is achieved at `\(k=1\)` whatever the value of `\(p\)`. The expectation equals `\(1/p\)`.

<img src="cm-2-discrete-distributions_files/figure-html/witgetgeometric-1.png" width="504" style="display: block; margin: auto;" />

---

Sums of independent geometric random variables are not distributed according to a geometric distribution (they follow _negative Binomial_ distributions).

---
class: middle, center, inverse
background-image: url('./img/pexels-cottonbro-3171837.jpg')
background-size: 112%

# The End