name: layout-general
layout: true
class: left, middle

<style>
.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: red;
}
</style>
---
class: middle, center, inverse
background-size: 4%
background-position: 97% 3%

# Product distributions

### 2021-01-08

#### [Probabilités Master I MIDS](http://stephane-v-boucheron.fr/courses/probability/)

### [Stéphane Boucheron](http://stephane-v-boucheron.fr)

---
class: inverse, center, middle

## Motivation

## <svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm80 256c-60.6 0-134.5-38.3-143.8-93.3-2-11.8 9.3-21.6 20.7-17.9C155.1 330.5 200 336 248 336s92.9-5.5 123.1-15.2c11.3-3.7 22.6 6.1 20.7 17.9-9.3 55-83.2 93.3-143.8 93.3z"/></svg>

---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M496 384H64V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-32c0-8.84-7.16-16-16-16zM464 96H345.94c-21.38 0-32.09 25.85-16.97 40.97l32.4 32.4L288 242.75l-73.37-73.37c-12.5-12.5-32.76-12.5-45.25 0l-68.69 68.69c-6.25 6.25-6.25 16.38 0 22.63l22.62 22.62c6.25 6.25 16.38 6.25 22.63 0L192 237.25l73.37 73.37c12.5 12.5 32.76 12.5 45.25 0l96-96 32.4 32.4c15.12 15.12 40.97 4.41 40.97-16.97V112c.01-8.84-7.15-16-15.99-16z"/></svg> Description of _random walks_ over `\(\mathbb{Z}^d\)`:

> at each step, we choose a random neighbour of the current position and move to that neighbour.

--

An elementary move is an element of `\(\{0, \pm 1\}^d\)` where exactly one component is non-zero

--

Picking an elementary move uniformly at random is easy

--

Picking finitely many independent moves is easy too

--

It is enough to have an .red[infinite] supply of independent move-valued random variables
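---
### A simulation sketch

Granting that infinite supply for a moment, finitely many steps of the walk are easy to simulate. A minimal sketch, assuming `numpy` is available (the function name and parameters are illustrative):

```python
import numpy as np

def random_walk(n_steps, d, seed=0):
    """n_steps of the nearest-neighbour random walk on Z^d, started at 0."""
    rng = np.random.default_rng(seed)
    moves = np.zeros((n_steps, d), dtype=int)
    axis = rng.integers(0, d, size=n_steps)   # which coordinate moves at each step
    sign = rng.choice([-1, 1], size=n_steps)  # in which direction
    moves[np.arange(n_steps), axis] = sign    # exactly one non-zero component per move
    return moves.cumsum(axis=0)               # successive positions of the walk

path = random_walk(n_steps=1000, d=2)
```

Each row of `moves` is uniform over the `\(2d\)` elementary moves, and distinct rows are independent: this is precisely what product distributions legitimate.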
viewBox="0 0 512 512"><path d="M501.1 395.7L384 278.6c-23.1-23.1-57.6-27.6-85.4-13.9L192 158.1V96L64 0 0 64l96 128h62.1l106.6 106.6c-13.6 27.8-9.2 62.3 13.9 85.4l117.1 117.1c14.6 14.6 38.2 14.6 52.7 0l52.7-52.7c14.5-14.6 14.5-38.2 0-52.7zM331.7 225c28.3 0 54.9 11 74.9 31l19.4 19.4c15.8-6.9 30.8-16.5 43.8-29.5 37.1-37.1 49.7-89.3 37.9-136.7-2.2-9-13.5-12.1-20.1-5.5l-74.4 74.4-67.9-11.3L334 98.9l74.4-74.4c6.6-6.6 3.4-17.9-5.7-20.2-47.4-11.7-99.6.9-136.6 37.9-28.5 28.5-41.9 66.1-41.2 103.6l82.1 82.1c8.1-1.9 16.5-2.9 24.7-2.9zm-103.9 82l-56.7-56.7L18.7 402.8c-25 25-25 65.5 0 90.5s65.5 25 90.5 0l123.6-123.6c-7.6-19.9-9.9-41.6-5-62.7zM64 472c-13.2 0-24-10.8-24-24 0-13.3 10.7-24 24-24s24 10.7 24 24c0 13.2-10.7 24-24 24z"/></svg> We have to define a convenient `\(\sigma\)`-algebra `\(\mathcal{H}\)` of subsets of `\(\mathcal{X} \times \mathcal{Y}\)`: the _product `\(\sigma\)`-algebra_ --- .content-box-gray[ ### Definition: Product sigma-algebra Let `\((\mathcal{X}, \mathcal{F})\)` and `\((\mathcal{Y}, \mathcal{G})\)` be two measurable spaces. The product `\(\sigma\)`-algebra `\(\mathcal{F} \otimes \mathcal{G}\)` is the `\(\sigma\)`-algebra of subsets of `\(2^{\mathcal{X} \times \mathcal{Y}}\)` that is generated by the so-called _rectangles_: `$$\Big\{ A \times B : A \in \mathcal{F}, B \in \mathcal{G}\Big\} \, .$$` ] -- <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 192 512"><path d="M176 432c0 44.112-35.888 80-80 80s-80-35.888-80-80 35.888-80 80-80 80 35.888 80 80zM25.26 25.199l13.6 272C39.499 309.972 50.041 320 62.83 320h66.34c12.789 0 23.331-10.028 23.97-22.801l13.6-272C167.425 11.49 156.496 0 142.77 0H49.23C35.504 0 24.575 11.49 25.26 25.199z"/></svg> The product `\(\sigma\)`-algebra makes the functions `\(X\)` and `\(Y\)` (sometimes called _coordinate projections_) measurable. --- #### Example If `\(\mathcal{F} = \mathcal{G} =\mathcal{B}(\mathbb{R})\)` `$$\mathcal{F} \otimes \mathcal{G} = \sigma\left(A \times B : A, B \in \mathcal{B}(\mathbb{R})\right)$$` -- `$$\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}\left(\mathbb{R}^2 \right)$$` --- ### More generally If `\(\mathcal{F} = \sigma(\mathcal{A})\)` (resp. `\(\mathcal{G} = \sigma(\mathcal{B})\)` ) with `\(\mathcal{A}\)` (resp. `\(\mathcal{B}\)` ) a `\(\pi\)` -class Then `$$\mathcal{F} \otimes \mathcal{G} = \sigma\left(\mathcal{A}\right) \otimes \sigma\left(\mathcal{B}\right) = \sigma\left(\mathcal{A} \times \mathcal{B}\right)$$` --- ### Recall 💉 .content-box-gray[ ### Definition: A measure `\(\mu\)` on `\((\Omega, \mathcal{F})\)` is `\(\sigma\)`-finite iff there exists `\((A_n)_n\)` with - `\(\Omega \subseteq \cup_n A_n\)` - `\(\mu(A_n) < \infty\)` for each `\(n\)`. 
]

--

- Finite measures (this encompasses probability measures) are `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>
- Lebesgue measure is `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>
- The counting measure on `\(\mathbb{R}\)` is __not__ `\(\sigma\)`-finite <svg style="height:0.8em;top:.04em;position:relative;fill:gray;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm170.2 218.2C315.8 367.4 282.9 352 248 352s-67.8 15.4-90.2 42.2c-13.5 16.3-38.1-4.2-24.6-20.5C161.7 339.6 203.6 320 248 320s86.3 19.6 114.7 53.8c13.6 16.2-11 36.7-24.5 20.4z"/></svg>

---
.content-box-gray[

### Product-measure Theorem

Let `\((\mathcal{X}, \mathcal{F}, \mu)\)` and `\((\mathcal{Y}, \mathcal{G}, \nu)\)` be two measured spaces where `\(\mu,\nu\)` are `\(\sigma\)`-finite.

Then there exists a .red[unique] `\(\sigma\)`-finite measure `\(\alpha\)` on `\(\mathcal{X} \times \mathcal{Y}\)` endowed with the product `\(\sigma\)`-algebra `\(\mathcal{F} \otimes \mathcal{G} = \sigma(\mathcal{F} \times \mathcal{G})\)` that satisfies

`$$\alpha (A \times B) = \mu(A) \times \nu(B)\qquad \forall A \in \mathcal{F}, B \in \mathcal{G} \, .$$`

... (to be continued)

]

---
.content-box-gray[

### Theorem (continued)

Moreover, for all `\(E \in \mathcal{F} \otimes \mathcal{G}\)`,

1. for each `\(x \in \mathcal{X}\)`, `\(y \mapsto \mathbb{I}_E(x,y)\)` is `\(\mathcal{G}\)`-measurable;
1. `\(x \mapsto \int_{\mathcal{Y}} \mathbb{I}_E(x,y) \, \mathrm{d} \nu(y)\)` is `\(\mathcal{F}\)`-measurable;
1. for each `\(y \in \mathcal{Y}\)`, `\(x \mapsto \mathbb{I}_E(x,y)\)` is `\(\mathcal{F}\)`-measurable;
1. `\(y \mapsto \int_{\mathcal{X}} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x)\)` is `\(\mathcal{G}\)`-measurable,

and the following holds:

`$$\int_{\mathcal{X}\times \mathcal{Y}} \mathbb{I}_E \, \mathrm{d}\alpha = \int_{\mathcal{X}} \Big(\int_{\mathcal{Y}} \mathbb{I}_E(x,y) \, \mathrm{d} \nu(y)\Big) \, \mathrm{d}\mu(x) = \int_{\mathcal{Y}} \Big( \int_{\mathcal{X}} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x)\Big) \,\mathrm{d} \nu(y)$$`

where the three integrals are simultaneously finite or infinite.

]

.blue[
The measure `\(α\)` is called a _product measure_, denoted by `\(\mu \otimes \nu\)`.
]

???
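---
### A finite sanity check

On finite spaces the theorem can be checked by direct computation: the product measure is the outer product of the weight vectors, and iterated summation in either order returns `\(\alpha(E)\)`. A minimal numerical sketch, assuming `numpy` (the weights and the event are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.2, 0.5, 0.3])         # measure on X = {0, 1, 2}
nu = np.array([0.4, 0.6])              # measure on Y = {0, 1}
alpha = np.outer(mu, nu)               # product measure on X x Y

A, B = [0, 2], [1]                     # a rectangle A x B
assert np.isclose(alpha[np.ix_(A, B)].sum(), mu[A].sum() * nu[B].sum())

E = rng.random(alpha.shape) < 0.5      # an arbitrary "event" E in X x Y
x_first = (E * nu).sum(axis=1) @ mu    # integrate over Y first, then X
y_first = (E.T * mu).sum(axis=1) @ nu  # integrate over X first, then Y
assert np.isclose(x_first, y_first) and np.isclose(x_first, alpha[E].sum())
```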
---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg>

### Assuming that both `\(μ\)` and `\(ν\)` are .red[ `\(σ\)` -finite] is essential!

--

Choose

- `\(\mu\)` as the counting measure on `\([0,1]\)` and
- `\(\nu\)` as the Lebesgue measure on `\([0,1]\)`.

Consider the diagonal `\(E = \{(x,x) : x \in [0,1]\}\)`.

--

The set `\(E\)` belongs to `\(\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}(\mathbb{R}^2)\)`

`$$μ ⊗ ν(E) = ???$$`

---
### <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg>

.content-box-red[

Interchanging the order of integration leads to different results:

`\begin{align*} 1 & = \int_{[0,1]} \Big(\underbrace{\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\mu(x)}_{=1}\Big) \, \mathrm{d}\nu(y) \\ 0 & = \int_{[0,1]} \Big(\underbrace{\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\nu(y)}_{=0}\Big) \, \mathrm{d}\mu(x) \end{align*}`

]

---
#### The product-measure theorem contains three statements:

- existence of a measure over `\((\mathcal{X} \times \mathcal{Y}, \mathcal{F} \otimes \mathcal{G})\)` that satisfies the product property over rectangles
- uniqueness of this measure
- the possibility of computing the measure of `\(E \in \mathcal{F} \otimes \mathcal{G}\)` by iterated integration in arbitrary order.

---
- The first statement is proved using an extension theorem

--

- The second statement follows from a monotone class argument (rectangles form a generating `\(\pi\)`-class)
  - the case where both `\(\mu\)` and `\(\nu\)` are finite measures is settled first
  - if either `\(\mu\)` or `\(\nu\)` is just `\(\sigma\)`-finite, consider restrictions to rectangles with finite measure, and proceed by approximation.

--

- The third statement trivially holds for rectangles.
---
class: center, middle, inverse

## Tonelli-Fubini theorem

---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 192 512"><path d="M176 432c0 44.112-35.888 80-80 80s-80-35.888-80-80 35.888-80 80-80 80 35.888 80 80zM25.26 25.199l13.6 272C39.499 309.972 50.041 320 62.83 320h66.34c12.789 0 23.331-10.028 23.97-22.801l13.6-272C167.425 11.49 156.496 0 142.77 0H49.23C35.504 0 24.575 11.49 25.26 25.199z"/></svg> We consider product measures that are built from `\(\sigma\)`-finite measures

--

> The Tonelli-Fubini Theorem shows that (under mild conditions) integration with respect to a product measure reduces to iterated integration over the component measures

---
.content-box-gray[

### Theorem Tonelli-Fubini

Let `\(( \mathcal{X}, \mathcal{A})\)` and `\((\mathcal{Y}, \mathcal{B})\)` be two measurable spaces, `\(\mu\)` and `\(\nu\)` two `\(\sigma\)`-finite measures on these spaces, `\(\mu \otimes \nu\)` the product measure, and `\(f\)` an `\(\mathcal{A} \otimes \mathcal{B}\)`-measurable real function such that `\(\int |f| \, \mathrm{d} \mu \otimes \nu < \infty\)`.

The following properties are satisfied:

i. `\(\forall x \in \mathcal{X}, \hspace{1em} y \mapsto f (x, y)\)` is `\(\mathcal{B}\)`-measurable.
i. The function `\(x \mapsto \int_{\mathcal{Y}} f (x, y) \mathrm{d}\nu(y)\)` is `\(\mathcal{A}\)`-measurable, finite `\(\mu\)`-almost everywhere, and

`$$\int_{\mathcal{X} \times \mathcal{Y}} f \mathrm{d} \mu \otimes \nu = \int_{\mathcal{X}} \left[ \int_{\mathcal{Y}} f (x, y) \mathrm{d} \nu (y) \right] \mathrm{d} \mu (x)$$`

]

---
#### Proof

---
#### A simple consequence of the Tonelli-Fubini Theorem.

.content-box-gray[

### Proposition "IPP formula"

Let `\(X\)` be a non-negative real-valued random variable, then

`$$\mathbb{E}X = \int_0^\infty P\{ X > t \} \mathrm{d}t$$`

]

---
#### Proof

`\begin{align*} \mathbb{E}X & = \int_{\Omega} X(\omega) \, \mathrm{d}P(\omega) \\ & = \int_{\Omega} \Big( \int_{[0,\infty)} \mathbb{I}_{X(\omega)> t} \mathrm{d}t \Big)\, \mathrm{d}P(\omega) \\ & = \int_{[0,\infty)} \Big( \int_{\Omega} \mathbb{I}_{X(\omega)> t} \, \mathrm{d}P(\omega) \Big) \mathrm{d}t \\ & = \int_{[0,\infty)} \Big( P\{ \omega : X(\omega) > t \} \Big) \mathrm{d}t \end{align*}`

---
class: center, middle, inverse

## Independence and product distributions

---
### Two random variables

Let the two random variables `\(X, Y\)` map `\((\Omega, \mathcal{F})\)` to `\((\mathcal{X}, \mathcal{G})\)` and `\((\mathcal{Y}, \mathcal{H})\)`.

Equip `\((\Omega, \mathcal{F})\)` with probability distribution `\(P\)`.

Let `\(Q_X = P \circ X^{-1}\)` and `\(Q_Y = P \circ Y^{-1}\)` be the two image distributions (called the marginal distributions).

Let `\(Q\)` be the joint distribution of `\((X,Y)\)` under `\(P\)`, that is, the probability distribution over `\(\mathcal{X} \times \mathcal{Y}\)` that is uniquely defined by

`$$Q( A \times B) = P\Big\{ \omega: X(\omega) \in A, Y(\omega) \in B \Big\}$$`

--

.content-box-gray[

Then

`$$X \perp\!\!\!\perp Y \text{ under } P \Longleftrightarrow Q = Q_X \otimes Q_Y \, ,$$`

In words, `\(X\)` and `\(Y\)` are independent iff their joint distribution is the product of their marginal distributions.

]
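---
### An empirical look at the characterization

The characterization can be eyeballed numerically: when `\(X \perp\!\!\!\perp Y\)`, the empirical joint distribution is close to the outer product of the empirical marginals. A minimal sketch, assuming `numpy` (the two distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 3, size=n)           # X uniform on {0, 1, 2}
y = (rng.random(n) < 0.3).astype(int)    # Y Bernoulli(0.3), independent of X

joint = np.zeros((3, 2))
np.add.at(joint, (x, y), 1.0 / n)        # empirical joint distribution Q

q_x, q_y = joint.sum(axis=1), joint.sum(axis=0)   # empirical marginals
print(np.abs(joint - np.outer(q_x, q_y)).max())   # small: Q close to Q_X x Q_Y
```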
---
### 💉 Independence of finitely many `\(\sigma\)`-algebras

Let `\((\Omega, \mathcal{F}, P)\)` be a probability space.

Let `\(\mathcal{G}_1, \ldots, \mathcal{G}_n\)` be a collection of sub-`\(\sigma\)`-algebras

.content-box-gray[

### Definition

This collection is independent with respect to `\(P\)` if `\(∀ A_1 ∈ \mathcal{G}_1, …, A_n ∈ \mathcal{G}_n\)`

`$$P (A_1 ∩ … ∩ A_n) = P(A_1) \times \ldots \times P(A_n)$$`

]

---
### Independence of countably many `\(\sigma\)`-algebras

In many applications, independence between two `\(\sigma\)`-algebras or a finite collection of `\(\sigma\)`-algebras is not enough.

This is the case when deriving or using laws of large numbers. We have to deal with a _countable collection of independent random variables_.

In words, we have to work with a countable collection of `\(\sigma\)`-algebras, and we need a notion of independence for such a collection.

---
.content-box-gray[

### Definition

Let `\((\Omega, \mathcal{F}, P)\)` be a probability space. Let `\(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\)` be a countable collection of sub-`\(\sigma\)`-algebras.

The collection `\(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\)` is said to be independent under `\(P\)` if every finite sub-collection is independent under `\(P\)`.

]

---
### Example

Consider the uniform probability distribution over `\([0,1]\)` and define `\(X_1, X_2, \ldots\)` by

`$$X_n(\omega) = \operatorname{sign}\Big(\sin\big(2^{n+1} \pi \omega \big)\Big)$$`

then `\(X_1, \ldots, X_n, \ldots\)` form a countable independent collection of random variables.

---
class: center, middle, inverse

## Infinite product measures

---
.content-box-gray[

### Definition Cylindrical `\(\sigma\)`-algebra

Let `\((\Omega_n, \mathcal{F}_n)_n\)` be a countable collection of measurable spaces, the cylinder `\(\sigma\)`-algebra is the `\(\sigma\)`-algebra of subsets of `\(\prod_{n=1}^\infty \Omega_n\)` that is generated by subsets of the form:

`$$\prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \qquad\text{with } A_n \in \mathcal{F}_n \text{ for } n \leq m$$`

where `\(m\)` is any integer. These subsets are called _finite-dimensional rectangles_ or _cylinders_.

]

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M572.52 241.4C518.29 135.59 410.93 64 288 64S57.68 135.64 3.48 241.41a32.35 32.35 0 0 0 0 29.19C57.71 376.41 165.07 448 288 448s230.32-71.64 284.52-177.41a32.35 32.35 0 0 0 0-29.19zM288 400a144 144 0 1 1 144-144 143.93 143.93 0 0 1-144 144zm0-240a95.31 95.31 0 0 0-25.31 3.79 47.85 47.85 0 0 1-66.9 66.9A95.78 95.78 0 1 0 288 160z"/></svg>: cylinders form a `\(\pi\)`-class

---
If each `\((\Omega_n, \mathcal{F}_n)\)` is endowed with a probability distribution, assigning a probability to cylinders looks straightforward:

`$$\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n) \times \prod_{n=m+1}^\infty P_n(\Omega_n) = \prod_{n=1}^m P_n(A_n)$$`

The question is:

> does `\(\mathbb{P}\)` extend to the cylinder `\(\sigma\)`-algebra? If an extension exists, is it unique?

The answer is yes! 🍾

---
.content-box-gray[

### Kolmogorov's extension theorem

Let `\((\Omega_n, \mathcal{F}_n, P_n)_n\)` be a countable collection of probability spaces. Then there exists a unique probability distribution `\(\mathbb{P}\)` on the cylindrical `\(\sigma\)`-algebra that satisfies:

`$$\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n)$$`

for every finite sequence `\(A_1, \ldots, A_m\)` in `\(\mathcal{F}_1 \times \ldots \times \mathcal{F}_m\)`.
]

---
exclude: true
class: center, middle, inverse

## Infinite independent collection of events

---
exclude: true
class: center, middle, inverse

## Infinite independent collection of `\(\sigma\)`-algebras

---
class: center, middle, inverse

## Second Borel-Cantelli Lemma

---
.content-box-gray[

### Lemma

Let `\(A_1, …, A_n, …\)` be a countable independent collection of events under `\((Ω, \mathcal{F}, P)\)`.

If `\(∑_n P(A_n) = ∞\)` then `\(P \left(\cap_n \cup_{m\geq n} A_m\right) = 1\)`

]

This is a partial converse to the first Borel-Cantelli Lemma

---
#### Proof

`$$P\left(\overline{\cap_n \cup_{m\geq n} A_m}\right) = P\left(\cup_n \overline{\cup_{m\geq n} A_m}\right) = P\left(\cup_n \cap_{m\geq n} \overline{A_m}\right)$$`

Fix `\(n\)`. By monotone continuity of `\(P\)`:

`$$P\left(\cap_{m\geq n} \overline{A_m}\right) = P \left(\lim_{k \uparrow \infty} \cap_{n \leq m \leq k} \overline{A_m}\right) = \lim_{k \uparrow \infty} P \left(\cap_{n \leq m \leq k} \overline{A_m}\right)$$`

By independence and `\(1 - x \leq \mathrm{e}^{-x}\)`:

`$$P \left(\cap_{n \leq m \leq k} \overline{A_m}\right) = \prod_{m=n}^k P\left( \overline{A_m} \right) = \prod_{m=n}^k \left( 1 - P\left( {A_m} \right)\right) \leq \mathrm{e}^{- ∑_{m=n}^k P(A_m)}$$`

`$$\lim_{k ↑ ∞} \mathrm{e}^{- ∑_{m=n}^k P(A_m)} = 0$$`

--

Hence:

`$$P\left(\cap_{m\geq n} \overline{A_m}\right) = 0$$`

`$$P\left(\overline{\cap_n \cup_{m\geq n} A_m}\right) \leq ∑_n P\left(\cap_{m\geq n} \overline{A_m}\right) =0$$`

---
class: center, middle, inverse

## Absolutely continuous distributions

---
### Densities and absolute continuity

Beyond discrete distributions, the simplest probability distributions are defined by a density function with respect to a (`\(\sigma\)`-finite) measure

This encompasses the distributions of the so-called _continuous random variables_.

--

.content-box-gray[

### Definition Absolute continuity

Let `\(\mu, \nu\)` be two `\(\sigma\)`-additive measures on measurable space `\((\Omega, \mathcal{F})\)`, `\(\mu\)` is said to be _absolutely continuous_ with respect to `\(\nu\)` (denoted by `\(\mu \trianglelefteq \nu\)`) iff

`\(\forall A \in \mathcal{F}\)` `\(\nu(A)=0 \Rightarrow \mu(A)=0\)`.

]

--

If `\(\mu, \nu\)` are two probability distributions, and `\(\mu \trianglelefteq \nu\)`, then any event which is impossible under `\(\nu\)` is also impossible under `\(\mu\)`.

---
###

- Is the counting measure on `\(\mathbb{R}\)` absolutely continuous with respect to Lebesgue measure?
- Is the converse true?
- Check that absolute continuity is a transitive relation.
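---
### Discrete warm-up for the next theorem

On a finite or countable space, the forthcoming Radon-Nikodym theorem is elementary: `\(\mu \trianglelefteq \nu\)` can be checked pointwise, and pointwise division yields a density. A minimal sketch, assuming `numpy` (the weight vectors are arbitrary):

```python
import numpy as np

nu = np.array([0.0, 0.25, 0.25, 0.5])   # reference measure on {0, 1, 2, 3}
mu = np.array([0.0, 0.10, 0.50, 0.4])   # mu vanishes wherever nu does

assert all(m == 0 for m, n in zip(mu, nu) if n == 0)         # mu << nu

f = np.divide(mu, nu, out=np.zeros_like(mu), where=nu > 0)   # a version of dmu/dnu

A = [1, 3]                                                   # any event A
assert np.isclose(mu[A].sum(), (f[A] * nu[A]).sum())         # mu(A) = int_A f dnu
```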
---
### Radon-Nikodym Theorem

Let `\(\mu, \nu\)` be two `\(\sigma\)`-additive measures on measurable space `\((\Omega, \mathcal{F})\)`

Assume `\(\nu\)` is `\(\sigma\)`-finite

If `\(\mu \trianglelefteq \nu\)`, then there exists a measurable function `\(f\)` from `\(\Omega\)` to `\([0, \infty)\)` such that

`$$\forall A \in \mathcal{F} \qquad \mu(A) = \int_A f(\omega) \mathrm{d}\nu(\omega) = \int_{\Omega} \mathbb{I}_A f \mathrm{d}\nu$$`

---
- The function `\(f\)` is called _a version_ of the density of `\(\mu\)` with respect to `\(\nu\)`
- The density is also called the Radon-Nikodym derivative of `\(\mu\)` with respect to `\(\nu\)`
- The density is sometimes denoted by `\(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\)`

---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M439.15 453.06L297.17 384l141.99-69.06c7.9-3.95 11.11-13.56 7.15-21.46L432 264.85c-3.95-7.9-13.56-11.11-21.47-7.16L224 348.41 37.47 257.69c-7.9-3.95-17.51-.75-21.47 7.16L1.69 293.48c-3.95 7.9-.75 17.51 7.15 21.46L150.83 384 8.85 453.06c-7.9 3.95-11.11 13.56-7.15 21.47l14.31 28.63c3.95 7.9 13.56 11.11 21.47 7.15L224 419.59l186.53 90.72c7.9 3.95 17.51.75 21.47-7.15l14.31-28.63c3.95-7.91.74-17.52-7.16-21.47zM150 237.28l-5.48 25.87c-2.67 12.62 5.42 24.85 16.45 24.85h126.08c11.03 0 19.12-12.23 16.45-24.85l-5.5-25.87c41.78-22.41 70-62.75 70-109.28C368 57.31 303.53 0 224 0S80 57.31 80 128c0 46.53 28.22 86.87 70 109.28zM280 112c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32zm-112 0c17.65 0 32 14.35 32 32s-14.35 32-32 32-32-14.35-32-32 14.35-32 32-32z"/></svg> The sigma-finiteness assumption is crucial.

If we choose `\(\mu\)` as Lebesgue measure and `\(\nu\)` as the counting measure, then `\(\nu\)` is not `\(\sigma\)`-finite; since `\(\nu(A)=0\)` forces `\(A=\emptyset\)`, we still have `\(\mu \trianglelefteq \nu\)`.

Nevertheless, Lebesgue measure has no density with respect to the counting measure.

---
.content-box-gray[

### Chain rule

If `\(\rho \trianglelefteq \mu \trianglelefteq \nu\)`, `\(f\)` is a density of `\(\rho\)` with respect to `\(\mu\)` while `\(g\)` is a density of `\(\mu\)` with respect to `\(\nu\)`, then `\(fg\)` is a density of `\(\rho\)` with respect to `\(\nu\)`

]

---
### Exponential distribution

The exponential distribution shows up in several areas of probability and statistics

- In _reliability theory_, its _memoryless property_ makes it a borderline case.
- In the theory of _point processes_, the exponential distribution is connected with _Poisson Point Processes_
- It is also important in _extreme value theory_

---
### Definition

The exponential distribution with intensity parameter `\(\lambda>0\)` is defined by its density with respect to Lebesgue measure on `\([0,\infty)\)`

`$$x \mapsto \lambda \mathrm{e}^{-\lambda x}$$`

The reciprocal of the intensity parameter is called the scale parameter.

---
- Geometric and exponential distributions are connected: if `\(X\)` is exponentially distributed, then `\(\lceil X\rceil\)` is geometrically distributed. For `\(k\geq 1\)`:

`$$P \Big\{ \lceil X \rceil \geq k \Big\} = P \Big\{ X > k - 1 \Big\} = \mathrm{e}^{- \lambda (k-1)}$$`

- Check that `\(x \mapsto \lambda \mathrm{e}^{-\lambda x}\)` is a probability density over `\([0, \infty)\)`.
- Compute the tail function and the cumulative distribution function of the exponential distribution with parameter `\(\lambda\)`.

---
- Let `\(X_1, \ldots, X_n\)` be i.i.d. exponentially distributed. Characterize the distribution of `\(\min(X_1, \ldots, X_n)\)`.
- If `\(X\)` is exponentially distributed with scale parameter `\(\sigma\)`, what is the distribution of `\(a X\)`?

---
exclude: true

### Exponential densities with different parameters: scales `\(1, 2, 1/2\)` or equivalently intensities `\(1, 1/2, 2\)`.

Expectation equals scale, variance equals squared scale.

![(ref:witgetexponential)](cm-7-product-distributions_files/figure-html/witgetexponential-1.png)

---
class: inverse, middle, center

## Gamma distributions

---
- Sums of independent exponentially distributed random variables are not exponentially distributed <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm170.2 218.2C315.8 367.4 282.9 352 248 352s-67.8 15.4-90.2 42.2c-13.5 16.3-38.1-4.2-24.6-20.5C161.7 339.6 203.6 320 248 320s86.3 19.6 114.7 53.8c13.6 16.2-11 36.7-24.5 20.4z"/></svg>
- The family of _Gamma distributions_ encompasses the family of exponential distributions
- The family of _Gamma distributions_ with the same intensity parameter is stable under addition <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm80 168c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm-160 0c17.7 0 32 14.3 32 32s-14.3 32-32 32-32-14.3-32-32 14.3-32 32-32zm194.8 170.2C334.3 380.4 292.5 400 248 400s-86.3-19.6-114.8-53.8c-13.6-16.3 11-36.7 24.6-20.5 22.4 26.9 55.2 42.2 90.2 42.2s67.8-15.4 90.2-42.2c13.4-16.2 38.1 4.2 24.6 20.5z"/></svg>

---
💉 Euler's Gamma function:

`$$\Gamma(t) = \int_0^\infty x^{t-1}\mathrm{e}^{-x} \mathrm{d}x \qquad \text{for } t>0$$`

### Definition

The Gamma distribution with _shape_ parameter `\(p>0\)` and _intensity_ parameter `\(\lambda>0\)` is defined by its density with respect to Lebesgue measure on `\([0,\infty)\)`:

`$$x \mapsto \lambda^p \frac{x^{p-1}}{\Gamma(p)} \mathrm{e}^{-\lambda x}$$`

The reciprocal of the intensity parameter is called the _scale_ parameter.

---
###

- Check that `\(x \mapsto \lambda^p \frac{x^{p-1}}{\Gamma(p)} \mathrm{e}^{-\lambda x}\)` is a probability density over `\([0, \infty)\)`.
- If `\(X\)` is Gamma distributed with shape parameter `\(p\)` and scale parameter `\(\sigma\)`, what is the distribution of `\(a X\)`?

---
### Gamma densities with different parameters: scales `\(1, 1, 1/3, 1, 2\)` and shapes `\(1, 2, 3, 5, 5/2\)`.

Expectation equals shape times scale.

Variance equals shape times squared scale.

---
![(ref:witgetgamma)](cm-7-product-distributions_files/figure-html/witgetgamma-1.png)

---
### Univariate Gaussian distributions

Gaussian distributions play a central role in Probability theory, Statistics, Information theory, and Analysis

.content-box-gray[

### Definition

The Gaussian or normal distribution with mean `\(\mu \in \mathbb{R}\)` and variance `\(\sigma^2, \sigma>0\)` has density

`$$x \mapsto \frac{1}{\sqrt{2 \pi} \sigma} \mathrm{e}^{- \frac{(x-\mu)^2}{2 \sigma^2}} \qquad\text{for } x \in \mathbb{R}$$`

The standard Gaussian density is defined by `\(\mu=0, \sigma=1\)`.

]
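---
### Numerical sanity checks

Before the exercises on the next slide, two quick numerical checks, assuming `numpy` and `scipy` are available (an illustrative sketch, not a proof):

```python
import numpy as np
from scipy import integrate

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

mass, _ = integrate.quad(phi, -np.inf, np.inf)
print(mass)                      # close to 1.0: phi is a probability density

mu, sigma = 2.0, 3.0
rng = np.random.default_rng(2)
x = mu + sigma * rng.standard_normal(1_000_000)          # mu + sigma * X
print(x.mean(), x.std())         # close to mu and sigma: a location-scale family
```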
---
<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M519.442 288.651c-41.519 0-59.5 31.593-82.058 31.593C377.409 320.244 432 144 432 144s-196.288 80-196.288-3.297c0-35.827 36.288-46.25 36.288-85.985C272 19.216 243.885 0 210.539 0c-34.654 0-66.366 18.891-66.366 56.346 0 41.364 31.711 59.277 31.711 81.75C175.885 207.719 0 166.758 0 166.758v333.237s178.635 41.047 178.635-28.662c0-22.473-40-40.107-40-81.471 0-37.456 29.25-56.346 63.577-56.346 33.673 0 61.788 19.216 61.788 54.717 0 39.735-36.288 50.158-36.288 85.985 0 60.803 129.675 25.73 181.23 25.73 0 0-34.725-120.101 25.827-120.101 35.962 0 46.423 36.152 86.308 36.152C556.712 416 576 387.99 576 354.443c0-34.199-18.962-65.792-56.558-65.792z"/></svg>

- Check that `\(x \mapsto \frac{\mathrm{e}^{-x^2/2}}{\sqrt{2\pi}}\)` is a probability density over `\(\mathbb{R}\)`.
- If `\(X\)` is distributed according to a standard Gaussian density, what is the distribution of `\(\mu + \sigma X\)`?
- If `\(X\)` is distributed according to a standard Gaussian density, show that

`$$\Pr \{ X > t \} \leq \frac{1}{t} \frac{\mathrm{e}^{-t^2/2}}{\sqrt{2\pi}} \qquad\text{for } t>0\,.$$`

---
### Gaussian density parameters

- The _location parameter_ `\(\mu\)` coincides with the mean and the median.
- The _scale parameter_ `\(σ\)` is the standard deviation
- The Inter-Quartile-Range (IQR) is proportional to the standard deviation.
- If `\(\Phi^{\leftarrow}\)` denotes the quantile function of `\(\mathcal{N}(0,1)\)` then the interquartile range of `\(\mathcal{N}(\mu, \sigma^2)\)` is `\(\sigma \Big(\Phi^{\leftarrow}(3/4) - \Phi^{\leftarrow}(1/4)\Big)=2 \sigma \Phi^{\leftarrow}(3/4)\)`.

---
![(ref:witgetgauss)](cm-7-product-distributions_files/figure-html/witgetgauss-1.png)

---
class: middle, center, inverse

## Computing the density of an image probability distribution

---
### Problem

Assume

- we know a density `\(f\)` of the distribution of some vector-valued random variable `\(X\)`, and that `\(f\)` is positive on some open set `\(U\)`.
- we have a _smooth_ function `\(\phi\)` that maps `\(U\)` to `\(V\)` <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zM88 224c0-24.3 13.7-45.2 33.6-56-.7 2.6-1.6 5.2-1.6 8 0 17.7 14.3 32 32 32s32-14.3 32-32c0-2.8-.9-5.4-1.6-8 19.9 10.8 33.6 31.7 33.6 56 0 35.3-28.7 64-64 64s-64-28.7-64-64zm224 176H184c-21.2 0-21.2-32 0-32h128c21.2 0 21.2 32 0 32zm32-112c-35.3 0-64-28.7-64-64 0-24.3 13.7-45.2 33.6-56-.7 2.6-1.6 5.2-1.6 8 0 17.7 14.3 32 32 32s32-14.3 32-32c0-2.8-.9-5.4-1.6-8 19.9 10.8 33.6 31.7 33.6 56 0 35.3-28.7 64-64 64z"/></svg>

Does the distribution of `\(Y= \phi(X)\)` have a density? If yes, can we (easily) compute that density?

---
💉 Recall the _change of variable formula_ in elementary calculus.

If `\(\phi\)` is monotone increasing and differentiable from open `\(A \subseteq \mathbb{R}\)` to `\(B\)` and `\(f\)` is Riemann integrable over `\(B\)`, then

`$$\int_B f(y) \, \mathrm{d}y = \int_A f(\phi(x)) \, \phi^{\prime}(x) \, \mathrm{d}x$$`

---
### A multi-dimensional generalization of this elementary formula.
This extension is then used to establish an off-the-shelf formula for computing the density of an image distribution

--

Let us start with a uni-dimensional warm-up <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 320 512"><path d="M208 96c26.5 0 48-21.5 48-48S234.5 0 208 0s-48 21.5-48 48 21.5 48 48 48zm94.5 149.1l-23.3-11.8-9.7-29.4c-14.7-44.6-55.7-75.8-102.2-75.9-36-.1-55.9 10.1-93.3 25.2-21.6 8.7-39.3 25.2-49.7 46.2L17.6 213c-7.8 15.8-1.5 35 14.2 42.9 15.6 7.9 34.6 1.5 42.5-14.3L81 228c3.5-7 9.3-12.5 16.5-15.4l26.8-10.8-15.2 60.7c-5.2 20.8.4 42.9 14.9 58.8l59.9 65.4c7.2 7.9 12.3 17.4 14.9 27.7l18.3 73.3c4.3 17.1 21.7 27.6 38.8 23.3 17.1-4.3 27.6-21.7 23.3-38.8l-22.2-89c-2.6-10.3-7.7-19.9-14.9-27.7l-45.5-49.7 17.2-68.7 5.5 16.5c5.3 16.1 16.7 29.4 31.7 37l23.3 11.8c15.6 7.9 34.6 1.5 42.5-14.3 7.7-15.7 1.4-35.1-14.3-43zM73.6 385.8c-3.2 8.1-8 15.4-14.2 21.5l-50 50.1c-12.5 12.5-12.5 32.8 0 45.3s32.7 12.5 45.2 0l59.4-59.4c6.1-6.1 10.9-13.4 14.2-21.5l13.5-33.8c-55.3-60.3-38.7-41.8-47.4-53.7l-20.7 51.5z"/></svg>

When starting from the uniform distribution on `\([0,1]\)` and applying a monotone differentiable transformation, the density of the image measure is easily computed.

- Let `\(\phi\)` be differentiable and increasing on `\([0,1]\)`, and let `\(P\)` be the uniform distribution on `\([0,1]\)`. Check that `\(P \circ \phi^{-1}\)` has density `\(\frac{1}{\phi'\circ \phi^\leftarrow}\)` on `\(\phi([0,1])\)`.

---
.content-box-gray[

### Proposition

If the real-valued random variable `\(X\)` is distributed according to `\(P\)` with density `\(f\)`, and `\(\phi\)` is monotone increasing and differentiable over `\(\operatorname{supp}(P)\)`, then the probability distribution of `\(Y = \phi(X)\)` has density

`$$g = \frac{f \circ \phi^{\leftarrow}}{\phi^{\prime}\circ \phi^{\leftarrow}}$$`

over `\(\phi\big(\operatorname{supp}(P)\big)\)`.

]

---
### Proof

By the fundamental theorem of calculus, the density `\(f\)` is a.e. the derivative of the cumulative distribution function `\(F\)` of `\(P\)`.
The cumulative distribution function of `\(Y=\phi(X)\)` satisfies:

`\begin{align*} P \Big\{ Y \leq y \Big\} & = P \Big\{ \phi(X) \leq y \Big\} \\ & = P \Big\{ X \leq \phi^{\leftarrow} (y) \Big\} \\ & = F \circ \phi^{\leftarrow}(y) \end{align*}`

Almost everywhere, `\(F \circ \phi^{\leftarrow}\)` is differentiable, and has derivative `\(\frac{f \circ \phi^{\leftarrow}}{\phi' \circ \phi^{\leftarrow}}\)` in `\(\phi(\text{supp}(P))\)`, `\(0\)` elsewhere, and

`$$P \Big\{ Y \leq y \Big\} = \int_{(-\infty, y] \cap \phi(\text{supp}(P))} \frac{f \circ \phi^{\leftarrow}(u)}{\phi' \circ \phi^{\leftarrow}(u)} \mathrm{d}u$$`

---
.content-box-gray[

### Corollary

If the distribution of the real-valued random variable `\(X\)` has density `\(f\)` then the distribution of `\(\sigma X + \mu\)` has density `\(\frac{1}{\sigma}f\Big(\frac{\cdot -\mu}{\sigma}\Big)\)`

]

---
- In univariate calculus, it is easy to establish that if a function is continuous and increasing over an open set, it is invertible and its inverse is continuous and increasing
- If the function is differentiable with positive derivative, its inverse is also differentiable
- The differential and the differential of the inverse are related in a transparent way

---
The Global Inversion Theorem extends the preceding observation to the multivariate setting <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 416 512"><path d="M272 96c26.51 0 48-21.49 48-48S298.51 0 272 0s-48 21.49-48 48 21.49 48 48 48zM113.69 317.47l-14.8 34.52H32c-17.67 0-32 14.33-32 32s14.33 32 32 32h77.45c19.25 0 36.58-11.44 44.11-29.09l8.79-20.52-10.67-6.3c-17.32-10.23-30.06-25.37-37.99-42.61zM384 223.99h-44.03l-26.06-53.25c-12.5-25.55-35.45-44.23-61.78-50.94l-71.08-21.14c-28.3-6.8-57.77-.55-80.84 17.14l-39.67 30.41c-14.03 10.75-16.69 30.83-5.92 44.86s30.84 16.66 44.86 5.92l39.69-30.41c7.67-5.89 17.44-8 25.27-6.14l14.7 4.37-37.46 87.39c-12.62 29.48-1.31 64.01 26.3 80.31l84.98 50.17-27.47 87.73c-5.28 16.86 4.11 34.81 20.97 40.09 3.19 1 6.41 1.48 9.58 1.48 13.61 0 26.23-8.77 30.52-22.45l31.64-101.06c5.91-20.77-2.89-43.08-21.64-54.39l-61.24-36.14 31.31-78.28 20.27 41.43c8 16.34 24.92 26.89 43.11 26.89H384c17.67 0 32-14.33 32-32s-14.33-31.99-32-31.99z"/></svg>

.content-box-gray[

### Global Inversion Theorem

Let `\(U\)` and `\(V\)` be two non-empty open subsets of `\(\mathbb{R}^d\)`. Let `\(\phi\)` be a continuous bijection from `\(U\)` to `\(V\)`.

Assume furthermore that `\(\phi\)` is continuously differentiable, and that `\(D\phi_x\)` is non-singular at every `\(x \in U\)`.

Then, the inverse function `\(\phi^{\leftarrow}\)` is also continuously differentiable on `\(V\)` and at every `\(y \in V\)`:

`$$D\phi^{\leftarrow}_y = \Big(D\phi_{\phi^{\leftarrow}(y)} \Big)^{-1}$$`

]

--

The Jacobian determinant of `\(\phi\)` is the determinant of the matrix that represents the differential. It is denoted by `\(J_\phi\)`. Recall that:

`$$J_{\phi^{\leftarrow}}(y) = \Big(J_{\phi}(\phi^{\leftarrow}(y)) \Big)^{-1}$$`

---
The multidimensional version of the change of variable formula is stated under the same assumptions as the Global Inversion Theorem. We state the next theorem without proof.

.content-box-gray[

### Geometric change of variable formula

Let `\(U\)` and `\(V\)` be two non-empty open subsets of `\(\mathbb{R}^d\)`. Let `\(\phi\)` be a continuous bijection from `\(U\)` to `\(V\)`. Assume furthermore that `\(\phi\)` is continuously differentiable, and that `\(D\phi_x\)` is non-singular at every `\(x \in U\)`. Let `\(\ell\)` denote the Lebesgue measure on `\(\mathbb{R}^d\)`.

For any non-negative Borel-measurable function `\(f\)`:

`$$\int_U f(x) \mathrm{d}\ell(x) = \int_V f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y) \Big| \mathrm{d}\ell(y)$$`

]
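---
### Sanity check: polar coordinates

A numerical illustration of the formula with the polar map `\(\phi(r,\theta) = (r\cos\theta, r\sin\theta)\)`, whose Jacobian determinant is `\(r\)`. A sketch assuming `numpy` and `scipy`; the integrand is an arbitrary choice:

```python
import numpy as np
from scipy import integrate

f = lambda x, y: np.exp(-(x**2 + y**2))

# Integral of f over the unit disc, in cartesian coordinates
lhs, _ = integrate.dblquad(
    lambda y, x: f(x, y),
    -1, 1,
    lambda x: -np.sqrt(1 - x**2), lambda x: np.sqrt(1 - x**2),
)

# Same integral after the change of variables, with |J_phi(r, theta)| = r
rhs, _ = integrate.dblquad(
    lambda theta, r: f(r * np.cos(theta), r * np.sin(theta)) * r,
    0, 1,
    lambda r: 0, lambda r: 2 * np.pi,
)

print(lhs, rhs, np.pi * (1 - np.exp(-1)))   # the three values agree
```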
--

Moving from cartesian coordinates to polar/spherical coordinates is easy thanks to a non-trivial application of the Geometric change of variable formula

---
The Image density formula is a corollary of the geometric change of variable formula.

.content-box-gray[

### Image density formula

Let `\(P\)` have density `\(f\)` over open `\(U \subseteq \mathbb{R}^d\)`. Let `\(\phi\)` be bijective from `\(U\)` to `\(\phi(U)\)` and `\(\phi\)` be continuously differentiable over `\(U\)` with non-singular differential.

The density `\(g\)` of the image distribution `\(P \circ \phi^{-1}\)` over `\(\phi(U)\)` is given by

`$$g(y) = f\big(\phi^\leftarrow(y)\big) \times \big|J_{\phi^\leftarrow}(y)\big| = f\big(\phi^\leftarrow(y)\big) \times \Big|J_{\phi}(\phi^\leftarrow(y))\Big|^{-1}$$`

]

---
The proof of the Image density formula from the Geometric change of variable formula is a routine application of the transfer formula.

Let `\(B\)` be a Borelian subset of `\(\phi(U)\)`. By the transfer formula:

`\begin{align*} P\Big\{ Y \in B \Big\} & = P\Big\{ \phi(X) \in B \Big\} \\ & = \int_U \mathbb{I}_B(\phi(x)) f(x) \mathrm{d}\ell(x) \,. \end{align*}`

--

Now, we invoke the Geometric change of variable formula:

`\begin{align*} \int_U \mathbb{I}_B(\phi(x)) f(x) \mathrm{d}\ell(x) & = \int_{\phi(U)} \mathbb{I}_B(\phi(\phi^\leftarrow(y))) f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y)\Big| \mathrm{d}\ell(y) \\ & = \int_{\phi(U)} \mathbb{I}_B(y) f(\phi^\leftarrow(y)) \Big|J_{\phi^\leftarrow}(y)\Big| \mathrm{d}\ell(y) \, . \end{align*}`

This suffices to conclude that `\(f\circ \phi^\leftarrow \Big|J_{\phi^\leftarrow}\Big|\)` is a version of the density of `\(P \circ \phi^{-1}\)` with respect to Lebesgue measure over `\(\phi(U)\)`.

---
## Application: Gamma-Beta calculus

The image density formula is applied to show a remarkable connection between Gamma and Beta distributions.

.content-box-gray[

### Proposition

Let `\(X, Y\)` be independent random variables distributed according to `\(\Gamma(p, \lambda)\)` and `\(\Gamma(q, \lambda)\)` (the intensity parameters are _equal_).

Let `\(U = X+Y\)` and `\(V= X/(X+Y)\)`.

- `\(U \perp \!\!\! \perp V\)`
- `\(U \sim \Gamma(p+q, \lambda)\)`
- `\(V \sim \operatorname{Beta}(p, q)\)`.

]

---
### Proof

The mapping `\(f: (0, \infty)^2 \to (0, \infty) \times (0,1)\)` defined by

`$$f(x,y) = \Big(x+y, \frac{x}{x+y} \Big)$$`

is one-to-one with inverse `\(f^{\leftarrow}(u,v) = \Big(uv,u(1-v)\Big)\)`.

--

The Jacobian matrix of `\(f^{\leftarrow}\)` at `\((u,v)\)` is

`$$\begin{pmatrix} v & u \\ (1-v) & -u \end{pmatrix}$$`

with determinant `\(-uv -u +uv=-u\)`.

---
The joint image density `\(g(u,v)\)` at `\((u,v) \in (0,\infty) \times (0,1)\)` is

`\begin{align*} g(u,v) & = \lambda^{p+q}\frac{(uv)^{p-1}}{\Gamma(p)} \frac{(u(1-v))^{q-1}}{\Gamma(q)} \mathrm{e}^{-\lambda (uv + u(1-v))} u \\ & = \Big(\lambda^{p+q} \frac{u^{p+q-1}}{\Gamma(p+q)} \mathrm{e}^{-\lambda u}\Big) \times \Big(\frac{\Gamma(p+q)}{\Gamma(q)\Gamma(p)} v^{p-1} (1-v)^{q-1}\Big) \,. \end{align*}`

---
The factorization of the joint density proves that `\(U \perp \!\!\! \perp V\)`

We recognize that the density of (the distribution of) `\(U\)` is the Gamma density with shape parameter `\(p+q\)`, intensity parameter `\(\lambda\)`.

The density of the distribution of `\(V\)` is the Beta density with parameters `\(p\)` and `\(q\)`.
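---
### Checking the Gamma-Beta calculus by simulation

A simulation sketch of the proposition, assuming `numpy` and `scipy` are available (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

p, q, lam = 2.0, 3.0, 1.5
rng = np.random.default_rng(3)
n = 100_000

x = rng.gamma(shape=p, scale=1 / lam, size=n)   # X ~ Gamma(p, lam)
y = rng.gamma(shape=q, scale=1 / lam, size=n)   # Y ~ Gamma(q, lam)
u, v = x + y, x / (x + y)

# U should be Gamma(p + q, lam) and V should be Beta(p, q)
print(stats.kstest(u, stats.gamma(a=p + q, scale=1 / lam).cdf).pvalue)
print(stats.kstest(v, stats.beta(p, q).cdf).pvalue)
print(np.corrcoef(u, v)[0, 1])                  # near 0, consistent with independence
```

Large Kolmogorov-Smirnov p-values are consistent with the announced distributions of `\(U\)` and `\(V\)`.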
---
###

Assume `\(X_1, X_2, \ldots, X_n\)` form an independent family with each `\(X_i\)` distributed according to `\(\Gamma(p_i, \lambda)\)`.

Determine the joint distribution of

`$$\sum_{i=1}^n X_i, \frac{X_1}{\sum_{i=1}^n X_i}, \frac{X_2}{\sum_{i=1}^n X_i}, \ldots, \frac{X_{n-1}}{\sum_{i=1}^n X_i}$$`

---
exclude: true
class: middle, center, inverse

# <svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 640 512"><path d="M192 384h192c53 0 96-43 96-96h32c70.6 0 128-57.4 128-128S582.6 32 512 32H120c-13.3 0-24 10.7-24 24v232c0 53 43 96 96 96zM512 96c35.3 0 64 28.7 64 64s-28.7 64-64 64h-32V96h32zm47.7 384H48.3c-47.6 0-61-64-36-64h583.3c25 0 11.8 64-35.9 64z"/></svg>

---
class: middle, center, inverse
background-image: url('./img/pexels-cottonbro-3171837.jpg')
background-size: 112%

# The End