name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- template: inter-slide # Multilinear Regression II: Formulae and Algorithm(s) #### [Master I MIDS & EDA]() #### [Analyse Exploratoire de Données](http://stephane-v-boucheron.fr/courses/eda/) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- template: inter-slide ##
### Motivations ### Ordinary Least Squares ### Closed-form Formulae ### `lm` and `tidyverse` ### QR Factorization ### OLS and Pseudo-inversion --- template: inter-slide name: motivations ## Motivations ??? --- ### Principled description of OLS The approach to (multiple) linear regression we are following is called Ordinary Least Squares (OLS) Multiple Linear Regression consists of - Picking a dataframe with `n` rows, one _response_ variable `Y`, and _explanatory_ variables (also named _covariates_) - Using the (possibly categorical) covariate columns to build a _design matrix_ `Z` with `n` rows and `p` columns according to a _formula_ `Y ~ ...` - Stating the Least Squares Problem: find `\(\beta \in \mathbb{R}^p\)` that minimizes `$$\left\| Y - Z \times \beta \right\|^2$$` Call the (possibly non-unique) solution `\(\widehat{\beta}\)` - Running _diagnostics_ + checking whether Goodness of Fit criteria are trustworthy + spotting outliers --- ### Convention - `\(\mathcal{M}_{n,p}\)` denotes the set of real matrices with `\(n\)` rows and `\(p\)` columns - Assumption `\(n > p\)` (classical regime) - Vectors are assumed to be column vectors (matrices with 1 column) - If `\(A\)` is a matrix, `\(A^T\)` is the transpose of `\(A\)` - The design matrix `\(Z\)` is in `\(\mathcal{M}_{n,p}\)` - The response vector `\(Y\)` is in `\(\mathbb{R}^n\)` (equivalently `\(\mathcal{M}_{n,1}\)`) - The parameter space is `\(\mathbb{R}^p\)` (equivalently `\(\mathcal{M}_{p,1}\)`) --- ### Ordinary Least Squares Ideally we would like to find `\(\beta \in \mathbb{R}^p\)` such that `$$Y = Z \times \beta$$` This system of linear equations is usually not solvable Instead we look for `\(\beta\)` that minimizes the Least Squares criterion `$$\left\| Y - Z \times \beta \right\|^2$$` --- template: inter-slide name: closed-form-formulae ## Closed-form formulae --- ### Solving the OLS problem - Geometric approach - Analytic approach ??? --- ### Geometric approach - `\(\mathcal{L}(Z)\)` linear subspace of `\(\mathbb{R}^n\)` generated by the columns of `\(Z\)` - `\(\Pi\)` : `\(n \times n\)` matrix associated with the orthogonal projection of `\(\mathbb{R}^n\)` on `\(\mathcal{L}(Z)\)` - `\(\widehat{Y} = \Pi \times Y\)` projection of `\(Y\)` on `\(\mathcal{L}(Z)\)` - `\(\widehat{\epsilon} = Y - \widehat{Y} = (\mathrm{Id} - \Pi)\times Y\)` projection of `\(Y\)` on `\(\mathcal{L}(Z)^\bot\)` ??? Assume for a while that we already know how to compute `\(\widehat{Y}\)`
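---

### Geometric approach: a quick numerical check

A minimal sketch (base R only, on the first Anscombe group of `datasets::anscombe`) of the picture above: the residual vector `\(\widehat{\epsilon}\)` lies in `\(\mathcal{L}(Z)^\bot\)`, so it is orthogonal to every column of `\(Z\)` and to `\(\widehat{Y}\)`

```r
df <- datasets::anscombe               # first Anscombe group: covariate x1, response y1
Z  <- cbind(intercept = 1, x = df$x1)  # design matrix

fit     <- lm(y1 ~ x1, data = df)
Y_hat   <- fitted(fit)                 # plays the role of the projection of y1 on L(Z)
eps_hat <- residuals(fit)              # y1 - Y_hat

crossprod(Z, eps_hat)                  # ~ 0: eps_hat is orthogonal to every column of Z
sum(Y_hat * eps_hat)                   # ~ 0: eps_hat is orthogonal to Y_hat
```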
--- ### Pythagorean formulae `$$\begin{array}{rl}\left\| Y - Z \beta \right\|^2 & = \left\| Y - \widehat{Y} + \left(\widehat{Y} - Z \beta\right) \right\|^2\\ & = \left\| Y - \widehat{Y}\right\|^2 + 2 \left\langle Y - \widehat{Y}, \widehat{Y} - Z \beta \right\rangle + \left\|\widehat{Y} - Z \beta \right\|^2\\ & = \left\| Y - \widehat{Y}\right\|^2 + \left\|\widehat{Y} - Z \beta \right\|^2\end{array}$$` as `$$Y - \widehat{Y} \in \mathcal{L}(Z)^\bot \quad\text{while}\quad \widehat{Y} - Z \beta \in \mathcal{L}(Z)$$` --- ### Exploiting the Pythagorean formula `$$\left\| Y - Z \beta \right\|^2 = \underbrace{\left\| Y - \widehat{Y}\right\|^2}_{\text{depends on }Y, Z} + \underbrace{\left\|\widehat{Y} - Z \beta \right\|^2}_{\text{to be optimized}}$$` OLS boils down to finding some `\(\beta \in \mathbb{R}^p\)` that solves `$$\widehat{Y} = Z \beta$$` -- Two cases: - The columns of `\(Z\)` are linearly independent: `\(Z\)` has full column rank - The columns of `\(Z\)` are _not_ linearly independent --- ### Proposition .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ Matrix `\(Z \in \mathcal{M}_{n,p}\)` has full column rank `\(p\)` iff `\(Z^T \times Z \in \mathcal{M}_{p,p}\)` is invertible ] --- ### Proof `\((\Rightarrow)\)` Let `\(Z\)` have full column rank. Assume `\(Z^T \times Z\)` is not invertible (proof by contradiction) -- Then there exists a _non-null_ vector `\(u \in \mathbb{R}^p\)` such that `\((Z^T \times Z) \times u = 0\)` with `\(0 \in \mathbb{R}^p\)` -- Thus `\(0 = \langle u , 0\rangle = u^T \times (Z^T \times Z) \times u = \langle Z\times u, Z \times u\rangle = \left\| Z \times u\right\|^2\)` which implies `\(Z \times u= 0\)` -- If `\(Z\)` has full column rank, `\(Z \times u = 0\)` implies `\(u=0\)`, a contradiction --- ### Proof (continued) `\((\Leftarrow)\)` (by contraposition) Let `\(Z \in \mathcal{M}_{n,p}\)` have column rank `\(< p\)` There exists `\(u \in \mathbb{R}^p \setminus \{0\}\)` such that `\(Z \times u = 0 \in \mathbb{R}^n\)` So `\(Z^T \times Z \times u = Z^T \times 0 = 0 \in \mathbb{R}^p\)` which implies that `\(Z^T \times Z\)` is not invertible
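---

### Checking full column rank numerically

A minimal sketch of the proposition in base R, on simulated data: duplicating (a multiple of) a column is an arbitrary way of breaking full column rank, and it makes `\(Z^T \times Z\)` singular

```r
set.seed(42)                              # arbitrary seed, for reproducibility
n <- 11
Z_full <- cbind(1, rnorm(n))              # intercept + one covariate: full column rank
Z_rdef <- cbind(Z_full, 2 * Z_full[, 2])  # third column = twice the second: rank 2 < 3

qr(Z_full)$rank                           # 2: full column rank
qr(Z_rdef)$rank                           # 2: column-rank deficient (3 columns, rank 2)
det(crossprod(Z_full))                    # non-zero: t(Z) %*% Z is invertible
det(crossprod(Z_rdef))                    # ~ 0: t(Z) %*% Z is singular
```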
--- ### OLS solution .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ `$$\begin{array}{rl} \widehat{Y} = Z \times \beta & \Rightarrow Z^T \times \widehat{Y} = \left(Z^T \times Z\right) \times \beta\\ & \Leftrightarrow \left(Z^T \times Z\right)^{-1}\times Z^T \times \widehat{Y} = \beta \end{array}$$` `$$\widehat{\beta} = \left(Z^T \times Z\right)^{-1}\times Z^T \times \widehat{Y}$$` ] --- ### Computing the projection We keep assuming `\(Z\)` has full column rank ( `\(p\)` ) ### Definition: Hat matrix `$$H = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T$$` `\(H \in \mathcal{M}_{n,n}\)` ??? --- ### Proposition .bg-light-gray.b--dark-gray.ba.br3.shadow-5.ph4.mt5[ Assumption: `\(Z\)` has full column rank The Hat matrix `\(H = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T\)` coincides with the matrix `\(\Pi\)` associated with the orthogonal projection of `\(\mathbb{R}^n\)` on the linear subspace `\(\mathcal{L}(Z)\)` generated by the columns of `\(Z\)` ] --- ### Proof It is enough to check that 1. if `\(u \in \mathcal{L}(Z)^\bot\)` then `\(H \times u =0\)` 1. if `\(u \in \mathcal{L}(Z) \subseteq \mathbb{R}^n\)` then `\(H \times u =u\)` -- 1.) Assume `\(u \in \mathcal{L}(Z)^\bot\)`, then `\(u\)` is orthogonal to any column of `\(Z\)`, or equivalently to any row of `\(Z^T\)`, which implies `\(Z^T \times u = 0 \in \mathbb{R}^p\)` and `\(H \times u = Z \times \left(Z^T \times Z\right)^{-1} \times Z^T \times u =0\)` --- ### Proof (continued) 2.) Assume `\(u \in \mathcal{L}(Z)\)` Note that `\(H \times u\)` is a linear combination of the columns of `\(Z\)`, hence `\(v = H \times u \in \mathcal{L}(Z)\)` Observe `\(Z^T\times v = Z^T \times H \times u = Z^T \times Z \times \left(Z^T \times Z\right)^{-1} \times Z^T \times u = Z^T \times u\)` Hence `\(Z^T \times (v- u) = 0 \in \mathbb{R}^p\)`: the vector `\(v-u\)` belongs to `\(\mathcal{L}(Z)\)` and is orthogonal to every column of `\(Z\)` (row of `\(Z^T\)` ), hence to itself This shows `\(v - u = 0 \in \mathbb{R}^n\)`. Hence `\(H \times u = u\)` for `\(u \in \mathcal{L}(Z)\)`
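---

### Hat matrix: a quick numerical check

A minimal sketch (base R, first Anscombe group) of the proposition: `\(H\)` is symmetric, idempotent, and acts as the identity on the columns of `\(Z\)`, as an orthogonal projection on `\(\mathcal{L}(Z)\)` should

```r
df <- datasets::anscombe
Z  <- cbind(1, df$x1)                      # design matrix: intercept + covariate x1
H  <- Z %*% solve(crossprod(Z)) %*% t(Z)   # Z (Z^T Z)^{-1} Z^T

norm(t(H) - H, "F")                        # ~ 0: H is symmetric
norm(H %*% H - H, "F")                     # ~ 0: H is idempotent
norm(H %*% Z - Z, "F")                     # ~ 0: H leaves the columns of Z unchanged
```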
--- ### Using the Hat matrix `$$\widehat{Y} = H \times Y = Z\times \left(Z^T \times Z\right)^{-1} \times Z^T \times Y= Z \times \widehat{\beta}$$` `$$\widehat{\epsilon} = Y - Z \times \widehat{\beta}$$` --- ### Analytic approach `$$\beta \mapsto f(\beta) = \left\| Y - Z \times \beta \right\|^2$$` is a smooth convex function of `\(\beta\)` The smooth convex function `\(f\)` achieves its minimum where its gradient `\(\nabla f\)` vanishes `$$f(\beta) = \left\langle Y - Z \times \beta , Y - Z \times \beta \right\rangle= \left\| Y\right\|^2 -2 \left\langle Z^T \times Y,\beta \right\rangle + \left\langle Z \times \beta, Z\times \beta\right\rangle$$` The gradient is `$$\nabla f = -2 Z^T \times Y + 2 \times Z^T \times Z\times \beta$$` The gradient vanishes for `$$\beta = \left( Z^T \times Z \right)^{-1} \times Z^T \times Y$$` --- template: inter-slide ## OLS and `lm` objects --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r *anscombe <- datasets::anscombe ``` ] .panel2-tidy_anscombe-auto[ ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe *anscombe ``` ] .panel2-tidy_anscombe-auto[ ``` ## x1 x2 x3 x4 y1 y2 y3 y4 ## 1 10 10 10 8 8.04 9.14 7.46 6.58 ## 2 8 8 8 8 6.95 8.14 6.77 5.76 ## 3 13 13 13 8 7.58 8.74 12.74 7.71 ## 4 9 9 9 8 8.81 8.77 7.11 8.84 ## 5 11 11 11 8 8.33 9.26 7.81 8.47 ## 6 14 14 14 8 9.96 8.10 8.84 7.04 ## 7 6 6 6 8 7.24 6.13 6.08 5.25 ## 8 4 4 4 19 4.26 3.10 5.39 12.50 ## 9 12 12 12 8 10.84 9.13 8.15 5.56 ## 10 7 7 7 8 4.82 7.26 6.42 7.91 ## 11 5 5 5 8 5.68 4.74 5.73 6.89 ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% * tidyr::pivot_longer(everything(), * names_to = c(".value", "group"), * names_pattern = "(.)(.)" * ) ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group x y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 2 10 9.14 ## 3 3 10 7.46 ## 4 4 8 6.58 ## 5 1 8 6.95 ## 6 2 8 8.14 ## 7 3 8 6.77 ## 8 4 8 5.76 ## 9 1 13 7.58 ## 10 2 13 8.74 ## # … with 34 more rows ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% * rename(X=x, Y=y) ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 2 10 9.14 ## 3 3 10 7.46 ## 4 4 8 6.58 ## 5 1 8 6.95 ## 6 2 8 8.14 ## 7 3 8 6.77 ## 8 4 8 5.76 ## 9 1 13 7.58 ## 10 2 13 8.74 ## # … with 34 more rows ``` ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% rename(X=x, Y=y) %>% * arrange(group)-> anscombe ``` ] .panel2-tidy_anscombe-auto[ ] --- count: false ### Tidy Anscombe dataset .panel1-tidy_anscombe-auto[ ```r anscombe <- datasets::anscombe anscombe %>% tidyr::pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)" ) %>% rename(X=x, Y=y) %>% arrange(group)-> anscombe *anscombe ``` ] .panel2-tidy_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## # … with 34 more rows ``` ] <style> .panel1-tidy_anscombe-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left;
padding-left: 1%; font-size: 80% } .panel2-tidy_anscombe-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-tidy_anscombe-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r *anscombe ``` ] .panel2-lm_1_anscombe-auto[ ``` ## # A tibble: 44 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## # … with 34 more rows ``` ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% * filter(group=="1") ``` ] .panel2-lm_1_anscombe-auto[ ``` ## # A tibble: 11 × 3 ## group X Y ## <chr> <dbl> <dbl> ## 1 1 10 8.04 ## 2 1 8 6.95 ## 3 1 13 7.58 ## 4 1 9 8.81 ## 5 1 11 8.33 ## 6 1 14 9.96 ## 7 1 6 7.24 ## 8 1 4 4.26 ## 9 1 12 10.8 ## 10 1 7 4.82 ## 11 1 5 5.68 ``` ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% filter(group=="1") %>% * lm(Y ~ X, data=.) -> lm_1 ``` ] .panel2-lm_1_anscombe-auto[ ] --- count: false ### First subset .panel1-lm_1_anscombe-auto[ ```r anscombe %>% filter(group=="1") %>% lm(Y ~ X, data=.) -> lm_1 *lm_1 ``` ] .panel2-lm_1_anscombe-auto[ ``` ## ## Call: ## lm(formula = Y ~ X, data = .) ## ## Coefficients: ## (Intercept) X ## 3.0001 0.5001 ``` ] <style> .panel1-lm_1_anscombe-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-lm_1_anscombe-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-lm_1_anscombe-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ###
Brooming objects of class `lm` [Broom package](https://cran.r-project.org/web/packages/broom/vignettes/broom.html) .panelset[ .panel[.panel-name[`broom::tidy`] ```r lm_1 %>% broom::tidy() ``` ``` ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 3.00 1.12 2.67 0.0257 ## 2 X 0.500 0.118 4.24 0.00217 ``` - Column `estimate` contains the estimated coefficients `\(\widehat{\beta}\)` - Each row matches a column of the design `\(Z\)` ( `\(Z\)` may differ from the input dataframe) ] .panel[.panel-name[`broom::augment`] ```r lm_1 %>% broom::augment() %>% head() ``` ``` ## # A tibble: 6 × 8 ## Y X .fitted .resid .hat .sigma .cooksd .std.resid ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 8.04 10 8.00 0.0390 0.100 1.31 0.0000614 0.0332 ## 2 6.95 8 7.00 -0.0508 0.1 1.31 0.000104 -0.0433 ## 3 7.58 13 9.50 -1.92 0.236 1.06 0.489 -1.78 ## 4 8.81 9 7.50 1.31 0.0909 1.22 0.0616 1.11 ## 5 8.33 11 8.50 -0.171 0.127 1.31 0.00160 -0.148 ## 6 9.96 14 10.0 -0.0414 0.318 1.31 0.000383 -0.0405 ``` - column `.fitted` contains the projections `\(\widehat{Y}\)` - column `.resid` contains the residuals `\(\widehat{\epsilon}\)` - column `.hat` contains the diagonal coefficients of the Hat matrix - `.cooksd` and `.std.resid` are diagnostic tools ] .panel[.panel-name[`broom::glance`] ```r lm_1 %>% broom::glance() ``` ``` ## # A tibble: 1 × 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.667 0.629 1.24 18.0 0.00217 1 -16.8 39.7 40.9 ## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` The one-row output contains information pertaining to model selection Column `sigma` contains the standard error of the residuals: `\(\|\widehat{\epsilon}\|/\sqrt{n-p}\)`, pay attention to the normalization `adj.r.squared` estimates the share of explained variance `AIC` and `BIC` are information criteria that help assess the relevance of a model with respect to simpler models ] ] ??? > While model inputs usually require tidy inputs, such attention to detail doesn’t carry over to model outputs. Outputs such as predictions and estimated coefficients aren’t always tidy. This makes it more difficult to combine results from multiple models. For example, in R, the default representation of model coefficients is not tidy because it does not have an explicit variable that records the variable name for each estimate, they are instead recorded as row names. In R, row names must be unique, so combining coefficients from many models (e.g., from bootstrap resamples, or subgroups) requires workarounds to avoid losing important information. This knocks you out of the flow of analysis and makes it harder to combine the results from multiple models. I’m not currently aware of any packages that resolve this problem.
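---

### Brooming many fits at once

A minimal sketch (assuming `dplyr` is attached, as elsewhere in the deck) of how tidy outputs combine across models: `broom::tidy()` is applied group-wise to the long-format `anscombe` tibble built earlier, yielding one coefficient table for four fits

```r
anscombe %>%
  group_by(group) %>%
  group_modify(~ broom::tidy(lm(Y ~ X, data = .x))) %>%  # one linear fit per Anscombe group
  ungroup()
```

The four groups yield nearly identical coefficients (intercept close to 3, slope close to 0.5), which is the point of the Anscombe quartet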
--- template: inter-slide name: qr-factorization ## Algorithms: QR Factorization --- ### Two flavors 1. Direct methods 2. Iterative methods --- ### A naive attempt In order to compute the pseudo-inverse `\(Z^{+} = \left(Z^T \times Z\right)^{-1} \times Z^T\)`, we might use the following recipe: 1. Compute `\(Z^T \times Z\)`. This is just matrix transposition and matrix multiplication `t(Z) %*% Z`. 1. Compute the Cholesky decomposition `\(Z^T \times Z = L \times L^T\)` (where `\(L\)` is lower-triangular, and invertible if `\(Z^T \times Z\)` is): `chol(t(Z) %*% Z)`. If invertible, `\(L\)` is easy to invert. 1. Invert `\(L\)` and plug the inverse into `\((Z^T \times Z)^{-1} = (L^{-1})^T \times L^{-1}\)` 1. `\((Z^T \times Z)^{-1} \times Z^T = (L^{-1})^T \times L^{-1} \times Z^T\)`. --- This is not what
`lm()` does. Rather, `lm()`
relies on the so-called QR-factorization of `\(Z\)`: `$$Z = Q \times R$$` where - `\(Q \in \mathcal{M}_{n,p}\)` has pairwise orthogonal columns with unit norm ( `\(Q^T \times Q = \operatorname{Id}_p\)` ) and - `\(R\)` is upper-triangular with positive diagonal (assuming again that `\(Z\)` has full column rank). --- ### Using QR factorization From this factorization we readily obtain the Cholesky decomposition of `\(Z^T \times Z\)`: `$$Z^T \times Z = \big(R^T \times Q^T \big) \times \big(Q \times R\big) = \underbrace{R^T}_{\text{lower triangular}} \times \underbrace{R}_{\text{upper triangular}}$$` and `\((Z^T \times Z)^{-1} \times Z^T\)` reads as: `$$R^{-1} \times (R^{-1})^T \times R^T \times Q^T = R^{-1} \times Q^T$$` -- `$$H = Q \times R \times (R^T \times R)^{-1} \times R^T \times Q^T = Q\times Q^T$$` --- ### QR factorization on Anscombe dataset ```r anscombe_1 <- filter(anscombe, group=="1") %>% select(X,Y) Z <- cbind("I"=rep(1, 11), anscombe_1$X) %>% as.matrix() n <- nrow(Z); p <- ncol(Z) ``` ```r qr.Z <- qr(Z) Q.Z <- qr.Q(qr.Z) # Extraction of Q R.Z <- qr.R(qr.Z) # Extraction of R Ip <- diag(1, p, p) norm(t(Q.Z) %*% Q.Z - Ip) # Q has orthonormal columns ``` ``` ## [1] 2.359224e-16 ``` ```r piv.Z <- solve(R.Z, Ip) %*% t(Q.Z) # pseudo-inverse, see later ``` --- ### Computing coefficients ```r piv.Z %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## I 3.0000909 ## 0.5000909 ``` ### Computing fitted values ```r H <- Q.Z %*% t(Q.Z) H %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## [1,] 8.001000 ## [2,] 7.000818 ## [3,] 9.501273 ## [4,] 7.500909 ## [5,] 8.501091 ## [6,] 10.001364 ## [7,] 6.000636 ## [8,] 5.000455 ## [9,] 9.001182 ## [10,] 6.500727 ## [11,] 5.500545 ``` ```r piv.Z %*% H %*% as.matrix(anscombe_1$Y) ``` ``` ## [,1] ## I 3.0000909 ## 0.5000909 ```
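---

### Cross-checking with `lm` and base R's `qr` helpers

A short sanity check (a sketch relying on the objects `qr.Z` and `anscombe_1` defined above): base R exposes the same computations through `qr.coef()` and `qr.fitted()`, and the results agree with `lm()`

```r
coef(lm(Y ~ X, data = anscombe_1))    # reference: lm() on the first Anscombe group
qr.coef(qr.Z, anscombe_1$Y)           # coefficients computed from the stored QR object
head(qr.fitted(qr.Z, anscombe_1$Y))   # fitted values, i.e. H %*% Y
```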
--- template: inter-slide ## OLS and rank-deficient designs --- ### What if the design does not have full column rank? This is equivalent to the fact that the linear system `$$Z \times \beta = \widehat{Y}$$` has infinitely many solutions We choose the solution with minimal Euclidean norm, that is, we solve the following problem: `$$\text{Minimize}\quad \left\| \beta \right\|^2 \quad\text{under the constraint}\quad Z \times \beta = \widehat{Y}$$` We minimize a smooth convex function under a linear constraint --- exclude: true ### Moore-Penrose pseudo-inverse of `\(Z\)` The pseudo-inverse `\(Z^{+}\)` satisfies `$$Z^{+} \times Z = \Pi_{\mathcal{L}(Z^T)} \qquad\text{and} \qquad Z \times Z^{+} = \Pi_{\mathcal{L}(Z)}$$` ??? Existence and uniqueness? --- exclude: true ### QR factorization and pseudo-inversion --- exclude: true ### OLS (all in one) - Coefficients `$$\widehat{\beta} = Z^{+} \times Y$$` - Predictions `$$\widehat{Y} = Z \times Z^{+} \times Y$$` - Residuals `$$\widehat{\epsilon} = \left( \mathrm{Id} - Z \times Z^{+}\right) \times Y$$` --- exclude: true ### Regularized least squares criterion For each `\(\lambda>0\)`, the penalized least squares cost is defined by `$$\Big\Vert Y - Z \beta \Big\Vert^2 + \lambda \big\|\beta\big\|^2$$` Gradient with respect to `\(\beta\)` `$$2\times\big(\lambda \beta + Z^T Z \beta - Z^T Y\big)$$` Unique optimum `$$\big(\lambda \operatorname{Id} + Z^TZ\big)^{-1} Z^T Y$$` --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End