Published

February 7, 2024

Linear Regression on Whiteside data

Packages installation and loading (again)

We will use the following packages. If needed, we install them.

Code
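to_be_loaded <- c("tidyverse", 
                  "broom",
                  "magrittr",
                  "lobstr",
                  "ggforce",
                  # "cowplot",
                  "patchwork", 
                  "glue",
                  "DT", 
                  "skimr",      # called below as skimr::skim()
                  "viridis")

# Attach each package; install it first if it is missing,
# and stop if it still cannot be loaded afterwards
for (pck in to_be_loaded) {
  if (!require(pck, character.only = TRUE)) {
    install.packages(pck, repos="http://cran.rstudio.com/")
    stopifnot(require(pck, character.only = TRUE))
  }  
}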

Dataset

Code
whiteside <- MASS::whiteside # no need to load the whole package

cur_dataset <- str_to_title(as.character(substitute(whiteside)))
 
# ?whiteside

Mr Derek Whiteside of the UK Building Research Station recorded the weekly gas consumption and average external temperature at his own house in south-east England for two heating seasons, one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption.

Code
whiteside %>% 
  glimpse
Rows: 56
Columns: 3
$ Insul <fct> Before, Before, Before, Before, Before, Before, Before, Before, …
$ Temp  <dbl> -0.8, -0.7, 0.4, 2.5, 2.9, 3.2, 3.6, 3.9, 4.2, 4.3, 5.4, 6.0, 6.…
$ Gas   <dbl> 7.2, 6.9, 6.4, 6.0, 5.8, 5.8, 5.6, 4.7, 5.8, 5.2, 4.9, 4.9, 4.3,…
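
The description above mentions 26 weeks before and 30 weeks after insulation. As a quick sanity check, the following minimal sketch counts the rows per level of `Insul`; the name `season_counts` is introduced here for illustration only.

```r
library(dplyr)

whiteside <- MASS::whiteside

# Number of weekly records in each heating season
season_counts <- whiteside %>% 
  count(Insul)

season_counts
```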

Start with columnwise and pairwise exploration

Code
# Empirical covariance matrix of the numeric columns (Temp, Gas)
C <- whiteside %>% 
  select(where(is.numeric)) %>% 
  cov()

# Empirical mean vector of the numeric columns (Temp, Gas)
mu_n <- whiteside %>% 
  select(where(is.numeric)) %>% 
  colMeans()

# mu_n # Mean vector

\[ C_n = \begin{bmatrix} 7.56 & -2.19\\ -2.19 & 1.36 \end{bmatrix} \qquad \mu_n = \begin{bmatrix} 4.88\\ 4.07 \end{bmatrix} \]
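
Rows and columns are ordered as (Temp, Gas). The covariance is easier to interpret once rescaled to a correlation: from the rounded entries above, \(-2.19 / \sqrt{7.56 \times 1.36}\) is about \(-0.68\). The sketch below does the same rescaling in R with `stats::cov2cor()`; the names `R` and `rho` are introduced here for illustration.

```r
library(dplyr)

whiteside <- MASS::whiteside

C <- whiteside %>% 
  select(where(is.numeric)) %>% 
  cov()

# Rescale the covariance matrix to a correlation matrix
R <- cov2cor(C)

# The same off-diagonal entry, computed by hand from C
rho <- C["Temp", "Gas"] / sqrt(C["Temp", "Temp"] * C["Gas", "Gas"])

R
rho
```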

Use skimr::skim() to write univariate reports

Solution
Code
sk <- whiteside %>% 
  skimr::skim() %>% 
  select(-n_missing, -complete_rate)

skimr::yank(sk, "factor")

Variable type: factor

skim_variable  ordered  n_unique  top_counts
Insul          FALSE    2         Aft: 30, Bef: 26

Code
skimr::yank(sk, "numeric")

Variable type: numeric

skim_variable  mean  sd    p0    p25   p50   p75   p100  hist
Temp           4.88  2.75  -0.8  3.05  4.90  7.12  10.2  ▃▅▇▇▃
Gas            4.07  1.17   1.3  3.50  3.95  4.62   7.2  ▁▆▇▂▁
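
Since the goal is to compare the two heating seasons, it may also help to produce the same report separately for each level of `Insul`. A minimal sketch, relying on the fact that `skimr::skim()` accepts data grouped with `dplyr::group_by()`; the name `sk_by_insul` is introduced here for illustration.

```r
library(dplyr)

whiteside <- MASS::whiteside

# One summary row per (variable, Insul level) pair
sk_by_insul <- whiteside %>% 
  group_by(Insul) %>% 
  skimr::skim() %>% 
  select(-n_missing, -complete_rate)

sk_by_insul
```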

Build a scatterplot of the Whiteside dataset

Solution
Code
p <- whiteside %>% 
  ggplot() +
  aes(x=Temp, y=Gas) +
  geom_point(aes(shape=Insul)) +
  xlab("Average Weekly Temperature (°C)") +
  ylab("Weekly Gas Consumption (1000 cubic feet)")

p + 
  ggtitle(glue("{cur_dataset} data"))
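
As a preview of the regression theme, one possible variation overlays a least-squares line for each insulation status. This is only a sketch: the choice of `geom_smooth(method = "lm")`, the `linetype` mapping, and the name `p_lm` are assumptions made here, not part of the original solution.

```r
library(ggplot2)

whiteside <- MASS::whiteside

p_lm <- ggplot(whiteside) +
  aes(x = Temp, y = Gas, shape = Insul, linetype = Insul) +
  geom_point() +
  # One least-squares line per level of Insul, without confidence bands
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  xlab("Average Weekly Temperature (°C)") +
  ylab("Weekly Gas Consumption (1000 cubic feet)")

p_lm
```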

Build boxplots of Temp and Gas versus Insul

Solution
Code
q <- whiteside %>% 
  ggplot() +
  aes(x=Insul)

qt <- q + 
  geom_boxplot(aes(y=Temp))

qg <- q + 
  geom_boxplot(aes(y=Gas))

(qt + qg) +
  patchwork::plot_annotation(title = glue("{cur_dataset} data"))

Build violin plots of Temp and Gas versus Insul

Solution
Code
(q + 
  geom_violin(aes(y=Temp))) +
(q + 
  geom_violin(aes(y=Gas))) +
  patchwork::plot_annotation(title = glue("{cur_dataset} data"))
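
An alternative to composing two plots with patchwork is to reshape the data to long format and let `facet_wrap()` create one panel per variable. The sketch below assumes that layout is acceptable; the names `whiteside_long` and `p_violin` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

whiteside <- MASS::whiteside

# One row per (week, variable) pair
whiteside_long <- whiteside %>% 
  pivot_longer(cols = c(Temp, Gas), 
               names_to = "variable", 
               values_to = "value")

# One violin panel per variable, each with its own y scale
p_violin <- whiteside_long %>% 
  ggplot() +
  aes(x = Insul, y = value) +
  geom_violin() +
  facet_wrap(~ variable, scales = "free_y")

p_violin
```

Free y scales keep the two variables readable even though they are measured in different units.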