Published

February 16, 2024

library(ggplot2)  # theme_set(), theme_minimal()
theme_set(theme_minimal())

Variable/Model selection and ANOVA on Whiteside data

Challenge(s)

Comparing weekly average temperatures over two seasons

We address the following question: was the external temperature distributed in the same way during the two heating seasons? When we raise this question, we silently make modeling assumptions. Spell them out.

What kind of hypothesis are we testing in the next two chunks? Interpret the results.

library(MASS)   # provides the whiteside data frame
library(broom)  # tidy(), glance()

lm_temp <- lm(Temp ~ Insul, data = whiteside)

lm_temp |> 
  tidy()
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    5.35      0.537      9.96 7.80e-14
2 InsulAfter    -0.887     0.734     -1.21 2.32e- 1
lm_temp |> 
  glance()
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
1    0.0263       0.00830  2.74      1.46   0.232     1  -135.  276.  282.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
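One way to read the two chunks above: with a single two-level factor as predictor, the t-test on `InsulAfter` and the overall F-test (F = t², here 1.21² ≈ 1.46, with the same p-value 0.232) both test equality of mean temperature in the two seasons, assuming normal errors with a common variance. The base-R sketch below checks that this coincides with a pooled (Student) two-sample t-test; `tt`, `p_lm`, `f_stat` and `t_coef` are illustrative names.

```r
library(MASS)  # provides the whiteside data frame

lm_temp <- lm(Temp ~ Insul, data = whiteside)

# Pooled-variance (Student) two-sample t-test of mean Temp, Before vs After
tt <- t.test(Temp ~ Insul, data = whiteside, var.equal = TRUE)

# p-value of the InsulAfter coefficient in the linear model
p_lm <- summary(lm_temp)$coefficients["InsulAfter", "Pr(>|t|)"]

# Same hypothesis, same p-value; the t statistics differ only in sign
c(t_test = tt$p.value, lm = p_lm)

# The overall F statistic of the model is the square of the coefficient t statistic
f_stat <- summary(lm_temp)$fstatistic["value"]
t_coef <- summary(lm_temp)$coefficients["InsulAfter", "t value"]
c(F = unname(f_stat), t_squared = t_coef^2)
```

Both tests rely on the same modeling assumptions (independent weeks, Gaussian temperatures, equal variances in both seasons); only the mean is compared.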

Display parallel boxplots, overlaid empirical cumulative distribution functions (ECDFs) and a quantile-quantile plot (QQ-plot) to compare the temperature distributions during the two heating seasons. Comment on what you see.
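A minimal sketch with base R graphics. This is an assumption on our part: the `theme_set()` call suggests the lab uses ggplot2, where `geom_boxplot()` and `stat_ecdf()` play the same roles. `temp_before`, `temp_after` and `qq` are illustrative names.

```r
library(MASS)  # whiteside data

temp_before <- whiteside$Temp[whiteside$Insul == "Before"]
temp_after  <- whiteside$Temp[whiteside$Insul == "After"]

op <- par(mfrow = c(1, 3))

# 1. Parallel boxplots, one per heating season
boxplot(Temp ~ Insul, data = whiteside, main = "Temp by season", ylab = "Temp")

# 2. Overlaid empirical CDFs
plot(ecdf(temp_before), col = "blue", main = "ECDFs of Temp", xlab = "Temp",
     xlim = range(whiteside$Temp))
plot(ecdf(temp_after), col = "red", add = TRUE)
legend("bottomright", legend = c("Before", "After"), col = c("blue", "red"), lty = 1)

# 3. Two-sample QQ-plot: the longer sample is interpolated down to the
#    length of the shorter one; points near the diagonal suggest similar distributions
qq <- qqplot(temp_before, temp_after, xlab = "Temp before", ylab = "Temp after",
             main = "QQ-plot of Temp")
abline(0, 1, lty = 2)

par(op)
```

Points lying along a line parallel to the diagonal would point to a pure location shift; a different slope would point to a change in spread.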

Perform a Wilcoxon rank-sum test to assess whether Temperature changed between the two seasons.
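The rank-sum (Mann-Whitney) test drops the normality assumption of the t-test: under H0, the temperatures of both seasons come from the same continuous distribution. A minimal sketch; setting `exact = FALSE` is our choice, so that R uses the normal approximation it would fall back to anyway (with a warning) if `Temp` contains ties.

```r
library(MASS)  # whiteside data

# Wilcoxon rank-sum test of Temp, Before vs After insulation.
# exact = FALSE: normal approximation with continuity correction
w <- wilcox.test(Temp ~ Insul, data = whiteside, exact = FALSE)
w
```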

Does Insulation matter?

  • Does average Gas consumption change with Insulation?
  • Does Gas consumption dependence on Temperature change with Insulation?

Because the dependence on Temperature must itself be estimated from the data, these questions become tricky.

Compare Gas consumption before and after (leaving Temperature aside)

Draw a QQ-plot to compare Gas consumption before and after insulation.
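A base-R sketch of the two-sample QQ-plot; `gas_before`, `gas_after` and `qq` are illustrative names, and the base graphics are an assumption (a ggplot2 version is equally possible).

```r
library(MASS)  # whiteside data

gas_before <- whiteside$Gas[whiteside$Insul == "Before"]
gas_after  <- whiteside$Gas[whiteside$Insul == "After"]

# qqplot() sorts both samples; with unequal sizes, the longer one
# is linearly interpolated down to the length of the shorter one
qq <- qqplot(gas_before, gas_after,
             xlab = "Gas before insulation", ylab = "Gas after insulation",
             main = "QQ-plot of Gas consumption")
abline(0, 1, lty = 2)  # below the diagonal: that quantile of Gas is lower after insulation
```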

Compare ECDFs of Gas consumption before and after insulation.
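A base-R sketch overlaying the two ECDFs; `F_before` and `F_after` are illustrative names. `ecdf()` returns a step function that can be both plotted and evaluated.

```r
library(MASS)  # whiteside data

gas_before <- whiteside$Gas[whiteside$Insul == "Before"]
gas_after  <- whiteside$Gas[whiteside$Insul == "After"]

F_before <- ecdf(gas_before)  # step functions: F(x) = proportion of weeks with Gas <= x
F_after  <- ecdf(gas_after)

plot(F_before, col = "blue", main = "ECDFs of Gas consumption", xlab = "Gas",
     xlim = range(whiteside$Gas))
plot(F_after, col = "red", add = TRUE)
legend("bottomright", legend = c("Before", "After"), col = c("blue", "red"), lty = 1)
```

An "After" curve lying above (to the left of) the "Before" curve indicates stochastically lower consumption after insulation, leaving Temperature aside.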

Do Insulation and Temperature matter additively?

This amounts to assessing whether the intercept changes after insulation while the slope stays the same. Which models should be used to assess this hypothesis?
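One natural pair of nested models is `Gas ~ Temp` (a single line for both seasons) and `Gas ~ Insul + Temp` (parallel lines whose intercept depends on `Insul`). The latter matches the formula of `lm0` printed in the ANOVA section; `lm_common` and `a` are hypothetical names used for this sketch.

```r
library(MASS)  # whiteside data

lm_common <- lm(Gas ~ Temp, data = whiteside)          # one line for both seasons
lm0       <- lm(Gas ~ Insul + Temp, data = whiteside)  # parallel lines, intercept depends on Insul

coef(lm0)  # InsulAfter estimates the intercept shift after insulation

# F-test of H0 "no intercept shift"; with one extra parameter it is
# equivalent to the t-test on InsulAfter (F = t^2)
a <- anova(lm_common, lm0)
a
```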

Draw the diagnostic plots for this model.
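A minimal sketch using the base-R `plot()` method for `lm` objects, which by default draws residuals vs fitted values, a normal QQ-plot of the residuals, a scale-location plot and residuals vs leverage.

```r
library(MASS)  # whiteside data

lm0 <- lm(Gas ~ Insul + Temp, data = whiteside)

# Four default diagnostic panels arranged in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(lm0)
par(op)
```

Look in particular for curvature in the residuals vs fitted panel: it would suggest that a straight line in Temp is not enough.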

Do Insulation and Temperature matter and interact?

Find the formula and build the model.
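A sketch of one answer, assuming the interaction model is the intended one: `Insul * Temp` expands to `Insul + Temp + Insul:Temp`, so each season gets its own intercept and its own slope. `lm1` and `lm1_sep` are hypothetical names; the second fit is an equivalent parametrisation with one intercept and one slope per season.

```r
library(MASS)  # whiteside data

# Insul * Temp  ==  Insul + Temp + Insul:Temp
lm1 <- lm(Gas ~ Insul * Temp, data = whiteside)
coef(lm1)  # "InsulAfter:Temp" is the change of slope after insulation

# Same model, parametrised as separate intercepts and slopes per season
lm1_sep <- lm(Gas ~ Insul / Temp - 1, data = whiteside)
coef(lm1_sep)
```

The two fits have identical fitted values; only the meaning of the coefficients differs.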

Do Insulation and powers of temperature interact?

Investigate the following formulae:

  • Gas ~ poly(Temp, 2, raw = TRUE) * Insul
  • Gas ~ poly(Temp, 2) * Insul
  • Gas ~ (Temp + I(Temp^2)) * Insul
  • Gas ~ (Temp + I(Temp^2)) | Insul
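A sketch comparing the polynomial interaction formulae; `m_raw`, `m_orth` and `m_I` are hypothetical names. Raw and orthogonal polynomials span the same column space, so they give the same fitted values with differently interpreted coefficients. The quadratic term is written `I(Temp^2)`: inside a formula, a bare `^` means crossing, and `I(Temp*2)` would merely rescale `Temp`, making it perfectly collinear with `Temp`.

```r
library(MASS)  # whiteside data

m_raw  <- lm(Gas ~ poly(Temp, 2, raw = TRUE) * Insul, data = whiteside)  # columns Temp, Temp^2
m_orth <- lm(Gas ~ poly(Temp, 2) * Insul, data = whiteside)              # orthogonal polynomial basis
m_I    <- lm(Gas ~ (Temp + I(Temp^2)) * Insul, data = whiteside)         # I() protects ^ from formula algebra

# Same column space, hence identical fitted values
all.equal(fitted(m_raw), fitted(m_orth))

# m_raw and m_I use the very same design matrix, column by column
all.equal(unname(coef(m_raw)), unname(coef(m_I)))

# Without I(), Temp^2 is formula syntax (Temp crossed with itself), i.e. just Temp
attr(terms(Gas ~ Temp^2), "term.labels")
```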

Higher-degree polynomials

Experiment with degree-10 polynomials.
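A sketch contrasting degree 2 and degree 10 orthogonal polynomials (both interacting with `Insul`); `m2` and `m10` are hypothetical names. The models are nested, so R² can only increase with the degree, while adjusted R², AIC and BIC penalise the extra parameters and need not improve.

```r
library(MASS)  # whiteside data

m2  <- lm(Gas ~ poly(Temp, 2) * Insul, data = whiteside)
m10 <- lm(Gas ~ poly(Temp, 10) * Insul, data = whiteside)

# Goodness of fit vs complexity
rbind(
  deg2  = c(r2 = summary(m2)$r.squared,  adj_r2 = summary(m2)$adj.r.squared,
            AIC = AIC(m2),  BIC = BIC(m2)),
  deg10 = c(r2 = summary(m10)$r.squared, adj_r2 = summary(m10)$adj.r.squared,
            AIC = AIC(m10), BIC = BIC(m10))
)

# Partial F-test: do the extra 16 polynomial terms improve the fit significantly?
anova(m2, m10)
```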

Drying model exploration

Collecting the models a posteriori

Make a named list of the models constructed so far.
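A sketch of such a list, refitting the models so that the block runs on its own; the list names are hypothetical, so substitute the models you actually built. A base-R `sapply()` summary is used here; with broom, applying `glance()` to each element gives a similar table.

```r
library(MASS)  # whiteside data

# Hypothetical names; all models are fitted on the same data
models <- list(
  common      = lm(Gas ~ Temp, data = whiteside),
  additive    = lm(Gas ~ Insul + Temp, data = whiteside),
  interaction = lm(Gas ~ Insul * Temp, data = whiteside),
  quad        = lm(Gas ~ poly(Temp, 2) * Insul, data = whiteside),
  deg10       = lm(Gas ~ poly(Temp, 10) * Insul, data = whiteside)
)

# One row per model
data.frame(
  formula = sapply(models, function(m) paste(deparse(formula(m)), collapse = " ")),
  AIC     = sapply(models, AIC),
  BIC     = sapply(models, BIC),
  adj_r2  = sapply(models, function(m) summary(m)$adj.r.squared)
)
```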

Use stepAIC() (from package MASS) to perform a stepwise exploration of the models.
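A minimal sketch, assuming the search starts from the quadratic interaction model; `full` and `sel` are hypothetical names. Keeping `I(Temp^2)` as a separate term lets the search drop the quadratic part on its own. When `scope` is missing, `stepAIC()` uses the starting model as the upper model and respects marginality, so a main effect is not dropped while one of its interactions remains.

```r
library(MASS)  # stepAIC() and the whiteside data

full <- lm(Gas ~ (Temp + I(Temp^2)) * Insul, data = whiteside)

# Stepwise search minimising AIC, allowing both removals and re-additions
sel <- stepAIC(full, direction = "both", trace = FALSE)

formula(sel)  # selected model
sel$anova     # the sequence of steps taken
```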

ANOVA table(s)

Use the function anova() to compare models built from these formulae.

formula(lm0)
Gas ~ Insul + Temp
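A sketch of sequential F-tests between nested models, smallest first; `lm_common` and `lm1` are hypothetical names, and `lm0` is refitted with the formula printed above. Row 2 of the table tests the intercept shift, row 3 the change of slope.

```r
library(MASS)  # whiteside data

lm_common <- lm(Gas ~ Temp, data = whiteside)           # hypothetical name
lm0       <- lm(Gas ~ Insul + Temp, data = whiteside)   # same formula as formula(lm0)
lm1       <- lm(Gas ~ Insul * Temp, data = whiteside)   # hypothetical name

# Comparison of nested models: each row adds terms to the previous model
tab <- anova(lm_common, lm0, lm1)
tab

# Single-model table: sequential (type I) sums of squares, in formula order
anova(lm1)
```

Order matters for type I tables: `anova(lm1)` adds `Insul` before `Temp`, whereas the model comparison above adds `Temp` first, so the two tables answer different questions.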

Wikipedia on Analysis of Variance