We start with confidence intervals in a simple Gaussian setting. We have \(X_1, \ldots, X_n \sim_{i.i.d.} \mathcal{N}(\mu, \sigma^2)\) where \(\mu\) and \(\sigma\) are unknown (to be estimated and/or tested).
The maximum likelihood estimator for \((\mu, \sigma^2)\) is \((\overline{X}_n, \widehat{\sigma}^2)\) where
\[\overline{X}_n =\frac{1}{n}\sum_{i=1}^n X_i\quad\text{and}\quad \widehat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n (X_i - \overline{X}_n)^2.\]
By Student's theorem, \(\overline{X}_n\) and \(\widehat{\sigma}^2\) are stochastically independent, \(\overline{X}_n \sim \mathcal{N}(\mu, \sigma^2/n)\), and \(n \widehat{\sigma}^2/\sigma^2 \sim \chi^2_{n-1}\).
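Combining these three facts gives the pivotal quantity behind Student confidence intervals. The derivation below is the standard one; the notation \(s_n^2 = n\widehat{\sigma}^2/(n-1)\) for the unbiased variance estimate and \(t_{n-1,1-\alpha/2}\) for the quantile of order \(1-\alpha/2\) of the Student distribution with \(n-1\) degrees of freedom is introduced here. The numerator is standard Gaussian, the denominator is the square root of an independent \(\chi^2_{n-1}\) variable divided by its degrees of freedom, so
\[
T_n = \frac{\sqrt{n}\,(\overline{X}_n-\mu)/\sigma}{\sqrt{\dfrac{n\widehat{\sigma}^2/\sigma^2}{n-1}}}
= \frac{\sqrt{n-1}\,(\overline{X}_n-\mu)}{\widehat{\sigma}}
= \frac{\sqrt{n}\,(\overline{X}_n-\mu)}{s_n} \sim t_{n-1}.
\]
Since \(|T_n| \le t_{n-1,1-\alpha/2}\) holds with probability \(1-\alpha\), rearranging the inequality around \(\mu\) yields
\[
\mathbb{P}\left(\overline{X}_n - t_{n-1,1-\alpha/2}\frac{s_n}{\sqrt{n}} \le \mu \le \overline{X}_n + t_{n-1,1-\alpha/2}\frac{s_n}{\sqrt{n}}\right) = 1-\alpha.
\]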
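library(ggplot2)
library(patchwork)  # provides plot_annotation() and `+` for combining plots

# Histogram of the Studentized statistics, with the Student t density (dashed)
# and the standard Gaussian density (dotted) overlaid
p <- X |>
  ggplot() +
  aes(x = stud) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "white", color = "black") +
  stat_function(fun = dt, args = list(df = n - 1), linetype = "dashed") +
  stat_function(fun = dnorm, linetype = "dotted", color = "blue")

# Linear and logarithmic y-axis side by side (the log scale exposes the tails)
p + (p + scale_y_log10()) +
  plot_annotation(
    title = "Histogram for Studentized discrepancy between true mean and estimate",
    subtitle = glue::glue("{N} replicates of Gaussian samples of size {n}"),
    caption = glue::glue("Dashed line is Student t density with {n-1} degrees of freedom\nDotted line is standard Gaussian density")
  )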
Warning: Transformation introduced infinite values in continuous y-axis
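The plotting code expects a data frame X with a column stud, together with the constants N and n. One way to build them is sketched below; the values of N, n, mu and sigma, as well as the extra columns mu_hat and sig_hat, are illustrative assumptions rather than the notes' own choices. Because sd() uses the \(n-1\) denominator, sqrt(n) * (mu_hat - mu) / sig_hat equals \(\sqrt{n-1}\,(\overline{X}_n-\mu)/\widehat{\sigma}\) and should follow a Student t distribution with n - 1 degrees of freedom.

```r
# Hypothetical simulation: N, n, mu and sigma are illustrative values.
set.seed(1)
N <- 1000       # number of replicates
n <- 10         # size of each Gaussian sample
mu <- 0         # true mean
sigma <- 1      # true standard deviation

# One Gaussian sample per row
samples <- matrix(rnorm(N * n, mean = mu, sd = sigma), nrow = N, ncol = n)

X <- data.frame(
  mu_hat  = rowMeans(samples),      # sample means
  sig_hat = apply(samples, 1, sd)   # sd() uses the n - 1 denominator
)

# Studentized discrepancy between the true mean and its estimate
X$stud <- sqrt(n) * (X$mu_hat - mu) / X$sig_hat
```

With n = 10, the variance of stud should be close to that of a t distribution with 9 degrees of freedom, namely 9/7, slightly above the Gaussian value 1.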
The next function takes two vectors, mu_hat and sig_hat, and returns a data frame in which each row holds the lower and upper bounds of a confidence interval. The width of each interval is computed from the optional arguments alpha (the targeted confidence level is 1 - alpha) and n (the common size of the samples used to compute the estimates mu_hat and sig_hat).
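A minimal sketch of such a function is given below. The name ci_bounds and the default values alpha = 0.05 and n = 10 are hypothetical, and sig_hat is assumed to be computed with sd(), that is with the n - 1 denominator; if sig_hat held the maximum likelihood estimate \(\widehat{\sigma}\) instead, sqrt(n) should be replaced by sqrt(n - 1) in the half-width.

```r
# Hypothetical helper: name and defaults are assumptions.
# sig_hat is assumed to be computed with sd() (n - 1 denominator).
ci_bounds <- function(mu_hat, sig_hat, alpha = 0.05, n = 10) {
  # Quantile of order 1 - alpha/2 of the Student t distribution, n - 1 df
  q <- qt(1 - alpha / 2, df = n - 1)
  half_width <- q * sig_hat / sqrt(n)
  data.frame(lower = mu_hat - half_width, upper = mu_hat + half_width)
}

# Usage: empirical coverage over simulated samples whose true mean is 1
set.seed(2)
n <- 10
samples <- replicate(1000, rnorm(n, mean = 1, sd = 2))  # n x 1000 matrix
cis <- ci_bounds(colMeans(samples), apply(samples, 2, sd), alpha = 0.05, n = n)
coverage <- mean(cis$lower <= 1 & 1 <= cis$upper)  # should be close to 0.95
```

Returning one row per interval keeps the output easy to bind to the estimates and to plot, for instance with geom_errorbar().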
In data gathered from the 2000 General Social Survey (GSS), respondents are cross-classified by gender and by political party identification: each respondent indicated whether they identified more strongly with the Democratic party, with the Republican party, or as Independent. The counts are summarized in the next contingency table (taken from Agresti, An Introduction to Categorical Data Analysis).
Turn the 3-way contingency table into a dataframe/tibble with columns Gender, Dept, Admit and n, where the first three columns are categorical and the last column counts the UCB applicants sharing the combination of values found in the first three columns.
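A base R sketch is shown below, assuming the 3-way table is R's built-in UCBAdmissions dataset (a 2 x 2 x 6 table with dimensions Admit, Gender and Dept). as.data.frame() on a table produces one row per cell, with factor columns for the dimensions and the counts in a column named by responseName.

```r
# Assumes the table is the built-in UCBAdmissions dataset.
# as.data.frame() on a table yields one row per cell; the dimension
# columns are factors and the counts go to the column named "n".
ucb <- as.data.frame(UCBAdmissions, responseName = "n")

# Reorder the columns as requested: Gender, Dept, Admit, n
ucb <- ucb[, c("Gender", "Dept", "Admit", "n")]

head(ucb)
```

With the tibble package, tibble::as_tibble(UCBAdmissions) should give the same information with the count column named n by default; the columns then need reordering, for instance with dplyr::select().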
We start from data summarized in table form and obtain data summarized in frequency form.
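The conversion can be checked by going back from frequency form to table form. The sketch below, again assuming the built-in UCBAdmissions table, rebuilds the frequency-form data frame and cross-tabulates the counts with xtabs(); the result should match the original table once its dimensions are permuted to the same order.

```r
# Frequency form: one row per cell, counts in column "n"
ucb <- as.data.frame(UCBAdmissions, responseName = "n")

# Back to table form: sum the counts n over each Gender x Dept x Admit cell
tab <- xtabs(n ~ Gender + Dept + Admit, data = ucb)

# Compare with the original table, permuted to the same dimension order
same <- all(tab == aperm(UCBAdmissions, c("Gender", "Dept", "Admit")))
```

Since as.data.frame() keeps the level order of each dimension, xtabs() rebuilds the cells in the same order, so an element-wise comparison is enough.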