First, we need the following packages:

Required packages

# Load in libraries
library(tidyverse)
library(RColorBrewer)
library(MetBrewer)
library(plotly)
library(reshape2)

While we don’t strictly need the raw data, it’s nice to have it around to check against. So, here it is, entered manually.

Data

Scores from two tests, mechanics and vectors, taken by 22 students.

student <- 1:22
m <- mechanics <- c(7, 44, 49, 59, 34, 46, 0, 32, 49, 52, 44, 36, 42, 5, 22, 18, 41, 48, 31, 42, 46, 63)
v <- vectors <- c(51, 69, 41, 70, 42, 40, 40, 45, 57, 64, 61, 59, 60, 30, 58, 51, 63, 38, 42, 69, 49, 63)

# Combine into a data frame
(data <- data.frame(Student = student, Mechanics = mechanics, Vectors = vectors))
# correlation coefficient 
(cor_mv <- cor(mechanics, vectors))
## [1] 0.4978075

Plotting the data and fitting both linear (red) and loess (blue) regression lines.

p1 <- ggplot(data, aes(x=Mechanics, y=Vectors))+geom_point(size=3)+
  geom_smooth(method = "lm", colour="red", size=2, se=FALSE)+
  geom_smooth(method = "loess", colour="blue", size=2,  se=FALSE, span=0.85)+
  theme_minimal()

ggplotly(p1)

We can take Equation 3.10, \[\hat{\theta} = \frac{\sum_{i=1}^{22} (m_i - \bar{m})(v_i - \bar{v})}{\left[\sum_{i=1}^{22} (m_i - \bar{m})^2 \sum_{i=1}^{22} (v_i - \bar{v})^2\right]^{1/2}}\] and make it into an R function. stats::cor() is, of course, sufficient, but just for fun…

Defining the Sample Correlation Function

# Define function
samp_cor <- function(m, v) {
  
  # Compute the means of m and v
  m_mean <- mean(m)
  v_mean <- mean(v)
  
  # Compute the numerator and denominator for the Pearson correlation formula
  numerator <- sum((m - m_mean) * (v - v_mean))
  denominator <- sqrt(sum((m - m_mean)^2) * sum((v - v_mean)^2))
  
  # Compute the sample correlation coefficient (theta_hat)
  numerator / denominator
  
}

# trying to see that it works
samp_cor(mechanics, vectors)
## [1] 0.4978075
all.equal(cor(mechanics, vectors), samp_cor(mechanics, vectors)) # Alright...
## [1] TRUE

Example 2 is great, but it skips over the stuff happening “under the hood.” Because it’s not high-dimensional (so no need for fancy MCMC), why don’t we try to reproduce the example and results with (mostly) “home-baked” code.

While the next part is not really Bayesian, it still might be fun to try to do it “by hand.”

Likelihood Function (and MLE “by hand”)

Efron and Hastie (Brad and Trev) chose \(\sigma^2=1\) for the unidimensional normal density function in Eq. 3.2 (p. 22), but I will use a smaller value, \(\sigma=0.1\) (so that the graph later on looks nicer).

The following likelihood function does not actually need the raw data, only the estimate \(\hat{\theta}\), so it doesn’t include the product over observations or the normalizing constant, just the exponential piece. Here it is:

\[\exp\left(- \frac{(\hat{\theta} - \theta)^2}{2\sigma^2}\right)\]

And, as a function in R:

# Likelihood function (the exponential kernel only), with a sigma you can modify
likelihood <- function(theta, m, v, sigma = 0.01) {
  # Compute the sample correlation coefficient using samp_cor
  theta_hat <- samp_cor(m, v)
  # Exponential piece of the normal likelihood
  likelihood <- exp(- (theta_hat - theta)^2 / (2 * sigma^2))
  return(likelihood)
}
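
As a quick check (my own addition): the kernel above is just the normal density without its constant, so multiplying dnorm() by that constant, \(\sigma\sqrt{2\pi}\), should give the same value.

# Sanity check: the kernel equals dnorm() times the normalizing constant sigma * sqrt(2 * pi)
likelihood(0.49, m, v, sigma = 0.01)
dnorm(samp_cor(m, v), mean = 0.49, sd = 0.01) * 0.01 * sqrt(2 * pi)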

Before we try it out or continue, we should create a grid of possible values that \(\theta\) can take, from -1 to 1:

\[\theta \in [-1, 1]\]

# possible values theta can take, from -1 to 1
theta <- seq(-1,1, length.out=1000)

OK, if we want to find the maximum-likelihood estimate (MLE), as in Figure 3.2 (p. 27), it’s better to take the log of the function so that we don’t have to deal with very, very small values. Because the likelihood function was \(e^{\text{term}}\), its log is just the \(\text{term}\). That is, \[- \frac{(\hat{\theta} - \theta)^2}{2\sigma^2}\]

# Log-likelihood function for MLE
LL <- function(theta, m, v, sigma=0.01) {
  # Compute sample correlation coefficient using the samp_cor function
  theta_hat <- samp_cor(m, v)
  # Log-likelihood function: log of the likelihood. 
  log_likelihood <- -(theta_hat - theta)^2 / (2 * sigma^2)
  return(log_likelihood)
}

We can find the parameter value that produces the highest (log-)likelihood by optimization. I’m sure it can be done with DIY code (a tiny grid-search version is sketched after the optimize() output below), but I will just use the optimize function, if y’all don’t mind.

# find MLE with optimizations...
optimize(LL, interval = c(-1, 1), m = m, v = v, sigma = 0.01, maximum = TRUE) # Same as cor, surprise surprise ;)
## $maximum
## [1] 0.4978075
## 
## $objective
## [1] 0
optimize(likelihood, interval = c(-1, 1), m = m, v = v, sigma = 0.01, maximum = TRUE)
## $maximum
## [1] 0.4978122
## 
## $objective
## [1] 0.9999999
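
For the curious, here is the DIY grid-search version mentioned above. It is a minimal sketch: it simply evaluates the log-likelihood on the theta grid defined earlier and picks the argmax, so it is only as precise as the grid.

# DIY alternative: evaluate the log-likelihood over the theta grid and take the argmax
ll_grid <- sapply(theta, function(t) LL(t, m, v, sigma = 0.01))
theta[which.max(ll_grid)]  # should land within one grid step of optimize()'s maximum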

We can also plot this. Note, however, that this is not the same as Figure 3.2, which shows the posterior from a flat prior. Here, we plot the log-likelihood function and the location of the MLE.

# Calculate the log-likelihood for each theta
LL_values <- sapply(theta, function(theta) LL(theta, m, v))

# Find the MLE using the log-likelihood function
mle_res <- optimize(LL, interval = c(-1, 1), m = m, v = v, sigma = 0.1, maximum = TRUE)
(mle_theta <- mle_res$maximum)  # estimate of theta
## [1] 0.4978075
# Prepare data for plotting
plot_data <- data.frame(
  theta = theta,
  log_likelihood = LL_values
)

# Plot  log-likelihood using ggplot
mle_plot <- ggplot(plot_data, aes(x = theta)) +
  geom_line(aes(y = log_likelihood), color = "red", size = 2) +
  geom_vline(xintercept = mle_theta, color = "black", linetype = "dotted", size = 1.2) +
  labs(title = "Log-Likelihood function and MLE",
       x = "theta",
       y = "Value") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "none"
  )+annotate("text", x = mle_theta-0.2, y = -11000, label = paste("MLE =", round(mle_theta, 3)),
           hjust = -.95, vjust = 10, color = "black", size = 5)  # Adding MLE annotation

ggplotly(mle_plot)

We can now go back to Example 2; specifically, to Equation 3.11.

Correlation Coefficient Density

Assuming that the joint \((m, v)\) distribution is bivariate normal (p. 26).

Side note: here’s the note from p. 26:

“Formula (3.11) for the correlation coefficient density was R. A. Fisher’s debut contribution to the statistics literature.”

— Efron & Hastie

\[ f_{\theta}(\hat{\theta}) = \frac{(n - 2)(1 - \theta^2)^{(n - 1)/2} \left(1 - \hat{\theta}^2\right)^{(n - 4)/2}}{\pi} \int_0^{\infty} \frac{dw}{\left(\cosh(w) - \theta \hat{\theta}\right)^{n-1}} \]

Let’s define this as a function in R. But, I’ll try to break it down into smaller pieces, because it’s a lot…

We start with the right-hand side of the density function: the integral. Its integrand involves the hyperbolic cosine and \(\theta\), the parameter of interest:

\[ \int_0^{\infty} \frac{dw}{\left(\cosh(w) - \theta \hat{\theta}\right)^{n-1}} \]

In R:

integrand <- function(w, theta, theta_hat, n) {
  return(1 / (cosh(w) - theta * theta_hat)^(n - 1))
}

Next, the numerator of the fraction in front of the integral (the pre-calculus part…):

\[ (n - 2)(1 - \theta^2)^{(n - 1)/2} \left(1 - \hat{\theta}^2\right)^{(n - 4)/2} \]

non_calc_part <- function(theta, theta_hat, n) {
  left <- (n - 2) * (1 - theta^2)^((n - 1) / 2)  
  right <- (1 - theta_hat^2)^((n - 4) / 2)
  return(left*right)
}

Putting this all together for the full density function:

lik_dens <- function(theta, theta_hat, n) {
  # The part in front of the integral
  left_side_res <- non_calc_part(theta, theta_hat, n)
  
  # Numerically integrate the integrand over w from 0 to infinity, as in Eq. 3.11
  integral_result <- integrate(integrand, lower = 0, upper = Inf,
                               theta = theta, theta_hat = theta_hat, n = n)
  
  # The final density: the pre-calculus part times the integral, divided by pi, as in the equation
  return((left_side_res / pi) * integral_result$value)
}
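
As a quick sanity check (my own addition, not from the book): for a fixed \(\theta\), Fisher’s formula is a proper density in \(\hat{\theta}\), so it should integrate to roughly 1 over \((-1, 1)\).

# Sanity check: for fixed theta, the density should integrate to ~1 over theta_hat in (-1, 1)
integrate(function(th) sapply(th, function(x) lik_dens(theta = 0.5, theta_hat = x, n = 22)),
          lower = -1, upper = 1)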

Let’s plot the density function of \(\hat{\theta}\) as a function of \(\theta\):

theta_hat <- samp_cor(m,v)  # Correlation coefficient from the data
n <- length(m)  # Sample size

# Compute the likelihood for each theta value
likelihood_values <- sapply(theta, function(theta) {
  lik_dens(theta, theta_hat, n)
})

# Create a data frame for plotting
likelihood_df <- data.frame(
  theta = theta,
  Likelihood = likelihood_values
)


lik_dens_plot <- ggplot(likelihood_df, aes(x = theta, y = Likelihood)) +
  geom_line( size = 2, colour="red") +
   geom_vline(xintercept = mle_theta, color = "black", linetype = "dotted", size = 1.2) +
  labs(title = "Density Function of the Observed Sample Correlation", x = "theta", y = "Density") +
  theme_minimal()+annotate("text", x = mle_theta-0.2, y = 0.25, label = paste("MLE =", round(mle_theta, 3)),
           hjust = -.95, vjust = 10, color = "black", size = 5)  # Adding MLE annotation


ggplotly(lik_dens_plot)

Alright, now that we have the likelihood down, we can move on to priors.

Priors

These represent our prior beliefs or knowledge about \(\theta\) before seeing the data (even though we did calculate \(\hat{\theta}\) already, let’s pretend we didn’t). The prior gets updated by the new evidence (the data) to form the posterior.
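
In symbols, the posterior is just the prior times the likelihood (the density above, evaluated at the observed \(\hat{\theta}\)), up to a normalizing constant:

\[ g(\theta \mid \hat{\theta}) \propto g(\theta)\, f_{\theta}(\hat{\theta}) \]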

Let’s define the priors as specified in Chapter 3. But, for the sake of demonstration and comparison, I will add another prior of a different “type,” neither uninformative nor weakly informative. Instead, I’ll define an informative prior.

Uniform/Flat/Uninformative Prior

\[g(\theta)= 1/2, \quad \text{for } -1\leq \theta \leq 1 \]

This is an uninformative prior that treats all values of \(\theta\) between -1 and 1 as equally likely.

As a function in R:

flat_prior <- function(theta) {
   ifelse(theta >= -1 & theta <= 1, 1/2, 0) # if theta is between -1 and 1, then 0.5, else 0
}  

Shrinkage/Triangular Prior

This prior is less “ignorant” of the state of affairs, but still not very informative. It reflects a belief that \(\theta\) is likely to be close to 0, with the probability decreasing as it moves toward -1 or 1.
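
Written out, this is the triangular shape that the code below implements:

\[g(\theta)= 1 - |\theta|, \quad \text{for } -1\leq \theta \leq 1 \]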

In R:

shrink_prior <- function(theta) {
  ifelse(theta >= -1 & theta <= 1, 1 - abs(theta), 0)
}

Jeffreys’ Prior

Again, not completely “ignorant,” but, in a way, this prior is a little like the complement, or the opposite, of the shrinkage prior, because it “penalizes” parameter values that are closer to 0 and favours those near -1 and 1. Note that at -1 and 1 this function is undefined.

\[g^{Jeff}(\theta)=\frac{1}{1-\theta^2}\]

jeff_prior <- function(theta) {
  ifelse(theta > -1 & theta < 1, 1 / (1 - theta^2), 0)
}

Finally, the added, informative prior.

Informative Prior

Say we have a good reason to believe that the correlation coefficient is close to \(\theta=0.8\).
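
As a formula, this is simply a normal density centred at 0.8 with standard deviation 0.25 (matching the code below):

\[ g^{\text{inf}}(\theta) = \frac{1}{0.25\sqrt{2\pi}} \exp\left(-\frac{(\theta - 0.8)^2}{2 \times 0.25^2}\right) \]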

# This is a normally distributed prior centred around 0.8 with a standard deviation of 0.25 (variance 0.25^2).
informative_prior <- function(theta, sigma=0.25) {
  dnorm(theta, mean = 0.8, sd = sigma)  
}

Plotting the Priors

First, we compute the values of each prior across the range of \(\theta\) values:

uniform_values <- sapply(theta, flat_prior)
jeffreys_values <- sapply(theta, jeff_prior)
shrinkage_values <- sapply(theta, shrink_prior)
informative_values <- sapply(theta, informative_prior)

And, plotting…

# Create a data frame to hold the prior values for plotting
prior_data <- data.frame(
  theta = rep(theta, 4),
  prior_value = c(uniform_values, jeffreys_values, shrinkage_values, informative_values),
  prior_type = factor(rep(c("Uniform/Flat", "Jeffreys'", "Shrinkage/Triangular", "Informative"), each = length(theta)))
)

# Plot the priors using ggplot
p2 <- ggplot(prior_data, aes(x = theta, y = prior_value, color = prior_type)) +
  geom_line(size = 2, alpha=.85) +
  labs(title = "Prior Types",
       x = "theta",
       y = "Prior Density",
       color = "Prior Type") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )+ylim(0,4.5)+scale_color_manual(values=met.brewer("Peru1", 4))

ggplotly(p2)