Module 6: Mixing Categorical and Continuous Predictors

PSYC 3032 M

Udi Alter

Things Udi Should Remember



Before we start

  • Group presentation is due in two weeks (March 25)
    • Post videos on eClass
    • More instructions are now available
  • Assignment 1 grades are available, wonderful job, everybody!
  • The only remaining evaluations are:
    • Group presentation
    • Lab 5 (easy peasy lemon squeezy)
    • A2

About Module 6

Goals for Today:

  • ANCOVA
    • Regression with a mix of categorical and continuous predictors
    • Parallelism: the assumption of homogeneity of slopes
  • Reviewing A1


What’s the first topping on your ideal pizza?



Do you have a lucky number? What is it?



Chocolate or vanilla?


A) Vanilla, duh!

B) Are you kidding me? Chocolate, what else?!

C) I like a mix (like ANCOVA!)




Mixing Categorical and Continuous Predictors

Working Example

  • Last week, we discussed an example where researchers (Baumann et al.) sought to determine how children’s reading comprehension scores after an intervention (i.e., posttest scores) differed by treatment group (control, DRTA, or TA)

Categorical Predictor Variables

  • We saw how we could use dummy coding to evaluate whether type of intervention is a meaningful explanatory variable of posttest scores
    • Where \(D1\) represents the mean difference between DRTA and control groups, and
    • \(D2\) represents the mean difference between TA and control groups
  • D1 and D2 then become the predictors in a multiple regression model (instead of the original grouping variable, e.g., group):

\[\hat{Reading \ score}_i = {\color{deeppink} {\beta_0}} + {\color{darkcyan} {\beta_1}}{\color{darkgrey} {D1_i}} + {\color{gold} {\beta_2}}{\color{lightblue} {D2_i}}\]

\[\hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {D1_i}} + {\color{gold} {1.09}}{\color{lightblue} {D2_i}}\]

\[\hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {(DRTA \ vs. \ Control)_i}} + {\color{gold} {1.09}}{\color{lightblue} {(TA \ vs. \ Control)_i}}\]
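As a sketch, the dummy codes can be built by hand and passed to lm(); this uses toy data standing in for the Baumann et al. dataset (the real read/posttest1/group columns from later slides are analogous), so the numbers here are illustrative:

```r
# Minimal sketch: hand-built dummy codes for a 3-level group factor
# (toy data; the real example's read/posttest1/group columns are analogous)
set.seed(1)
read <- data.frame(
  group     = rep(c("control", "DRTA", "TA"), each = 5),
  posttest1 = c(rnorm(5, 7), rnorm(5, 10), rnorm(5, 8))
)

# Control is the reference group
read$D1 <- ifelse(read$group == "DRTA", 1, 0)  # DRTA vs. control
read$D2 <- ifelse(read$group == "TA",   1, 0)  # TA vs. control

# Equivalent to lm(posttest1 ~ group) when control is the reference level
mod_dummy <- lm(posttest1 ~ D1 + D2, data = read)
coef(mod_dummy)
# b0 = control mean; b0 + b1 = DRTA mean; b0 + b2 = TA mean
```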

Regression with a Categorical Predictor is a One-Way ANOVA

\[\hat{Reading \ score}_i = {\color{deeppink} {\beta_0}} + {\color{darkcyan} {\beta_1}}{\color{darkgrey} {D1_i}} + {\color{gold} {\beta_2}}{\color{lightblue} {D2_i}} \\ \hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {D1_i}} + {\color{gold} {1.09}}{\color{lightblue} {D2_i}}\]

Say we wanted to find the reading comprehension mean of DRTA, how can we do it?


A) \(\beta_1 + \beta_2\)

B) \(D1+D2\)

C) \(\beta_0 + \beta_1\)

D) \(\beta_0 + \beta_2\)

E) “Jesus, take the wheel!”




What about including other predictors beyond a single categorical variable?

Adding More Predictors

  • If the dummy-coding approach is analogous to a one-way ANOVA, then an MLR model with at least one categorical predictor and at least one continuous predictor is analogous to an ANCOVA (Analysis of Covariance)!

  • So, an ANCOVA model is just a special case of a multiple regression model that includes both categorical and continuous predictors

  • Typically, ANCOVA is used for comparing group means on an outcome variable while controlling for some continuous variable

  • It’s common for researchers to use the word “covariate” when referring to the continuous variable in ANCOVA

    • What they often mean by that is a variable that they did not manipulate and that holds little substantive or theoretical interest
    • But, as we learned, added variables in the model can be anything we want to condition on or control for statistically (e.g., confounders, forks)
    • i.e., you simply want to partial out the effects of the added variable in the analysis

Nice Meeting You, Ann Cova!

  • If ANOVA/regression with a categorical variable is commonly presented as a method for comparing group means, ANCOVA is often presented as a method for comparing adjusted means across groups (AKA conditional means)

  • The interpretation of the slope associated with the categorical predictor will change slightly, but you’re already familiar with this change!

  • This is no different from how the interpretation of any other predictor changes when you move from SLR to MLR

QUICK EXAMPLE

  • Say we estimate a model where life satisfaction is regressed on different types of meditation interventions (e.g., control, mindfulness, mantra-based, and gratitude-based), conditioning on socio-economic status (SES)

  • Then, we can interpret the dummy-code representing one of the comparisons between meditation types as the mean difference in life satisfaction between, say, mindfulness and control, adjusted for SES (or holding SES constant)

One More Thing, Ann

  • But, there’s this other thing…

  • ANCOVA has an additional critical assumption which is known as homogeneity of regression, or parallelism

  • Parallelism means that the relationship between the continuous variable, \(X\), and the outcome, \(Y\), is assumed to be constant, or homogeneous, across the levels of the categorical predictor

  • Put simply, the traditional ANCOVA model assumes that there is no interaction between the continuous and categorical variables

  • But, if we reframe ANCOVA as MLR, it’s easy to relax this assumption by including an interaction term between the continuous variable and the dummy-coded variables representing group membership (we will address interactions in Module 7)

  • Another way to think about the parallelism assumption from a regression framework is the assumption that your model is properly specified

    • i.e., you didn’t “miss” an interaction that exists in the population/data-generating mechanism

Parallelism Assumption

ANCOVA Example

  • Yes, the reading comprehension example, again!

  • Baumann et al. (1992) were actually interested in how the groups differed in their post-intervention reading test scores (post-test) over and above any differences on a reading test score administered before the intervention (pre-test)


  • This is an example of a classic “Pre/Post” research question for which ANCOVA is often applied, for example:
    • “How do intervention types differ on post-test reading scores, controlling for pre-test scores?”
    • “Is there still a difference between the groups in their post-test reading scores after accounting for where each individual started (pre-test reading score)?”
  • Our regression/ANCOVA model will provide estimates of the adjusted mean differences, controlling for pre-test score!

ANCOVA Example

Let’s explore the pre-test scores…


Group    Count       Mean       SD  Min  Max   Skewness   Kurtosis
control     22  10.500000 2.972092    4   16 -0.2181529 -0.6538804
DRTA        22   9.727273 2.693587    6   16  0.8073246 -0.4021767
TA          22   9.136364 3.342304    4   14  0.0036221 -1.5433129

Baumann et al. (1992) were actually interested in how the groups differed in their post-intervention reading test scores over and above any differences on a reading test score administered before the intervention


How can we evaluate their research question?


A) Hierarchical regression

B) WLS

C) Robust regression

D) There’s always room for pud

E) Multilevel modeling

ANCOVA Example

  • Last week, we examined group differences between post-test scores, but now we will add the pre-test score as a model covariate
    • We, therefore, would specify the multiple regression model:

\[\hat{Post}_i=\beta_0 + \beta_1D1_i + \beta_2D2_i + \beta_3Pre_i\]

  • Using this ANCOVA approach, the omnibus, overall effect of the intervention variable is the joint effect of D1 and D2, taken together

  • To obtain this joint effect and its statistical significance, we can follow a hierarchical regression procedure

ANCOVA Example

  • Specifically:


\(\text{Model 1}: \hat{Post}_i= {\color{deeppink} {\beta_0 + \beta_1Pre_i}}\)

vs.

\(\text{Model 2 (ANCOVA)}: \hat{Post}_i= {\color{deeppink} {\beta_0 + \beta_1Pre_i}} + \beta_2D1_i + \beta_3D2_i\)


read$group <- as_factor(read$group) # converting chr to factor
mod1 <- lm(posttest1 ~ pretest1, data = read)
mod2 <- lm(posttest1 ~ pretest1 + group, data = read)

ANCOVA Example

summary(mod1)$r.squared # Nested model R2
[1] 0.3202457
summary(mod2)$r.squared # Full model R2
[1] 0.5118617
summary(mod2)$r.squared-summary(mod1)$r.squared # Delta R2
[1] 0.191616
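This \(\Delta R^2\) can be converted by hand into the same F statistic that anova() reports; a sketch of the arithmetic, using the \(R^2\) values printed above:

```r
# F test for Delta R2 by hand:
# F = (Delta R2 / df_added) / ((1 - R2_full) / df_residual_full)
r2_full   <- 0.5118617  # mod2 (pretest1 + group)
r2_nested <- 0.3202457  # mod1 (pretest1 only)
df_added  <- 2          # D1 and D2 added in the full model
df_resid  <- 62         # n - k - 1 for the full model (66 - 3 - 1)

F_change <- ((r2_full - r2_nested) / df_added) / ((1 - r2_full) / df_resid)
round(F_change, 2)  # 12.17, matching anova(mod1, mod2)
```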

ANCOVA Example

And, for the actual model comparison, the F test:


anova(mod1, mod2)
Analysis of Variance Table

Model 1: posttest1 ~ pretest1
Model 2: posttest1 ~ pretest1 + group
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     64 508.88                                  
2     62 365.43  2    143.45 12.169 3.483e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA Example

  • This \(\Delta R^2 = .19\) reflects an almost 20% increase in explained variability in reading comprehension, a notable proportion indeed. This change in \(R^2\) is statistically significant, F(2, 62) = 12.17, p < .001, indicating that the reading intervention is both practically and statistically related to the post-test reading evaluation, over and above the pre-intervention reading score



  • The total proportion of variance in the outcome explained by the linear combination of the explanatory variables is .51, suggesting that 51.2% of the entire variability in reading comprehension scores post-intervention is explained by intervention type and pre-test reading score; that is, more than half(!) of the variability in the reading comprehension scores post-intervention is accounted for by this set of variables, a truly substantial amount.

ANCOVA Example

  • Next, the estimated regression coefficients for D1 and D2 and their statistical significance give results for the specific, planned comparisons among the three treatment groups:
summary(mod2)$coefficients
              Estimate Std. Error    t value     Pr(>|t|)
(Intercept) -0.5966478  1.1845062 -0.5037101 6.162502e-01
pretest1     0.6931872  0.1014697  6.8314735 4.205253e-09
groupDRTA    3.6265538  0.7361861  4.9261371 6.553543e-06
groupTA      2.0361644  0.7449616  2.7332475 8.161831e-03
confint(mod2)
                 2.5 %    97.5 %
(Intercept) -2.9644420 1.7711465
pretest1     0.4903523 0.8960222
groupDRTA    2.1549387 5.0981689
groupTA      0.5470074 3.5253214

ANCOVA Example

  • Finally, the estimated regression coefficient for the last comparison between DRTA and TA and its statistical significance test results:


library(emmeans)
# Estimate the groups' marginal means
emm <- emmeans(mod2, ~ group)
# Pairwise comparisons for every unique pair
pairs(emm, adjust="none") # adjust= refers to multiplicity control (e.g., Tukey), but I don't know if I believe in MC
 contrast       estimate    SE df t.ratio p.value
 control - DRTA    -3.63 0.736 62  -4.926  <.0001
 control - TA      -2.04 0.745 62  -2.733  0.0082
 DRTA - TA          1.59 0.734 62   2.165  0.0342

ANCOVA Example

  • Pertaining to the effect of D1, the DRTA group had significantly higher post-test scores than the control group after adjusting for pre-intervention scores, \(\hat{\beta}_2= 3.63\), 95% CI [2.15, 5.10], t (62) = 4.93, p < .001; that is, partialling out the effect of pre-intervention reading ability, a random kid from the DRTA group is expected to score about 3.63 points higher than a kid in the control condition.
    • Given a 15-point score range in post-intervention, I consider a 3.6-point difference (roughly 24% of the range) a small-to-medium effect, likely with light, yet non-negligible, implications for improving reading comprehension.


  • Similarly, D2, the TA group had significantly higher post-test scores than the control group holding pre-test scores constant, \(\hat{\beta}_3= 2.04\), 95% CI [0.55, 3.52], t (62) = 2.73, p = 0.008; after accounting for the initial reading skills before the intervention, the mean difference between the TA and control groups is estimated at about 2 points (roughly 13.6% of the scale range), suggesting an even smaller difference than the DRTA group, but still may carry some potential real-world implications.

ANCOVA Example


  • Finally, for the comparison between the two interventions, the DRTA group had significantly higher post-test scores than the TA group holding pre-test scores constant, \(\hat{\beta}_{DRTA \ vs. \ TA}= 1.59\), 95% CI [0.12, 3.06], t (62) = 2.17, p = 0.034.
  • That is, when accounting for the initial reading skills before the intervention, the expected difference between the DRTA and TA groups is 1.6 points, suggesting that DRTA is a slightly better intervention by roughly 10%; a small, but likely still meaningful difference




What about Using Change Scores Instead of Conditioning on Pretest?

Change Scores


  • Another approach with pre-post designs is to model the difference between post-test and pre-test scores, such that the outcome variable is the difference between post and pre, rather than modeling the posttest while controlling for the pretest:


  • \(Y_{change_i} = Y_{post_i} - Y_{pre_i}\)
  • These are called change scores, difference scores, or gain scores
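Applied to the reading example, the change-score approach would look something like this; a sketch using toy data standing in for the read data frame (with the pretest1/posttest1 names from earlier slides) so it runs on its own:

```r
# Sketch of the change-score approach (toy data standing in for the Baumann data)
set.seed(2)
read <- data.frame(
  group    = factor(rep(c("control", "DRTA", "TA"), each = 10)),
  pretest1 = rnorm(30, 10, 3)
)
read$posttest1 <- read$pretest1 + rnorm(30, 2, 2)

# The outcome is now the gain from pre to post, not the posttest itself
read$change <- read$posttest1 - read$pretest1
change_mod <- lm(change ~ group, data = read)  # a one-way ANOVA on gain scores
summary(change_mod)
```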

Change Scores Pros and Cons

Advantages of Using Change Scores:

  • Logical – Direct Measurement of Change: Change scores directly model the amount of change from pre-test to post-test, providing a clear and intuitive measure of individual differences over time.

  • Simplicity and Interpretability: Change scores are straightforward and easy to communicate, ideal for situations where stakeholders prefer clear, direct metrics of impact or change.

Limitations of Change Scores:

  • Potential for Bias: In non-randomized studies, change scores can introduce bias if baseline differences that influence outcomes are meaningful and not controlled for

  • Amplification of Measurement Error: Change scores can increase the impact of measurement error, especially if the measurement tools used at pre- and post-test are not highly reliable. This is due to their reliance on the accuracy of two separate measurements instead of one.
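The measurement-error point can be made precise: the variance of a difference score combines the variability of both measurements,

\[Var(Y_{post} - Y_{pre}) = Var(Y_{post}) + Var(Y_{pre}) - 2\,Cov(Y_{post}, Y_{pre})\]

and because each observed score carries its own error variance, the error variances of pre and post add together in the difference, which is why the reliability of a change score is typically lower than the reliability of either measurement alone.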

Change Scores vs. ANCOVA

  • Although ANCOVA and change scores represent different approaches to handling the same research design, researchers may actually arrive at different conclusions if they were to use both approaches on the same dataset; this phenomenon is called Lord’s Paradox (Lord, 1967)

  • Lord’s original experiment was meant to evaluate how young men and women differ on weight change over the course of a semester

  • But, men obviously start at a much higher average weight than women

  • In Lord’s dataset, men and women do not change at all over time (mean change = ~0 in both groups). Here’s a simulated illustration…
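The simulation code itself isn’t shown on the slides; a minimal sketch that generates Lord-style data (within each group, final weight equals initial weight on average, but the groups differ in average weight) might look like this. The means, SDs, and correlation are illustrative choices, not the exact values used for the slide output:

```r
# Sketch of a Lord's-paradox simulation: no mean change within either group,
# but groups differ in average weight (illustrative parameters)
set.seed(42)
n <- 100
sim_group <- function(mu, n, rho = 0.5, sd = 5) {
  initial <- rnorm(n, mu, sd)
  # final regresses toward the group mean: same mean, correlated with initial
  final <- mu + rho * (initial - mu) + rnorm(n, 0, sd * sqrt(1 - rho^2))
  data.frame(initial, final)
}
dat <- rbind(
  cbind(sim_group(130, n), gender = "Women"),  # hypothetical group means
  cbind(sim_group(155, n), gender = "Men")
)
dat$gender <- factor(dat$gender, levels = c("Women", "Men"))
dat$change <- dat$final - dat$initial

coef(lm(change ~ gender, dat))           # change score: gender effect near 0
coef(lm(final ~ gender + initial, dat))  # ANCOVA: clear positive gender effect
```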

Lord’s Paradox


Lord’s Paradox

summary(lm(change ~ gender, dat))   # Change Scores (t-test)

Call:
lm(formula = change ~ gender, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5233  -3.3977   0.6239   3.9303  13.8364 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.1692     0.5190   2.253   0.0254 *
genderMen     0.1209     0.7340   0.165   0.8694  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.19 on 198 degrees of freedom
Multiple R-squared:  0.0001369, Adjusted R-squared:  -0.004913 
F-statistic: 0.02711 on 1 and 198 DF,  p-value: 0.8694

Lord’s Paradox

summary(lm(final ~ gender + initial, dat)) # ANCOVA 

Call:
lm(formula = final ~ gender + initial, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4746 -1.9902  0.1751  1.8293  8.0446 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.79411    1.67270   20.20   <2e-16 ***
genderMen   13.42333    0.79429   16.90   <2e-16 ***
initial      0.44538    0.02797   15.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.006 on 197 degrees of freedom
Multiple R-squared:  0.9463,    Adjusted R-squared:  0.9457 
F-statistic:  1734 on 2 and 197 DF,  p-value: < 2.2e-16

Lord’s Paradox

Results

  • Using the t test on change scores, we do not conclude a difference between men and women

  • Using ANCOVA, we conclude a difference between men and women in final weight controlling for initial weight

  • So, which approach is correct?! Both approaches are correct!

The two approaches actually answer different research questions:

  • Change scores addresses the question, “What is the difference in weight change between men and women?”

  • ANCOVA answers, “Is there still a difference between men and women in their final weights, after accounting for where each individual started (initial weight)?”



  • Here’s a wonderful “tutorial” by Michael Clark showing more on Lord’s Paradox

Lord’s original experiment was meant to evaluate how young men and women differ on weight change over the course of a semester


Which do you think is more appropriate here?


A) t test on change scores

B) ANCOVA

C) Lisa Simpson’s paradox

D) HTML

E) Haven’t you done well




ANCOVA Assumptions

The assumptions of OLS regression apply equivalently to models with discrete predictors


What can we assume?


A) The linearity assumption is satisfied for the dummy codes

B) Homoscedasticity is the next step in human evolution (following Homo sapiens)

C) Multicollinearity will never be violated

D) We don’t care about normality

E) Influential cases are relevant only when photosynthesis occurs

ANCOVA Assumptions

  • Here’s how that looks
library(car)
mod2 <- lm(posttest1 ~ pretest1 + as_factor(group), data = read)
stud_resid <- rstudent(mod2) # Studentizing the model residuals
scatterplot(stud_resid ~ read$group, boxplot=FALSE)

  • See? I told you, didn’t I?

ANCOVA Assumptions

  • The assumptions of OLS regression apply equivalently to models with discrete predictors and the same diagnostic procedures presented in earlier modules can be used

  • Recall, LINE:

    • Linear relationship (applies to covariate and outcome only)
    • Independence
    • Normally distributed residuals
    • Equal variance (homoscedasticity)
  • And, of course:

    • Multicollinearity
    • Influential cases/outliers
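For those last two checks, the same tools from earlier modules apply directly to the ANCOVA model: car::vif() for multicollinearity (it reports GVIFs for factors) and base-R influence diagnostics. A sketch of the influence checks, using toy data standing in for the reading data so it runs on its own:

```r
# Toy stand-in for the reading data so this sketch runs standalone
set.seed(3)
read <- data.frame(
  group    = factor(rep(c("control", "DRTA", "TA"), each = 10)),
  pretest1 = rnorm(30, 10, 3)
)
read$posttest1 <- 0.7 * read$pretest1 + rnorm(30, 3, 2)
mod2 <- lm(posttest1 ~ pretest1 + group, data = read)

# Influential cases/outliers, all base R:
cd <- cooks.distance(mod2)
which(cd > 4 / nrow(read))  # a common rough flag for influential cases
head(dfbetas(mod2))         # per-case change in each coefficient
range(hatvalues(mod2))      # leverage
```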


…But, ANCOVA has one more assumption, remember?


What’s the additional ANCOVA assumption?


A) Do no harm

B) homogeneity of regression/parallelism

C) Never judge a model by its sum of residuals

D) Confounders are never to be found(ers)

E) The continuous and categorical variables must be correlated

Testing for Parallelism


  • We can add an interaction term between group and pre-test and test whether it fits the data better!


\(\hat{Post}_i= {\color{deeppink} {\beta_0}} + {\color{deeppink} {\beta_1D1_i}} + {\color{deeppink} {\beta_2D2_i}} + {\color{deeppink} {\beta_3Pre_i}}\) vs. \(\hat{Post}_i={\color{deeppink} {\beta_0}} + {\color{deeppink} {\beta_1D1_i}} + {\color{deeppink} {\beta_2D2_i}} + {\color{deeppink} {\beta_3Pre_i}} + \beta_4D1_i \times Pre_i + \beta_5D2_i \times Pre_i\)


  • Again, let’s do some hierarchical regression!


read$group <- haven::as_factor(read$group) # Ensure the group variable is treated as a factor

interaction_mod <- lm(posttest1 ~ pretest1 * group, data = read) # By using * we automatically add all the terms, beta1 through beta5 in the model above!

no_int_mod <- lm(posttest1 ~ pretest1 + group, data = read) # the same as mod2 from earlier

Testing for Parallelism


summary(interaction_mod)$r.squared
[1] 0.5351859


summary(no_int_mod)$r.squared
[1] 0.5118617

Testing for Parallelism

summary(interaction_mod)$r.squared-summary(no_int_mod)$r.squared # Delta R2
[1] 0.0233242


anova(interaction_mod, no_int_mod) # F test
Analysis of Variance Table

Model 1: posttest1 ~ pretest1 * group
Model 2: posttest1 ~ pretest1 + group
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     60 347.97                           
2     62 365.43 -2   -17.461 1.5054 0.2302

Testing for Parallelism

  • We can and should also plot it…
ggplot(read, aes(x = pretest1, y = posttest1, color = group)) +
  geom_smooth(method = "lm", se = TRUE, aes(fill = group), alpha = 0.25) +  # Add linear regression lines with semi-transparent confidence bands
  geom_point(size = 2, alpha = 0.6) +  # Plot the points with slight transparency
  labs(x = "Pretest Score", y = "Posttest Score", title = "Regression Slopes by Group") +
  theme_classic() +  # Apply the theme first so it doesn't override the theme() tweak below
  theme(legend.position = "none")




Assignment 1 Review

Descriptive Statistics


Analytic Approach

“The researcher hypothesizes that these three focal predictors, as a set, will explain variability in career satisfaction above and beyond the effects of age and sex.”
PSYC3032M Assignment 1

  • Most appropriate to use hierarchical regression, why?

  • Because the research hypothesis was that work climate, respect, and influence predict career satisfaction over and above age and sex, so:

\(Model \ 1: \hat{Satisfaction}_i={\color{deeppink} {\beta_0}}+{\color{deeppink} {\beta_1Age}} + {\color{deeppink} {\beta_2Sex}}\)

vs.

\(Model \ 2: \hat{Satisfaction}_i={\color{deeppink} {\beta_0}}+{\color{deeppink} {\beta_1Age}} + {\color{deeppink} {\beta_2Sex}} + \beta_3Climate+\beta_4Respect+\beta_5Influence\)

Analytic Approach

“She also believes that each predictor will be meaningfully associated with career satisfaction when adjusting for the rest of the focal predictors as well as the covariates. Specifically, higher scores on each of the three predictors are expected to be associated with greater career satisfaction.”
PSYC3032M Assignment 1

  • For this hypothesis, we can just interpret the micro/predictor-level effects of the focal predictors in Model 2

  • For example, unstandardized regression coefficients (\(\hat{\beta} s\)), CIs, semipartial correlation squared (\(sr^2\)), significance tests, etc.

  • But, the phrasing “meaningfully associated” refers to the effect sizes, precision, and their practical implications, more so than the significance test, even though that should also be taken into account

Analytic Approach

  • For example, for every 1-point difference on the respect scale, we expect a 0.25-point difference on career satisfaction. That is, if we randomly select two individuals who are the same on all other predictors but have a 1-point difference on respect, we’d expect the person who’s slightly higher on respect to have higher career satisfaction by about 0.25 points.


  • I consider 0.25 points on a possible range of 5 points to be quite small; it’s roughly 5%…Is it meaningful? In my opinion, a 0.25-point difference, on its own, is probably not going to make a meaningful difference…(but wait for the results/discussion)


  • You can interpret the CI similarly…

Analytic Approach

“the researcher also wishes to explore potential differences in career satisfaction based on sex, hypothesizing that men may report higher career satisfaction than women.”
PSYC3032M Assignment 1

  • For this hypothesis, you can do any number of things…all are acceptable:
    • You can interpret the sex predictor in Model 2
    • You can do an SLR, regressing career satisfaction on sex alone; i.e., sex is the sole predictor
    • You can run a t test (virtually identical to the option above)
    • You could also address the sex predictor in Model 1, but I think it’s the least reasonable option
  • But, ideally, you can look at the regression coefficient for sex both in Model 2 and SLR / t test

Results/Discussion/Conclusions

  • I personally see each focal predictor, individually, contributing a small unique amount to the variability in satisfaction, and I doubt they’re truly meaningful.
    • Of course, this is subjective, and other interpretations are equally valid!


  • What I see as the “mystery” in this data and research questions is that although each predictor, uniquely, contributes a small amount, as a set, the three focal variables explain a lot! (roughly 45%!!!).


  • The reason for this is that satisfaction, respect, climate, and influence all “overlap” (i.e., correlate strongly) with one another; perhaps this indicates a latent trait of sorts

Results/Discussion/Conclusions


  • So, the first hypothesis is certainly supported; respect, climate, and influence, as a set, explain a substantial proportion (close to 45%) of variability in career satisfaction over and above age and sex.


  • The second hypothesis, however, is only partially supported; yes, higher scores on each of the predictor variables are indeed associated with greater career satisfaction, but the unique effects of each predictor are small, and it’s questionable if they truly make a difference in practice.


Partial Regression Slopes

  • As for potential sex differences…


  • Sure, male participants, on average, had higher career satisfaction.


  • But, this effect was very small in the SLR (\(\hat{\beta}_{sex}=0.22\)), and even smaller after adjusting for the focal predictors (\(\hat{\beta}_{sex \ adjusted}=-0.005\)).


  • So, even if the effect is statistically significant, I would not say the hypothesis is supported because the effect is unlikely to make any meaningful difference in a real-world context.

Questions and Comments?