PSYC 3032 M
“He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may be cast.”
— Leonardo da Vinci, 1452-1519
Yes, this course is very applied; nonetheless, we’re not going to shy away from statistical theory when it helps us understand what is going on
At least, not unless I deem it unnecessary or generally less fruitful
The term model is used a lot in statistics and in the “real-world” (e.g., fashion model, model ship, etc.)
It’s important to realize that it means the same thing inside and outside statistics
A model is a simplified representation or abstraction of reality used to understand, predict, or explain phenomena in the real world
Models allow us to focus on key aspects of a system, process, object or problem while ignoring irrelevant details
In essence, models are valuable tools for learning and decision-making
Statistical models are very useful for understanding what’s going on in the world
\[\text{Variable of interest} = \text{Model} + \text{Error}\] or, better yet:
\[DV = IV_1 + IV_2 + IV_3 + \dots + \text{Error}\]
Example: If I think social media influences depression in kids, I might use social media use to predict or explain depression to some extent:
\[\text{Depression score} = \text{Hours spent on TikTok} + \text{Error}\]
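Here is a minimal R sketch of the "Model + Error" idea. It uses simulated data, not real data: the variable names, the sample size, and the values 10, 2, and 5 are all assumptions chosen purely for illustration.

```r
# Simulated data only; every number below is made up for illustration
set.seed(3032)
n <- 200
tiktok_hours <- runif(n, min = 0, max = 6)     # hypothetical predictor (IV)
error <- rnorm(n, mean = 0, sd = 5)            # the part the model cannot explain
depression <- 10 + 2 * tiktok_hours + error    # DV = Model + Error

# Fit the model to the simulated scores
fit <- lm(depression ~ tiktok_hours)
coef(fit)                                      # estimated intercept and slope

# Each observed score splits into a model part and an error part
model_part <- fitted(fit)
error_part <- resid(fit)
all.equal(unname(model_part + error_part), depression)
```

The last line checks that the fitted values plus the residuals reproduce every observed score, which is the equation above in action.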
Here are THREE reasons we might want to use models in statistics:
\[\text{CBT} \rightarrow \text{less anxiety}\] (i.e., CBT causes less anxiety)
\[\text{Model}_{\text{sample}} \approx \text{Model}_{\text{population}}\]
This is really why we fit statistical models in practice: we want good, succinct descriptions of complex data after ruling out less probable competing models
There are more reasons to use models in statistics, but I’m giving you the most impactful ones, i.e., we are simplifying! 😉
And, remember…
“All models are wrong, but some are useful.”
— George Box, 1987
It’s one thing to make a model, but it’s another to know if it does a good job
Some models don’t explain much of the variation in the data (underfitting), while other models fit the sample too closely and don’t generalize to new cases (overfitting)
In this class we’ll learn to compare models and choose the most useful one
Which is the most useful model? 🧒️ + \(3\cdot\)🐻
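To make underfitting and overfitting concrete, here is a rough R sketch that fits three candidate models to one small simulated sample and then checks them against fresh cases. The data-generating values, the sample sizes, and the choice of a degree-10 polynomial are assumptions for illustration only.

```r
# Simulated data only; the true relationship is assumed to be linear
set.seed(1)
make_data <- function(n) {
  x <- runif(n, 0, 10)
  data.frame(x = x, y = 3 + 0.5 * x + rnorm(n, sd = 2))
}
train <- make_data(20)        # the sample we fit to
new_cases <- make_data(200)   # fresh cases from the same population

fits <- list(
  too_simple   = lm(y ~ 1, data = train),            # ignores x entirely
  linear       = lm(y ~ x, data = train),            # matches the true form
  too_flexible = lm(y ~ poly(x, 10), data = train)   # free to chase sample noise
)

# Mean squared error on the training sample vs. on new cases
mse <- function(fit, data) mean((data$y - predict(fit, newdata = data))^2)
comparison <- data.frame(
  train_mse = sapply(fits, mse, data = train),
  new_mse   = sapply(fits, mse, data = new_cases)
)
comparison
```

Because these models are nested, the training error can only go down as the model gets more flexible. The error on new cases is what tells us whether a model generalizes; the exact numbers depend on the seed.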
Models in this course will almost always follow the same steps. Given some data:
You can see that statistical modeling is a dynamic process. It is more of a craft than an exact science, and because it requires decision-making, it can be subjective and is often easy to manipulate. Therefore, we should always use statistics responsibly and ethically, staying true to the facts rather than bending them to our needs (e.g., chasing p < .05).
In this class we’ll use a particular type of model, the “umbrella” model called the general linear model (GLM)
The GLM is a mathematical framework for representing relationships between variables
The GLM is used to explain/predict variation in a particular outcome from 1 or more explanatory/predictor variables
Specifically, the GLM models linear relationships between a continuous dependent variable and any combination of categorical or continuous predictor variables
Examples:
Mathematically, the GLM can be expressed like this:
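\[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \epsilon_i\]
where \(y_i\) is the outcome for case \(i\), \(x_{1i}, \dots, x_{pi}\) are that case’s values on the \(p\) predictors, \(\beta_0\) is the intercept, \(\beta_1, \dots, \beta_p\) are the slopes, and \(\epsilon_i\) is the error term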
Did you know?
z test, t test, ANOVA, correlation, and regression are all part of the GLM!!!
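As one illustration of that claim, the R sketch below runs an independent-samples t test and then fits the same comparison as a linear model with a single dummy-coded predictor. The data are simulated, and the group labels, means, and sample sizes are assumptions for illustration.

```r
# Simulated data only; group labels and effect size are made up
set.seed(42)
group <- rep(c("control", "treatment"), each = 30)
score <- rnorm(60, mean = ifelse(group == "treatment", 12, 10), sd = 3)

# Classic independent-samples t test (equal variances assumed)
t_res <- t.test(score ~ group, var.equal = TRUE)

# The same comparison as a general linear model, fitted with lm()
# (lm() dummy-codes group: control = 0, treatment = 1)
lm_res <- summary(lm(score ~ group))

t_from_ttest <- unname(t_res$statistic)
t_from_lm <- lm_res$coefficients["grouptreatment", "t value"]

# Same magnitude; the sign differs only in which group is subtracted
c(t_test = t_from_ttest, lm = t_from_lm)
all.equal(abs(t_from_ttest), abs(t_from_lm))
```

In the `lm()` version, the intercept is the control group mean and the slope is the difference between the two group means, so testing the slope is the same as testing the mean difference.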
Even this course is a model! And, as with all models, there are assumptions. Here are mine:
That said, not everyone had the same path here. So next week, we’ll review (though, not in depth) some old topics, such as
Install or update R and RStudio!
Refresh your R/RStudio skills; see the R Stuff section on eClass
Review the syllabus and take the quiz on eClass
Fill out the short pre-class survey
Make sure you have an iClicker account/app (and the location feature)
Try to knit the lab .rmd template as an HTML file
Relax, you’re going to be just fine! 😊
Module 1 (Part 1)