14 Lecture 05
14.1 Multiple regression models
Why?
- Spurious associations
- Determining the added value of a predictor once the other predictors are known
- E.g. divorce rate given marriage rate and median age at marriage: once we know marriage rate, what additional value is there in knowing median age?
14.2 Directed acyclic graphs (DAG)
Directed: arrows indicate the direction of causal influence
Acyclic: no loops
Unlike statistical models, DAGs have causal implications
E.g. median age → marriage rate → divorce rate, and median age → divorce rate
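As a minimal sketch (not part of the lecture), the example DAG can be written as a mapping from each variable to its direct causes. Python's standard-library `graphlib` module (Python 3.9+) can then confirm that there are no loops. The node names `A`, `M`, `D` and the helper `is_acyclic` are illustrative choices.

```python
from graphlib import TopologicalSorter, CycleError

# Each variable maps to the set of variables with an arrow pointing INTO it.
# A = median age at marriage, M = marriage rate, D = divorce rate.
dag = {"A": set(), "M": {"A"}, "D": {"A", "M"}}


def is_acyclic(graph):
    """Return True if the directed graph has no loops (hypothetical helper)."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# A valid causal ordering: every cause is listed before its effects.
causal_order = list(TopologicalSorter(dag).static_order())
print(causal_order)

# Adding an arrow D -> A would close a loop, so the graph is no longer a DAG.
cyclic = {"A": {"D"}, "M": {"A"}, "D": {"A", "M"}}
print(is_acyclic(dag), is_acyclic(cyclic))
```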
14.3 Example: Age, marriage, divorce
\(D_{i} \sim \text{Normal}(\mu_{i}, \sigma)\)
\(\mu_{i} = \alpha + \beta_{M}M_{i} + \beta_{A}A_{i}\)
(M)arriage rate
(A)ge at marriage
(D)ivorce rate
14.3.1 Priors
Standardize to z-scores
\(\alpha\) = expected value of the response when all predictors are 0. Since everything is standardized, the response should also be 0 there, so \(\alpha\) should be close to 0. Without peeking at the data, an intercept on the raw scale would be hard to guess, but after standardization it is much simpler.
Slopes - use prior predictive simulation. Harder.
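A rough sketch of prior predictive simulation for a slope, using NumPy and one standardized predictor for simplicity. The priors here (Normal(0, 0.2) for the intercept, Normal(0, 0.5) vs Normal(0, 10) for the slope) are assumptions chosen for illustration and are not given in these notes. The idea is to draw lines from the prior and check how many imply plausible outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
A_grid = np.linspace(-2.0, 2.0, 50)  # standardized predictor, +/- 2 sd


def prior_lines(slope_sd):
    """Draw regression lines mu = alpha + beta_A * A from the prior."""
    alpha = rng.normal(0.0, 0.2, size=n_draws)        # assumed intercept prior
    beta_A = rng.normal(0.0, slope_sd, size=n_draws)  # assumed slope prior
    return alpha[:, None] + beta_A[:, None] * A_grid[None, :]


def share_plausible(lines):
    """Share of lines that keep the standardized outcome within +/- 2 sd."""
    return float(np.mean(np.all(np.abs(lines) <= 2.0, axis=1)))


share_tight = share_plausible(prior_lines(0.5))
share_wide = share_plausible(prior_lines(10.0))
print(f"slope sd 0.5: {share_tight:.2f}, slope sd 10: {share_wide:.2f}")
```

A very wide slope prior puts most of its mass on lines that predict impossible outcomes, which is the kind of problem this simulation is meant to expose before seeing the data.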
14.3.3 Interpretation
Once we know median age at marriage, there is little additional value in knowing marriage rate.
Once we know marriage rate, there is still value in knowing median age at marriage.
If we don’t know median age at marriage, it is still useful to know marriage rate, since the two are related. However, we would not want to base e.g. policy on changing marriage rate, since it does not cause divorce rate.
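The pattern above can be reproduced with simulated data. This is a sketch under stated assumptions: the effect sizes are made up, age influences both marriage rate and divorce rate, marriage rate has no direct effect on divorce rate, and ordinary least squares (via `np.linalg.lstsq`) stands in for the Bayesian fit. The helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Assumed causal structure: A -> M and A -> D, with no arrow M -> D.
A = rng.normal(size=n)
M = -0.7 * A + rng.normal(scale=0.7, size=n)
D = -0.6 * A + rng.normal(scale=0.8, size=n)


def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_M_alone = ols(D, M)[1]                    # D regressed on M only
_, b_M_given_A, b_A_given_M = ols(D, M, A)  # D regressed on M and A

print(f"M alone:   {b_M_alone:.2f}")
print(f"M given A: {b_M_given_A:.2f}")
print(f"A given M: {b_A_given_M:.2f}")
```

On its own, M predicts D because both depend on A. Once A is in the model, the coefficient on M shrinks toward zero while the coefficient on A stays clearly negative.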
14.4 Plotting multivariate posteriors
- Regress predictor on other predictors
- Compute predictor residuals
- Regress outcome on residuals
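The three steps above can be sketched with NumPy. The data are simulated with made-up coefficients, least squares stands in for the Bayesian fit, and the helper `ols` is hypothetical. The last line checks a known property of least squares: the slope on the predictor residuals matches the slope for that predictor in the multiple regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated stand-ins for age at marriage (A), marriage rate (M), divorce rate (D).
A = rng.normal(size=n)
M = -0.7 * A + rng.normal(scale=0.7, size=n)
D = -0.6 * A + 0.1 * M + rng.normal(scale=0.8, size=n)


def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Step 1: regress the predictor M on the other predictor A.
a0, a1 = ols(M, A)

# Step 2: predictor residuals = the part of M that A does not explain.
M_resid = M - (a0 + a1 * A)

# Step 3: regress the outcome D on those residuals.
slope_resid = ols(D, M_resid)[1]

# For comparison: the coefficient on M when D is regressed on M and A together.
slope_multi = ols(D, M, A)[1]
print(np.isclose(slope_resid, slope_multi))
```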
Side note: this procedure is for plotting and intuition only. Never analyze the residuals as if they were data in another model.
14.5 Reveal masked associations
Sometimes the association between the outcome and a predictor is masked by another variable.
This tends to arise when two predictors are correlated with each other and have opposite effects on the outcome.
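A small simulation of masking, as a sketch with made-up numbers: two predictors are strongly positively correlated, one raises the outcome and the other lowers it by the same amount. Least squares stands in for the Bayesian fit, and the helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Two predictors with correlation about 0.9 and opposite effects on Y.
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
Y = 1.0 * X1 - 1.0 * X2 + rng.normal(size=n)


def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b1_alone = ols(Y, X1)[1]           # Y on X1 only
b2_alone = ols(Y, X2)[1]           # Y on X2 only
_, b1_both, b2_both = ols(Y, X1, X2)  # Y on both predictors

print(f"alone:    X1 {b1_alone:.2f}, X2 {b2_alone:.2f}")
print(f"together: X1 {b1_both:.2f}, X2 {b2_both:.2f}")
```

Each predictor looks nearly unrelated to the outcome on its own, because its effect is cancelled by the correlated predictor pushing the other way. Including both in the same model reveals the two effects.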
14.6 Categorical variables
Two approaches:
- Use dummy/indicator variables
- Use index variables
Index variables are much better
14.6.1 Dummy variable
“Stand in” variable
E.g. a male/female column translated to 0, 1, 0, 0, 1, where 0 = female and 1 = male
Model:
\(h_{i} \sim \text{Normal}(\mu_{i}, \sigma)\)
\(\mu_{i} = \alpha + \beta_{M}M_{i}\)
With a dummy variable, \(\alpha\) is the mean when \(M_{i} = 0\) (female) and \(\beta_{M}\) is the change in the mean when \(M_{i} = 1\) (male).
The result is 2 intercepts: \(\alpha\) alone for females and \(\alpha + \beta_{M}\) for males.
Problem: for k categories, we need k-1 dummy variables and a prior for each. Also, the priors aren’t balanced: the male mean depends on two parameters (\(\alpha\) and \(\beta_{M}\)), so it gets more prior uncertainty than the female mean.
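The prior imbalance can be seen by simulating from the priors. This is a sketch: the Normal(0, 1) priors are assumptions chosen for illustration, and the index-variable version assumes the form \(\mu_{i} = \alpha_{\text{sex}[i]}\), with one intercept per category and the same prior on each.

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws = 10_000

# Dummy coding: mu = alpha + beta_M * M, with assumed Normal(0, 1) priors.
alpha = rng.normal(0.0, 1.0, size=n_draws)
beta_M = rng.normal(0.0, 1.0, size=n_draws)
mu_female_dummy = alpha           # M = 0
mu_male_dummy = alpha + beta_M    # M = 1

# Index coding: mu = alpha[sex], one intercept per category, same prior for each.
alpha_by_sex = rng.normal(0.0, 1.0, size=(n_draws, 2))
sex = np.array([0, 1, 0, 0, 1])   # the example column: 0 = female, 1 = male
mu_index = alpha_by_sex[:, sex]   # shape (n_draws, 5): prior mean for each row

print("dummy sd:", mu_female_dummy.std(), mu_male_dummy.std())
print("index sd:", alpha_by_sex[:, 0].std(), alpha_by_sex[:, 1].std())
```

Under dummy coding the prior for the male mean is wider than the prior for the female mean, since it is the sum of two uncertain parameters. Under index coding both categories get the same prior spread, and adding more categories only means adding more entries to the index.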