16 Lecture 07
16.1 Four elemental confounds (continued)
16.1.1 Unobserved variables
Be careful about unmeasured variables: they can create confounds without being directly measured.
E.g. the “Haunted DAG”: we want the effect of G (grandparents) on C (children), with G → P → C and G → C. But an unobserved variable U makes P a collider: G → P ← U → C. Conditioning on P opens this path, so including P distorts the estimated influence of G on C.
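A minimal simulation sketch of the haunted DAG, with hypothetical coefficients: the direct effect of G on C is set to zero, yet regressing C on both G and P makes G appear strongly negative, because conditioning on the collider P lets the unobserved U leak into the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed structural model (coefficients are illustrative, not from the lecture):
U = rng.normal(size=n)                          # unobserved confound
G = rng.normal(size=n)                          # grandparents
P = 1.0 * G + 1.0 * U + rng.normal(size=n)      # parents: collider of G and U
C = 0.0 * G + 1.0 * P + 2.0 * U + rng.normal(size=n)  # NO direct G -> C effect

# Regress C on G and P (i.e. condition on the collider P):
X = np.column_stack([np.ones(n), G, P])
beta, *_ = np.linalg.lstsq(X, C, rcond=None)
print(beta)  # coefficient on G comes out near -1, not its true direct value 0
```

With these coefficients the bias is exactly computable: controlling for P makes G and U negatively correlated, and since U raises C, the coefficient on G is pulled to about −1 even though G has no direct effect on C.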
16.2 Overfitting
Ockham’s razor: “plurality should never be posited without necessity”
This isn’t sufficient, because we are usually comparing models that are more complicated but fit the data better against models that are less complicated but fit worse.
Two major hazards: too simple, not learning enough from the data (underfitting), and too complex, learning too much from the data (overfitting)
Goal = to learn the regular features of the sample: those that will generalize to other samples
16.3 Measuring model fit
16.3.1 R squared
Common, not great
\(R^{2} = 1 - \frac{\mathrm{var}(\text{residuals})}{\mathrm{var}(\text{outcome})}\)
“Proportion of variance explained”
You can get R squared = 1 with a parameter for each data point: a perfect fit of the sample, which is obviously nonsense.
Hence the trap of picking models solely on their R squared: adding parameters will always increase it.
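A short sketch of this trap, using a made-up linear dataset: fitting polynomials of increasing degree with `numpy.polyfit` drives R squared up monotonically, and a degree-9 polynomial on 10 points (one parameter per data point) reaches R squared of essentially 1 while learning nothing generalizable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x = np.linspace(0, 1, n)
y = 2 * x + rng.normal(scale=0.5, size=n)   # true relationship is just linear

def r_squared(y, yhat):
    """R^2 = 1 - var(residuals) / var(outcome)."""
    resid = y - yhat
    return 1 - resid.var() / y.var()

# Fit polynomials of increasing degree; R^2 can only go up for nested fits.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    yhat = np.polyval(coeffs, x)
    print(degree, round(r_squared(y, yhat), 3))
# degree 9 = one parameter per data point -> R^2 ~ 1: a perfect, useless fit
```

The degree-9 curve interpolates the sample exactly but wiggles wildly between points, so it would fit a fresh sample from the same process badly.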