16 Lecture 07 - 2019

16.1 Four elemental confounds (continued)

16.1.1 Unobserved variables

Be careful about unmeasured variables: they can create confounds even though they are never directly measured.

E.g. (the Haunted DAG): we want the influence of G on C. Paths: G → P → C and G → C. But an unobserved variable U turns P into a collider: G → P ← U → C. So conditioning on P opens the collider and distorts the estimated influence of G on C.
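To see the distortion concretely, here is a minimal simulation sketch in Python. The coefficient values, and reading G, P, C as grandparents, parents, children, are assumptions for illustration; the true direct effect of G on C is set to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# Assumed coefficients for illustration; the direct effect of G on C is zero.
b_GP, b_GC, b_PC, b_U = 1.0, 0.0, 1.0, 2.0

U = 2 * rng.binomial(1, 0.5, N) - 1            # unobserved variable
G = rng.normal(size=N)
P = rng.normal(b_GP * G + b_U * U)             # P is a collider of G and U
C = rng.normal(b_PC * P + b_GC * G + b_U * U)

def ols(y, *xs):
    """Least-squares coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(C, G))     # slope on G ~ 1: the total effect flowing through P
print(ols(C, G, P))  # slope on G goes negative: collider bias from conditioning on P
```

The second regression reports a strongly negative slope on G even though G's true direct effect on C is zero: among units with the same P, a higher G implies a lower U, which implies a lower C.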

16.2 Overfitting

Ockham’s razor: “plurality should never be posited without necessity”

The razor alone isn’t sufficient, because we are usually comparing models that are more complicated but fit the data better against models that are simpler but fit worse.

Two major hazards: too simple, not learning enough from the data (underfitting); too complex, learning too much from the data (overfitting).

Goal: learn the regular features of the sample, those that will generalize to other samples.

16.3 Measuring model fit

16.3.1 R squared

Common, not great

\(R^{2} = 1 - \frac{\mathrm{var}(\mathrm{residuals})}{\mathrm{var}(\mathrm{outcome})}\)

“Proportion of variance explained”
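As a sketch, the formula translates directly into Python; the function name and the toy outcome/prediction values here are just illustrative.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - var(residuals) / var(outcome)."""
    return 1 - np.var(y - y_hat) / np.var(y)

y = np.array([1.0, 2.0, 3.0, 4.0])      # outcome
y_hat = np.array([1.1, 1.9, 3.2, 3.8])  # predictions from some fitted model
print(r_squared(y, y_hat))              # near 1: residual variance is small
```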

You can get R squared = 1 with a parameter for each data point: a perfect fit. This is obviously nonsense.

Therefore it is a trap to pick models solely on their R squared: adding parameters never lowers R squared on the sample, whether or not the extra parameters improve prediction.
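A small demonstration of the trap, assuming pure-noise toy data and polynomial fits (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
x = np.linspace(0.0, 1.0, n)
y = rng.normal(size=n)            # pure noise: there is nothing real to explain

for degree in range(n):           # degree n-1 means one parameter per data point
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    r2 = 1 - np.var(y - y_hat) / np.var(y)
    print(f"{degree + 1} parameters: R^2 = {r2:.3f}")
# R^2 never decreases as parameters are added and reaches 1.0 with n parameters,
# even though the outcome is noise.
```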

16.4 Obtaining the regular features

  • Regularizing priors
  • Cross-validation (sketched after this list)
  • Information criteria
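As an example of the second strategy, here is a minimal leave-one-out cross-validation sketch; the sine-plus-noise data and the polynomial models are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)   # toy data

def loo_mse(x, y, degree):
    """Leave-one-out CV: average squared error on each held-out point."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        coeffs = np.polyfit(x[keep], y[keep], degree)
        errors.append((y[i] - np.polyval(coeffs, x[i])) ** 2)
    return np.mean(errors)

for degree in (1, 3, 9):
    print(f"degree {degree}: LOO error = {loo_mse(x, y, degree):.3f}")
# In-sample R^2 always favours the biggest model; out-of-sample error
# instead tends to get worse once the model starts fitting noise.
```

The point is that holding data out estimates fit to new data, which is exactly the regular-features goal above, rather than fit to the sample.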