15 Lecture 06 - 2019

15.1 Index variable

(For unordered, categorical variables)

Categories are coded as integers, starting at 1 and counting up.

The same prior can be given to every category.

Extends easily to more than two categories.

For example:

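library(rethinking)

# d must contain height and an integer index column sex (coded 1 or 2)
m <- quap(
    alist(
        height ~ dnorm(mu, sigma),
        mu <- a[sex],              # one intercept per category
        a[sex] ~ dnorm(178, 20),   # the same prior for every category
        sigma ~ dunif(0, 50)
    ),
    data = d
)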

a[sex] defines a separate intercept a for each sex, and the prior line gives each of them the same prior. Each intercept shows up directly in precis(m, depth = 2) too.
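The model above assumes the sex column is already an integer index. A minimal base R sketch of building such an index from category labels could look like this; the label vectors and the variable names sex_label, sex_index, clade_label and clade_index are hypothetical and not part of the lecture data.

```r
# Hypothetical category labels; a character or factor column works the same way.
sex_label <- c("female", "male", "male", "female", "male")

# Fixing the levels makes the mapping explicit: "female" -> 1, "male" -> 2.
sex_index <- as.integer(factor(sex_label, levels = c("female", "male")))
sex_index

# The same recipe handles more than two categories.
# Without explicit levels, factor() sorts the labels alphabetically.
clade_label <- c("ape", "old world monkey", "new world monkey",
                 "strepsirrhine", "ape")
clade_index <- as.integer(factor(clade_label))
clade_index
```

Because the index is only a label, the numeric order of the categories carries no meaning in the model.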

You can then calculate the difference between groups directly from the posterior samples; there is no need to refit the model.

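post <- extract.samples(m)                # posterior draws; post$a has one column per sex
post$diff <- post$a[, 1] - post$a[, 2]    # difference computed draw by draw
precis(post, depth = 2)                   # depth = 2 is needed to show a[1] and a[2]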

#       mean
# sigma 27
# a[1]  134
# a[2]  142
# diff  -7.7

15.2 Four elemental confounds

When inferring the relationships between X and Y…

Model selection cannot identify confounds, so we use DAGs to reason about them.

Arrows indicate the direction of causation, but statistical information (association) can flow along an arrow in either direction.

15.2.1 Notes

  • Regression models don’t have arrows like DAGs - they just measure associations.
  • You can’t tell the difference between the fork and the path given the data alone.
  • Remember DAGs are small world constructs.

15.2.2 The fork

X ← Z → Y

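Z is a common cause of X and Y, so X and Y are associated even if neither causes the other. Conditioning on Z (including it in the model) removes the association between X and Y that Z induces.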
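A minimal simulation sketch of the fork, using base R and lm() rather than quap() to keep it self-contained. The sample size, seed and effect sizes are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)             # common cause
x <- rnorm(n, mean = z)   # X depends on Z only
y <- rnorm(n, mean = z)   # Y depends on Z only; X has no effect on Y

# Without Z the slope on x is clearly non-zero, even though X does not cause Y.
b_without_z <- unname(coef(lm(y ~ x))["x"])

# With Z in the model the slope on x shrinks towards zero.
b_with_z <- unname(coef(lm(y ~ x + z))["x"])

round(c(without_z = b_without_z, with_z = b_with_z), 2)
```

The first slope reflects only the shared dependence on Z; once Z is in the model, X carries no further information about Y.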

15.2.3 The path

X → Z → Y

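Z lies on the path from X to Y and mediates the relationship. This structure is also called the pipe or chain.

For example, consider the influence of a treatment (T) on plant height (H), where the treatment influences fungus (F), which in turn influences height.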

T → F → H

Because the treatment influences the fungus (a post-treatment measure), a model that includes both treatment and fungus will show no relationship between treatment and height, only between fungus and height. Once we know the fungus status, treatment tells us nothing more about height. That model is still informative, since it suggests the treatment acts through the fungus, but to estimate the total influence of treatment on height we need to omit fungus.

Therefore, understanding the relationship between T and F is important, but to determine the causal effect of T on H we need to leave F out of that model.
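A minimal simulation sketch of this post-treatment bias, in base R with lm() instead of quap(). All numbers (sample size, fungus probabilities, growth effects) are assumptions made up for the illustration, not values from the lecture.

```r
set.seed(71)
n <- 1000
treatment <- rep(0:1, each = n / 2)

# T -> F: treatment lowers the chance of fungus (assumed 0.5 vs 0.1).
fungus <- rbinom(n, size = 1, prob = 0.5 - 0.4 * treatment)

# F -> H: fungus reduces growth; treatment has no direct arrow to growth.
growth <- rnorm(n, mean = 5 - 3 * fungus, sd = 1)

# Omitting fungus recovers the total effect of treatment on growth.
b_total <- unname(coef(lm(growth ~ treatment))["treatment"])

# Including fungus blocks the pipe, so treatment appears to do nothing.
b_blocked <- unname(coef(lm(growth ~ treatment + fungus))["treatment"])

round(c(total = b_total, blocked = b_blocked), 2)
```

Under these assumptions the expected total effect is 3 × 0.4 = 1.2, while the coefficient with fungus included should sit near zero.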

15.2.4 The collider

X → Z ← Y

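Z is a common result of X and Y. X and Y are independent unless you condition on Z; conditioning on Z creates a statistical association between X and Y even though neither causes the other. Be careful about correlations that do not indicate causation here.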
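A minimal simulation sketch of the collider, in base R with lm(). X and Y are generated independently, and the sample size, seed and effect sizes are arbitrary assumptions.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)                  # generated independently of X
z <- rnorm(n, mean = x + y)    # collider: caused by both X and Y

# Without Z the slope on x is close to zero, as it should be.
b_without_z <- unname(coef(lm(y ~ x))["x"])

# Conditioning on Z opens the path and induces a spurious negative slope.
b_with_z <- unname(coef(lm(y ~ x + z))["x"])

round(c(without_z = b_without_z, with_z = b_with_z), 2)
```

Here adding a variable to the model makes the estimate worse, which is the opposite of what happens in the fork.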

15.2.5 Steps

  1. List all paths connecting X (treatment) and Y (outcome)
  2. Classify each path as either open or closed. A path is open unless it contains a collider that has not been conditioned on.
  3. Classify each path as backdoor or front door. Backdoor paths have an arrow entering X.
  4. Condition on variables in open backdoor paths to close them, taking care not to condition on a collider, which would open a path instead.
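The steps above can be sketched for a single path in base R. This is a simplified illustration, not a general DAG tool: the helper names is_backdoor and path_is_open and the path representation are hypothetical, and the sketch ignores the fact that conditioning on a descendant of a collider can also open a path.

```r
# A path is stored as its nodes plus the arrow between each consecutive pair,
# written as it reads from left to right: "->" or "<-".

# Step 3: a backdoor path starts with an arrow entering X (the first node).
is_backdoor <- function(arrows) arrows[1] == "<-"

# Steps 2 and 4: is the path open, given a set of conditioned variables?
path_is_open <- function(nodes, arrows, conditioned = character(0)) {
  n <- length(nodes)
  if (n < 3) return(TRUE)   # a direct arrow has no interior node to block it
  for (i in 2:(n - 1)) {
    collider <- arrows[i - 1] == "->" && arrows[i] == "<-"
    in_set <- nodes[i] %in% conditioned
    # A collider blocks unless conditioned on;
    # any other node blocks only if conditioned on.
    if (collider != in_set) return(FALSE)
  }
  TRUE
}

fork_nodes <- c("X", "Z", "Y");     fork_arrows <- c("<-", "->")
collider_nodes <- c("X", "Z", "Y"); collider_arrows <- c("->", "<-")

is_backdoor(fork_arrows)                                          # TRUE
path_is_open(fork_nodes, fork_arrows)                             # TRUE
path_is_open(fork_nodes, fork_arrows, conditioned = "Z")          # FALSE
path_is_open(collider_nodes, collider_arrows)                     # FALSE
path_is_open(collider_nodes, collider_arrows, conditioned = "Z")  # TRUE
```

For real analyses a dedicated DAG package is the safer choice, since it handles descendants and multiple paths at once.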