26 Lecture 17
26.1 Varying slopes
Slopes are another feature of the response that can vary by cluster, just like intercepts.
Making any parameter into a varying effect
- Split into vector of parameters by cluster
- Define population clusters
Any batch of parameters with exchangeable index values can ("and probably should") be partially pooled. Exchangeable means the index labels carry no ordering information.
You could treat slopes as a distinct varying effect, but it is even better to relate the intercepts and slopes directly. Since intercepts and slopes of the same units tend to be related in the population, features of these units have a correlation structure that can itself be learned.
26.1.1 Example - cafes
Cafe visits in the morning and afternoon. Intercepts: average morning wait; slopes: average difference between afternoon and morning waits.
Are the slopes and intercepts related? Yes. There is pooling across parameters.
The prior is a 2 dimensional Gaussian. There is a vector of means (average intercept, average slope) and a variance-covariance matrix.
26.3 Varying slopes model
\(W_{i} \sim \text{Normal}(\mu_{i}, \sigma)\)
\(\mu_{i} = \alpha_{\text{cafe}[i]} + \beta_{\text{cafe}[i]} A_{i}\)
\(\begin{bmatrix} \alpha_{\text{cafe}} \\ \beta_{\text{cafe}} \end{bmatrix} \sim \text{MVNormal}\left( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, S \right)\)
\(\mu_i\) combines the varying intercepts and varying slopes. \(A_i\) is an indicator for afternoon (1) vs. morning (0).
Multivariate prior: for each cafe there is a pair of parameters \(\alpha_{\text{cafe}}\) and \(\beta_{\text{cafe}}\), distributed as a 2-dimensional normal with mean vector \((\alpha, \beta)\) and covariance matrix \(S\).
\(R \sim \text{LKJcorr}(2)\)
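A minimal sketch of drawing from this multivariate prior, using made-up values for the cafe example (the means, standard deviations, and correlation below are illustrative assumptions, not from the notes):

```python
import numpy as np

# Hypothetical prior values for the cafe example (assumed, not from the notes):
a, b = 3.5, -1.0             # mean intercept (morning wait), mean slope (afternoon difference)
sigma_a, sigma_b = 1.0, 0.5  # standard deviations of intercepts and slopes
rho = -0.7                   # assumed correlation between intercepts and slopes

# Build the covariance matrix S from the standard deviations and the correlation matrix R.
R = np.array([[1.0, rho], [rho, 1.0]])
sigmas = np.diag([sigma_a, sigma_b])
S = sigmas @ R @ sigmas

rng = np.random.default_rng(1)
# One (alpha, beta) pair per cafe, drawn from the 2-D normal prior.
cafes = rng.multivariate_normal(mean=[a, b], cov=S, size=20)

# Cafes with long morning waits tend to have larger afternoon drops.
print(np.corrcoef(cafes[:, 0], cafes[:, 1])[0, 1])
```

With a negative correlation like this, busy cafes (long morning waits) show the biggest morning-to-afternoon difference, which is exactly the structure the joint prior can capture.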
You can't assign priors to the correlations independently. Each individual correlation varies between -1 and 1, but as the number of dimensions grows, the matrix as a whole must remain a valid (positive definite) correlation matrix, so the entries constrain one another. Therefore, if one correlation is really big, the others are necessarily smaller.
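A quick way to see this constraint, with made-up correlations: each entry below is individually in (-1, 1), but together the first matrix is not positive definite, so no such correlation matrix exists.

```python
import numpy as np

# Made-up correlations: individually valid, jointly impossible.
# If var1-var2 and var1-var3 are both strongly positive, var2-var3
# cannot be strongly negative.
R_bad = np.array([
    [1.0,  0.9,  0.9],
    [0.9,  1.0, -0.9],
    [0.9, -0.9,  1.0],
])
print(np.linalg.eigvalsh(R_bad))  # smallest eigenvalue is negative: invalid

# Shrinking the third correlation toward what the first two imply fixes it.
R_ok = np.array([
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.7],
    [0.9, 0.7, 1.0],
])
print(np.linalg.eigvalsh(R_ok))  # all eigenvalues positive: valid
```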
The LKJcorr prior has one parameter, eta, which defines how concentrated the distribution is around the identity matrix. The implied correlations lie between -1 and 1. eta = 1 gives an essentially uniform density over correlation matrices; eta > 1 concentrates mass around 0, i.e. more skeptical of extreme correlations.
26.4 Multidimensional shrinkage
The joint distribution of varying effects pools information across slopes and intercepts. Correlation induces shrinkage across dimensions, increasing accuracy.
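One way to see this cross-dimension pooling, using the same assumed prior values as before (illustrative, not from the notes): under a correlated bivariate Gaussian, information about a cafe's intercept shifts the expected slope through the conditional mean, \(E[\beta \mid \alpha] = b + \rho \frac{\sigma_b}{\sigma_a}(\alpha - a)\).

```python
# Conditional-mean sketch with assumed prior values (not from the notes).
a, b = 3.5, -1.0            # population mean intercept and slope
sigma_a, sigma_b = 1.0, 0.5
rho = -0.7                  # assumed negative correlation

def conditional_slope_mean(alpha):
    """E[beta | alpha] for a bivariate normal: information about the
    intercept shifts the expected slope through the correlation."""
    return b + rho * (sigma_b / sigma_a) * (alpha - a)

# A cafe with an unusually long morning wait is expected to have a larger
# afternoon drop, even before we see any afternoon data for that cafe.
print(conditional_slope_mean(3.5))  # at the mean intercept: -1.0
print(conditional_slope_mean(5.0))  # above-average intercept: -1.525
```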
26.4.2 Divergences
Because divergences are more common in these models, we need to use the non-centered versions.
This is simpler for univariate models, where we factor all the parameters out of the prior and into the linear model. But how do we factor out a correlation matrix?
Cholesky factor: decompose the covariance matrix as \(S = L L^{\top}\), with \(L\) lower triangular.
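A sketch of how the Cholesky factor enables the non-centered version (covariance values assumed, as before): sample standardized, uncorrelated normals z, then transform them by L to recover draws with covariance S.

```python
import numpy as np

# Assumed covariance for illustration: diag(sigma) @ R @ diag(sigma).
sigmas = np.array([1.0, 0.5])
rho = -0.7
R = np.array([[1.0, rho], [rho, 1.0]])
S = np.diag(sigmas) @ R @ np.diag(sigmas)

# Cholesky factor: S = L @ L.T with L lower triangular.
L = np.linalg.cholesky(S)

rng = np.random.default_rng(2)
# Non-centered parameterization: sample uncorrelated standard normals...
z = rng.standard_normal(size=(2, 10_000))
# ...and transform them; the result has covariance S.
v = L @ z

print(np.cov(v))  # close to S
```

This is why the correlation matrix can be "factored out" of the prior: the sampler explores the well-behaved standard normals z, and the correlated varying effects are reconstructed deterministically from L and z.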