19 Lecture 10
19.1 Markov Chain Monte Carlo
Reminder: Bayesian inference is about computing the posterior. Bayesian inference ≠ Markov chains; MCMC is just one way to get at the posterior.
Four of the ways to compute the posterior:
- Analytical approach (mostly impossible)
- Grid approximation (very intensive; see the sketch after this list)
- Quadratic approximation (limited)
- MCMC (intensive)
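To make the grid idea concrete, here is a minimal grid-approximation sketch in R for a hypothetical binomial model (6 successes in 9 trials with a flat prior); the data and grid size are made up for illustration:

```r
# Grid approximation for a hypothetical binomial model: 6 successes in 9 trials
p_grid     <- seq(from = 0, to = 1, length.out = 1000)   # grid of parameter values
prior      <- rep(1, 1000)                                # flat prior
likelihood <- dbinom(6, size = 9, prob = p_grid)          # likelihood at each grid point
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)                  # normalize
# draw samples from the posterior by weighting grid points
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)
```

The cost grows rapidly with the number of parameters, which is why grid approximation counts as very intensive.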
Advantages of MCMC
- You don’t know the posterior, yet you can still visit each part of it in proportion to its relative probability
- “Sample from a distribution that we don’t know”
19.1.1 Metropolis algorithm
- Loop over iterations
- Record the current location
- Generate a proposal for a neighboring location
- Move to the proposal with probability equal to the ratio of the proposal’s posterior probability to the current location’s (always move if the proposal is more probable)
The chain converges to the posterior in the long run; the method is valid as long as the proposals are symmetric. A sketch follows below.
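A minimal Metropolis sketch in R, assuming a hypothetical target distribution (a standard normal stands in for the posterior); the step size and number of iterations are arbitrary choices:

```r
# Metropolis sampler for a hypothetical standard-normal target
metropolis <- function(n_samples = 1e4, step = 0.5) {
  samples <- numeric(n_samples)
  current <- 0
  for (i in 1:n_samples) {
    samples[i] <- current                                 # record current location
    proposal <- current + rnorm(1, mean = 0, sd = step)   # symmetric neighbor proposal
    # accept with probability min(1, p(proposal) / p(current))
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) current <- proposal
  }
  samples
}

draws <- metropolis()
c(mean(draws), sd(draws))   # should be close to 0 and 1
```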
19.1.2 Metropolis-Hastings
An improvement on Metropolis that does not require the proposals to be symmetric; any asymmetry in the proposal distribution is corrected for in the acceptance ratio.
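For reference, the standard Metropolis-Hastings acceptance probability for moving from the current value $\theta$ to a proposal $\theta'$ drawn from a proposal distribution $q$ is

$$\alpha = \min\!\left(1,\ \frac{p(\theta' \mid \text{data})\, q(\theta \mid \theta')}{p(\theta \mid \text{data})\, q(\theta' \mid \theta)}\right),$$

which reduces to the plain Metropolis rule when $q$ is symmetric, i.e. when $q(\theta \mid \theta') = q(\theta' \mid \theta)$.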
19.1.4 Hamiltonian Monte Carlo
Markov chain: no memory. The probability of the next state depends only on the current state, not on past states, so nothing needs to be stored.
Monte Carlo: random simulation (named after the casino in Monaco).
HMC is a numerical technique for sampling the posterior, with several advantages over the Metropolis and Gibbs samplers:
- Metropolis and Gibbs rely on guess-and-check proposals; like optimization, this strategy breaks down in high dimensions (see concentration of measure: most of the posterior mass sits in a thin shell far from the mode)
- Hamiltonian Monte Carlo uses the gradient of the log-posterior to avoid the guess-and-check of Metropolis and Gibbs
- In high-dimensional spaces especially, the acceptance rate of those methods drops and sampling takes much more time
Hamiltonian Monte Carlo:
- Uses a physics simulation representing the parameter state as a particle
- Flicks the particle around a frictionless log-posterior surface
- Follows curvature of the surface, so it doesn’t get stuck
- Uses random direction and random speed
- Slows as it climbs, speeds as it drops
Each sample is much more computationally intensive, but far fewer steps are needed and there are far fewer rejections.
It is also easier to tell when the sampler has failed. A minimal sketch follows.
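A minimal HMC sketch in R for a hypothetical standard-normal target, using the standard leapfrog integrator; U is the negative log-posterior, and the step size eps and number of leapfrog steps L are arbitrary choices:

```r
# Negative log-posterior and its gradient for a hypothetical standard normal
U      <- function(q) 0.5 * sum(q^2)
grad_U <- function(q) q

hmc_step <- function(current_q, eps = 0.1, L = 20) {
  q <- current_q
  p <- rnorm(length(q))                    # random momentum: the "flick"
  current_p <- p
  # leapfrog integration of the frictionless dynamics
  p <- p - eps * grad_U(q) / 2
  for (i in 1:L) {
    q <- q + eps * p
    if (i != L) p <- p - eps * grad_U(q)
  }
  p <- p - eps * grad_U(q) / 2
  # accept/reject corrects for numerical error in the simulated trajectory
  current_H  <- U(current_q) + 0.5 * sum(current_p^2)
  proposed_H <- U(q) + 0.5 * sum(p^2)
  if (log(runif(1)) < current_H - proposed_H) q else current_q
}

# draw 1000 samples from a 2-D standard normal
q <- c(0, 0)
samples <- matrix(NA_real_, nrow = 1000, ncol = 2)
for (i in 1:1000) { q <- hmc_step(q); samples[i, ] <- q }
```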
19.1.5 Tuning MCMC
Step size: together with the number of leapfrog steps, it controls how long each trajectory is simulated. A larger step size is more efficient, but the discretized path can no longer follow tight curvature in the surface and the simulation can diverge.
The U-turn risk (a trajectory long enough to loop back toward its starting point, wasting computation) is solved by NUTS (the No-U-Turn Sampler):
- Warm-up phase: adaptively finds a step size that hits the target acceptance rate. The default (warm-up takes half of the total iterations) is usually good
- Runs the trajectory in both directions and stops before a U-turn, giving nearly uncorrelated samples, so there is no need to pick the number of leapfrog steps by hand (see the sketch after this list)
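Reusing the hmc_step sketch above, the step-size trade-off can be seen directly: a larger step size covers more distance per leapfrog step, but the trajectory tracks the surface less accurately and more proposals get rejected. The values of eps below are arbitrary:

```r
# Acceptance rate of the hmc_step() sketch above for a given step size
run_chain <- function(eps, n = 2000) {
  q <- c(0, 0)
  accepted <- 0
  for (i in 1:n) {
    q_new <- hmc_step(q, eps = eps, L = 20)
    if (!identical(q_new, q)) accepted <- accepted + 1
    q <- q_new
  }
  accepted / n
}

run_chain(eps = 0.1)   # small step size: acceptance rate near 1
run_chain(eps = 1.9)   # large step size: noticeably more rejections
```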
19.1.7 ulam
- Create a list of the data containing only the variables you need
- Call ulam with formulas, as in quap (see the sketch after this list)
- ulam translates the formulas to Stan
- Builds the NUTS sampler
- Sampler runs
- Returns the posterior
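A minimal sketch of that workflow with the rethinking package; the data (100 hypothetical heights) and the priors are made up for illustration:

```r
library(rethinking)

# 1. Data list containing only the variables the model uses (hypothetical heights)
dat <- list(h = rnorm(100, mean = 170, sd = 10))

# 2. Call ulam with formulas, as in quap; it writes the Stan code,
#    builds the NUTS sampler, and runs it
fit <- ulam(
  alist(
    h ~ dnorm(mu, sigma),     # likelihood
    mu ~ dnorm(170, 20),      # prior for the mean
    sigma ~ dexp(1)           # prior for the standard deviation
  ),
  data = dat, chains = 4, cores = 4,
  iter = 2000, warmup = 1000  # warm-up defaults to half the iterations
)

# 3. The posterior comes back as samples
post <- extract.samples(fit)
precis(fit)
```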
19.1.8 Diagnosis
n_eff: the effective number of samples. It can be greater than the actual number of draws from the Markov chain (when successive draws are anticorrelated); with no autocorrelation every draw counts fully.
Rhat: a convergence diagnostic comparing the variance across chains to the variance within chains. Values close to 1 are good.
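Assuming the fit object from the ulam sketch above, these diagnostics can be checked directly:

```r
precis(fit)        # check the n_eff and Rhat columns for every parameter
traceplot(fit)     # healthy chains look stationary and well mixed
trankplot(fit)     # rank histograms; heavy overlap between chains is a good sign
```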
“Typically when you have a computational problem, often there’s a problem with your model”