19 Lecture 10
19.1 Markov Chain Monte Carlo
Reminder: Bayesian inference is about computing the posterior. Bayesian inference ≠ Markov chains; MCMC is just one way to get at the posterior.
Four of the ways to compute the posterior:
- Analytical approach (mostly impossible)
- Grid approximation (very intensive; see the sketch after this list)
- Quadratic approximation (limited)
- MCMC (intensive)
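To make the grid idea concrete, here is a minimal grid-approximation sketch in R for a hypothetical binomial model (6 successes in 9 trials with a flat prior); the data and grid size are made up for illustration:

```r
# Grid approximation for a hypothetical binomial model: 6 successes in 9 trials
p_grid     <- seq(from = 0, to = 1, length.out = 1000)   # grid of parameter values
prior      <- rep(1, 1000)                                # flat prior
likelihood <- dbinom(6, size = 9, prob = p_grid)          # likelihood at each grid point
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)                  # normalize
# draw samples from the posterior by weighting grid points
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)
```

The cost grows rapidly with the number of parameters, which is why grid approximation counts as very intensive.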
Advantages of MCMC
- You don’t know the posterior, yet you can still visit each part of it in proportion to its relative probability
- “Sample from a distribution that we don’t know”
19.1.1 Metropolis algorithm
- Loop over iterations
- Record the current location
- Generate a proposal for a neighboring location
- Move to the proposal with probability equal to the ratio of the proposal’s posterior probability to the current location’s (always move if the proposal is more probable)
The chain converges to the posterior in the long run; the method is valid as long as the proposals are symmetric. A sketch follows below.
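A minimal Metropolis sketch in R, assuming a hypothetical target distribution (a standard normal stands in for the posterior); the step size and number of iterations are arbitrary choices:

```r
# Metropolis sampler for a hypothetical standard-normal target
metropolis <- function(n_samples = 1e4, step = 0.5) {
  samples <- numeric(n_samples)
  current <- 0
  for (i in 1:n_samples) {
    samples[i] <- current                                 # record current location
    proposal <- current + rnorm(1, mean = 0, sd = step)   # symmetric neighbor proposal
    # accept with probability min(1, p(proposal) / p(current))
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) current <- proposal
  }
  samples
}

draws <- metropolis()
c(mean(draws), sd(draws))   # should be close to 0 and 1
```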
19.1.2 Metropolis-Hastings
An improvement on Metropolis that does not require the proposals to be symmetric; any asymmetry in the proposal distribution is corrected for in the acceptance ratio.
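For reference, the standard Metropolis-Hastings acceptance probability for moving from the current value $\theta$ to a proposal $\theta'$ drawn from a proposal distribution $q$ is

$$\alpha = \min\!\left(1,\ \frac{p(\theta' \mid \text{data})\, q(\theta \mid \theta')}{p(\theta \mid \text{data})\, q(\theta' \mid \theta)}\right),$$

which reduces to the plain Metropolis rule when $q$ is symmetric, i.e. when $q(\theta \mid \theta') = q(\theta' \mid \theta)$.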
19.1.4 Hamiltonian Monte Carlo
Markov chain: no memory. The probability of the next state depends only on the current state, not on past states, so nothing needs to be stored.
Monte Carlo: random simulation (named after the casino in Monaco).
HMC is a numerical technique for sampling the posterior, with several advantages over the Metropolis and Gibbs samplers:
- Metropolis and Gibbs rely on guess-and-check proposals; like optimization, this strategy breaks down in high dimensions (see concentration of measure: most of the posterior mass sits in a thin shell far from the mode)
- Hamiltonian Monte Carlo uses the gradient of the log-posterior to avoid the guess-and-check of Metropolis and Gibbs
- In high-dimensional spaces especially, the acceptance rate of those methods drops and sampling takes much more time
Hamiltonian Monte Carlo:
- Uses a physics simulation representing the parameter state as a particle
- Flicks the particle around a frictionless log-posterior surface
- Follows curvature of the surface, so it doesn’t get stuck
- Uses random direction and random speed
- Slows as it climbs, speeds as it drops
Each sample is much more computationally intensive, but far fewer steps are needed and there are far fewer rejections.
It is also easier to tell when the sampler has failed. A minimal sketch follows.
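A minimal HMC sketch in R for a hypothetical standard-normal target, using the standard leapfrog integrator; U is the negative log-posterior, and the step size eps and number of leapfrog steps L are arbitrary choices:

```r
# Negative log-posterior and its gradient for a hypothetical standard normal
U      <- function(q) 0.5 * sum(q^2)
grad_U <- function(q) q

hmc_step <- function(current_q, eps = 0.1, L = 20) {
  q <- current_q
  p <- rnorm(length(q))                    # random momentum: the "flick"
  current_p <- p
  # leapfrog integration of the frictionless dynamics
  p <- p - eps * grad_U(q) / 2
  for (i in 1:L) {
    q <- q + eps * p
    if (i != L) p <- p - eps * grad_U(q)
  }
  p <- p - eps * grad_U(q) / 2
  # accept/reject corrects for numerical error in the simulated trajectory
  current_H  <- U(current_q) + 0.5 * sum(current_p^2)
  proposed_H <- U(q) + 0.5 * sum(p^2)
  if (log(runif(1)) < current_H - proposed_H) q else current_q
}

# draw 1000 samples from a 2-D standard normal
q <- c(0, 0)
samples <- matrix(NA_real_, nrow = 1000, ncol = 2)
for (i in 1:1000) { q <- hmc_step(q); samples[i, ] <- q }
```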
19.1.5 Tuning MCMC
Step size: together with the number of leapfrog steps, it controls how long each trajectory is simulated. A larger step size is more efficient, but the discretized path can no longer follow tight curvature in the surface and the simulation can diverge.
The U-turn risk (a trajectory long enough to loop back toward its starting point, wasting computation) is solved by NUTS (the No-U-Turn Sampler):
- Warm-up phase: adaptively finds a step size that hits the target acceptance rate. The default (warm-up takes half of the total iterations) is usually good
- Runs the trajectory in both directions and stops before a U-turn, giving nearly uncorrelated samples, so there is no need to pick the number of leapfrog steps by hand (see the sketch after this list)
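Reusing the hmc_step sketch above, the step-size trade-off can be seen directly: a larger step size covers more distance per leapfrog step, but the trajectory tracks the surface less accurately and more proposals get rejected. The values of eps below are arbitrary:

```r
# Acceptance rate of the hmc_step() sketch above for a given step size
run_chain <- function(eps, n = 2000) {
  q <- c(0, 0)
  accepted <- 0
  for (i in 1:n) {
    q_new <- hmc_step(q, eps = eps, L = 20)
    if (!identical(q_new, q)) accepted <- accepted + 1
    q <- q_new
  }
  accepted / n
}

run_chain(eps = 0.1)   # small step size: acceptance rate near 1
run_chain(eps = 1.9)   # large step size: noticeably more rejections
```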
19.1.7 ulam
- Create a list of the data containing only the variables you need
- Call ulam with formulas, as in quap (see the sketch after this list)
- ulam translates the formulas to Stan
- Builds the NUTS sampler
- Sampler runs
- Returns the posterior
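A minimal sketch of that workflow with the rethinking package; the data (100 hypothetical heights) and the priors are made up for illustration:

```r
library(rethinking)

# 1. Data list containing only the variables the model uses (hypothetical heights)
dat <- list(h = rnorm(100, mean = 170, sd = 10))

# 2. Call ulam with formulas, as in quap; it writes the Stan code,
#    builds the NUTS sampler, and runs it
fit <- ulam(
  alist(
    h ~ dnorm(mu, sigma),     # likelihood
    mu ~ dnorm(170, 20),      # prior for the mean
    sigma ~ dexp(1)           # prior for the standard deviation
  ),
  data = dat, chains = 4, cores = 4,
  iter = 2000, warmup = 1000  # warm-up defaults to half the iterations
)

# 3. The posterior comes back as samples
post <- extract.samples(fit)
precis(fit)
```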
19.1.8 Diagnosis
n_eff: the effective number of samples. It can be greater than the actual number of draws from the Markov chain (when successive draws are anticorrelated); with no autocorrelation every draw counts fully.
Rhat: a convergence diagnostic comparing the variance across chains to the variance within chains. Values close to 1 are good.
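Assuming the fit object from the ulam sketch above, these diagnostics can be checked directly:

```r
precis(fit)        # check the n_eff and Rhat columns for every parameter
traceplot(fit)     # healthy chains look stationary and well mixed
trankplot(fit)     # rank histograms; heavy overlap between chains is a good sign
```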
“Typically when you have a computational problem, often there’s a problem with your model”