19 Lecture 10 - 2019

19.1 Markov Chain Monte Carlo

Reminder: Bayesian inference is about calculating the posterior. Bayesian ≠ Markov Chains

Four ways to compute the posterior

  1. Analytical approach (mostly impossible)
  2. Grid approximation (very intensive)
  3. Quadratic approximation (limited)
  4. MCMC (intensive)
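For intuition, a minimal sketch of option 2 (grid approximation) on the book's globe-tossing example, assuming 6 waters in 9 tosses:

```r
# Grid approximation: evaluate the posterior at many candidate values
p_grid <- seq(0, 1, length.out = 1000)           # candidate proportions of water
prior  <- rep(1, 1000)                           # flat prior
likelihood <- dbinom(6, size = 9, prob = p_grid) # likelihood of 6 W in 9 tosses
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)         # normalize to sum to 1
```

This works with one parameter, but the grid explodes combinatorially as parameters multiply, hence "very intensive".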

Advantages of MCMC

  • You don’t know the posterior, yet you can still visit each region of it in proportion to its relative probability
  • “Sample from a distribution that we don’t know”

19.1.1 Metropolis algorithm

  1. Loop over iterations
  2. Record location
  3. Generate neighbor location proposals
  4. Move with probability given by the ratio of the proposal’s probability to the current location’s (otherwise stay)

Converges to the posterior in the long run; valid as long as the proposal distribution is symmetric
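A minimal sketch in R, assuming a Normal(0, 1) target as a stand-in posterior (only density ratios are needed, so the normalizing constant cancels):

```r
# Unnormalized target density (assumed example)
target <- function(x) exp(-x^2 / 2)

n_iter  <- 1e4
samples <- numeric(n_iter)
current <- 0                                 # arbitrary starting location
for (i in 1:n_iter) {
  samples[i] <- current                      # step 2: record location
  proposal <- current + rnorm(1, 0, 1)       # step 3: symmetric proposal
  # step 4: move with probability min(1, target ratio)
  if (runif(1) < target(proposal) / target(current)) {
    current <- proposal
  }
}
```

In the long run, the recorded locations are visited in proportion to their relative probability.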

19.1.2 Metropolis Hastings

An improvement on Metropolis: it does not require the proposals to be symmetric
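Concretely, MH replaces the plain density ratio with an acceptance probability that corrects for the proposal distribution q (the standard MH form):

```latex
\alpha(\theta \to \theta') = \min\left(1,\;
  \frac{p(\theta')\, q(\theta \mid \theta')}{p(\theta)\, q(\theta' \mid \theta)}\right)
```

When q is symmetric, the q terms cancel and this reduces to the Metropolis rule.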

19.1.3 Gibbs sampling

A more efficient version of MH: it uses adaptive proposals, drawing each parameter in turn from its full conditional distribution (via conjugate priors)
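A sketch for an assumed bivariate Normal target with correlation rho, where both full conditionals are known Normals and nothing is ever rejected:

```r
rho    <- 0.8
n_iter <- 1e4
x <- y <- numeric(n_iter)   # chains start at (0, 0)
for (i in 2:n_iter) {
  # draw each parameter from its full conditional given the other
  x[i] <- rnorm(1, rho * y[i - 1], sqrt(1 - rho^2))   # x | y
  y[i] <- rnorm(1, rho * x[i],     sqrt(1 - rho^2))   # y | x
}
```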

19.1.4 Hamiltonian Monte Carlo

Markov Chain: No memory. Probability solely depends on current state, not past state. No storage.

Monte Carlo: Random simulation (eg Monaco casino)

HMC is a numerical technique for sampling the posterior, with several advantages over Metropolis and Gibbs

  • Guess-and-check strategies like Metropolis and Gibbs, and optimization generally, work poorly in high dimensions, where most of the posterior mass lies far from the mode (see concentration of measure)
  • Hamiltonian Monte Carlo uses the gradient of the log-posterior to avoid the guess-and-check of Metropolis and Gibbs
  • Especially in high-dimensional spaces, their acceptance rates fall and sampling takes more time (a toy demo follows)
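A toy demo of that last point, assuming a standard Normal target in d dimensions and random-walk Metropolis:

```r
# Acceptance rate of random-walk Metropolis as dimension grows
accept_rate <- function(d, n_iter = 5000, step = 0.5) {
  logp <- function(z) -sum(z^2) / 2          # log-density up to a constant
  x   <- rep(0, d)
  acc <- 0
  for (i in 1:n_iter) {
    proposal <- x + rnorm(d, 0, step)
    if (log(runif(1)) < logp(proposal) - logp(x)) {
      x   <- proposal
      acc <- acc + 1
    }
  }
  acc / n_iter
}
sapply(c(1, 10, 100), accept_rate)   # acceptance collapses as d increases
```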

Hamiltonian Monte Carlo:

  1. Runs a physics simulation, representing the current parameter values as a particle
  2. Flicks the particle around a frictionless log-posterior surface
  3. Follows the curvature of the surface, so it doesn’t get stuck
  4. Each flick uses a random direction and random momentum
  5. The particle slows as it climbs and speeds up as it drops

This is much more computationally intensive per step, but it requires far fewer steps and has far fewer rejections

It’s also easier to determine when HMC has failed
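A minimal HMC sketch for a 1-D standard Normal target (an assumed stand-in: U is the negative log-posterior, grad_U its gradient), using the standard leapfrog integrator:

```r
U      <- function(q) q^2 / 2      # negative log-posterior
grad_U <- function(q) q            # its gradient

hmc_step <- function(q, eps = 0.1, L = 20) {
  p     <- rnorm(1)                # random flick: direction and momentum
  q_new <- q
  p_new <- p - eps * grad_U(q) / 2 # half step for momentum
  for (l in 1:L) {                 # leapfrog along the frictionless surface
    q_new <- q_new + eps * p_new
    if (l < L) p_new <- p_new - eps * grad_U(q_new)
  }
  p_new <- p_new - eps * grad_U(q_new) / 2
  # accept/reject only corrects numerical error, so rejections are rare
  H_old <- U(q) + p^2 / 2
  H_new <- U(q_new) + p_new^2 / 2
  if (runif(1) < exp(H_old - H_new)) q_new else q
}

samples <- numeric(1e4)
q <- 0
for (i in 1:1e4) {
  q <- hmc_step(q)
  samples[i] <- q
}
```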

19.1.5 Tuning MCMC

Step size: together with the number of leapfrog steps, it sets how long each simulation runs. A larger step size is more efficient, but the trajectory can overshoot the curvature of the surface

If a simulation runs too long, the particle can U-turn back toward where it started; this risk is solved by NUTS (the No-U-Turn Sampler)

  1. Warm-up phase: adapts the step size to target a good acceptance rate. The default (half the number of samples) is usually good
  2. Runs the simulation in both directions until the path starts to turn back on itself, giving nearly uncorrelated samples. No need to pick the number of leapfrog steps

19.1.6 Stan

Stan uses NUTS

19.1.7 ulam

  1. Create a data list containing only the variables the model needs
  2. Call ulam with model formulas, as in quap
  3. ulam translates the formulas to Stan code
  4. Builds the NUTS sampler
  5. The sampler runs
  6. Returns the posterior samples (see the sketch below)
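A hedged end-to-end sketch, roughly following the book's terrain-ruggedness model (assuming the rethinking package and its rugged dataset):

```r
library(rethinking)

data(rugged)
d <- rugged[complete.cases(rugged$rgdppc_2000), ]
d$log_gdp_std <- log(d$rgdppc_2000) / mean(log(d$rgdppc_2000))
d$rugged_std  <- d$rugged / max(d$rugged)
d$cid         <- ifelse(d$cont_africa == 1, 1, 2)

# step 1: a trimmed list with only the variables the model uses
dat <- list(
  log_gdp_std = d$log_gdp_std,
  rugged_std  = d$rugged_std,
  cid         = as.integer(d$cid)
)

# steps 2-5: quap-style formulas; ulam compiles them to Stan and runs NUTS
m <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4   # warm-up defaults to half of each chain
)

# step 6: summarize the posterior samples
precis(m, depth = 2)
```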

19.1.8 Diagnosis

n_eff: the effective number of samples. It can be greater than the actual number of samples from the Markov chain (when samples are anti-correlated). Samples are fully effective when there is no autocorrelation

Rhat: convergence diagnostic; values near 1 are good. It compares the variance across all chains to the variance within each chain.

“When you have a computational problem, often there’s a problem with your model.”
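In practice, a quick check (a hedged usage sketch, reusing the model m from the ulam example above):

```r
precis(m, depth = 2)   # printout includes n_eff and Rhat for each parameter
traceplot(m)           # healthy chains look stationary and well-mixed
```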

19.1.9 Checking the chain

TODO: p283