Trying to explain my 2017 Google Summer of Code project with pymc3 to people usually results in me being even more confused about what I’ll actually be doing this summer. I figured a blog post breaking it down would make things easier for everyone involved.
What usually leads to raised eyebrows is the name of the sampling algorithm I will be implementing – Riemannian Manifold Hamiltonian Monte Carlo. Those are a lot of big words, and not all of them might make sense even if you’ve dabbled with python or data science before. But it’s actually quite easy to understand, even if right now the only thing that comes to your mind when you hear of Monte Carlo is the casino over at Monaco.
Not quite the Monte Carlo we’ll be working on.
Let’s start by talking about what a Monte Carlo method is, and what sampling is. The Wikipedia page on sampling actually does a good job of explaining it: sampling can be understood as choosing a representative subset of the data or distribution we are dealing with. There are many different kinds of sampling algorithms – so why do we concern ourselves with Monte Carlo ones?
This is simply because of how powerful this suite of algorithms is. We can sample from complex, possibly correlated distributions, and do it well. How? The main idea is to use randomness to solve problems that might be deterministic in principle – and it works well! In cases where there is a complex interaction of many variables, or where the inherently probabilistic nature of a phenomenon rules out a definitive prediction, we can estimate the answer by feeding essentially random inputs (within realistic limits) into a model of the system and looking at the outcomes that result.
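To see "random inputs producing probable outcomes" in action, here is the classic toy example: estimating pi by throwing random points at a square. This is just an illustrative sketch, not anything from pymc3 – the function name and parameters are my own.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling random points in the unit square and
    counting how many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # (area of quarter circle) / (area of square) = pi / 4
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # roughly 3.14
```

The answer gets better as we throw more points – randomness solving a perfectly deterministic problem.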
Markov chain Monte Carlo (MCMC) methods are used to sample from probability distributions. We remember information about our previous step, jump towards regions of high probability, and hang around there to do our sampling.
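That "remember the previous step, propose a jump, linger where probability is high" loop can be sketched as a tiny random-walk Metropolis sampler. This is a bare-bones illustration, not how pymc3 implements it – the target here is a standard normal, and all names are my own.

```python
import math
import random

def metropolis(log_prob, n_samples, step=0.5, x0=0.0, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept it with
    probability min(1, p(new)/p(old)); otherwise keep the old point."""
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_prob(proposal)
        # Accept with probability exp(lp_new - lp); the chain naturally
        # spends most of its time in high-probability regions.
        if math.log(rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        samples.append(x)
    return samples

# Target: a standard normal, specified up to an additive constant
draws = metropolis(lambda x: -0.5 * x * x, 50_000)
print(sum(draws) / len(draws))  # sample mean should be near 0
```

Note that the chain only ever needs probability *ratios*, which is why MCMC works even when the normalising constant of the distribution is unknown.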
I could go on and explain exactly how this works, but there are plenty of well-written explanations out there, both mathematical and otherwise. I particularly like this blog post by pymc3 maintainer Thomas Wiecki, and Stack Exchange also does a good job. It’s also fun to know how the Monte Carlo method got its name – back in the 1940s, when von Neumann and Ulam were working on it, the code name Monte Carlo was suggested by their colleague Metropolis, after the Monte Carlo casino we talked about before, where Ulam’s uncle would often gamble. It’s also an important method from a historical point of view, having been used in the nuclear weapons simulations that grew out of the Manhattan Project.
So now that we have the Monte Carlo bit down, let’s see where Hamiltonian mechanics fits into all this. With Markov chain Monte Carlo, we jump from one state to another while doing our sampling – in Hamiltonian Monte Carlo, we move between successive sampled states by simulating a Hamiltonian evolution. But wait – what is a Hamiltonian? It can be understood as a reformulation of classical mechanics in which we work with the total energy of the system. And as for why we use it: the properties of Hamiltonian dynamics let us reduce the random-walk behaviour of MCMC – the point of Hamiltonian Monte Carlo is a different kind of proposal that hopefully “explores” the distribution more efficiently. Instead of generating the next sample by taking the current one and making a random step in some defined way, HMC simulates Hamiltonian dynamics starting from the current sample.
One of my favourite intuitive explanations for HMC comes from my mentor for this project, Colin Carroll. He says – “Metropolis sampling will take the previous point, and pick a new proposal nearby, usually using a normal distribution. This proposal is accepted or rejected based on some numerical bookkeeping. The problem is that in high dimensions, there is too much space, and we either reject every proposal, or take super tiny steps. Hamiltonian Monte Carlo solves this problem by inducing a vector field that our proposals can “flow” along, leading to distant proposals that are (almost) always accepted, at the price of computing gradient information.”
For more details, Alex Rogozhnikov’s blog post is great, and Michael Betancourt’s paper on the subject is authoritative. For a less math-heavy explanation, this works too.
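To make the “flow along a vector field” idea concrete, here is a minimal 1-D HMC transition using the leapfrog integrator. This is a didactic sketch under my own naming, not pymc3’s implementation – real samplers tune the step size and path length, and work in many dimensions.

```python
import math
import random

def hmc_step(x, log_prob, grad_log_prob, path_len=1.0, step_size=0.1,
             rng=random):
    """One HMC transition: draw a random momentum, simulate Hamiltonian
    dynamics with the leapfrog integrator, then accept/reject based on
    the change in total energy (the Hamiltonian)."""
    p = rng.gauss(0.0, 1.0)  # fresh momentum each transition
    x_new, p_new = x, p
    # Leapfrog integration of dx/dt = p, dp/dt = grad log p(x)
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    for _ in range(int(path_len / step_size) - 1):
        x_new += step_size * p_new
        p_new += step_size * grad_log_prob(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    # Total energy = potential (-log p) + kinetic (p^2 / 2); the
    # proposal is accepted with probability min(1, exp(-delta_energy)).
    h_old = -log_prob(x) + 0.5 * p * p
    h_new = -log_prob(x_new) + 0.5 * p_new * p_new
    if math.log(rng.random()) < h_old - h_new:
        return x_new
    return x

# Sample a standard normal: log p(x) = -x^2/2, gradient = -x
rng = random.Random(0)
x, draws = 0.0, []
for _ in range(5_000):
    x = hmc_step(x, lambda z: -0.5 * z * z, lambda z: -z, rng=rng)
    draws.append(x)
print(sum(draws) / len(draws))  # sample mean should be near 0
```

Because the dynamics approximately conserve energy, the proposals land far away yet are almost always accepted – exactly the behaviour Colin describes, at the cost of needing the gradient of the log density.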
We’re getting there now – only two more words to go!
Riemannian Manifolds – quite the mouthful, eh?
Again, the wiki does a good job of explaining what a manifold is – it’s a topological space that locally resembles Euclidean space. In other words, if you “zoom in” close enough, you’re dealing with a Euclidean space. A nice example is our own planet. Why did folks think the Earth was flat? It does look flat if we just look around, but walk far enough in one direction and you end up where you started! So our planet is globally a sphere – a curved two-dimensional surface – but locally it appears to be a two-dimensional plane. In other words, Earth is a two-dimensional manifold.
So why is this relevant to Monte Carlo or Hamiltonians? A Riemannian manifold comes equipped with a Riemannian metric – a smoothly varying way of measuring lengths and angles at each point – which we can differentiate and adapt to the local geometry of the space.
Hamiltonian Monte Carlo already deals with manifolds, in the sense that the potential energy of a state is defined as a function on some manifold – what we intend to do, instead of using the standard Euclidean distances used before, is to make things funky by using Riemannian manifolds and metrics. These let us rework our equations so that we can sample from more complex, correlated data. How would using a different metric change the way we deal with data?
Maybe think about the distance between two points on a torus. The Riemannian metric would measure a path along the surface of the torus, while the Euclidean metric would use a straight line through space. For points that are far apart, the length of the straight line can be very different from the actual distance along the surface.
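A circle makes the same point in one dimension and is easy to compute by hand, so here is a small sketch comparing the two notions of distance – the straight-line chord (Euclidean) versus the arc length measured along the circle itself. The function and its parameters are my own illustration.

```python
import math

def chord_and_arc(theta1, theta2, radius=1.0):
    """Distance between two points on a circle, given by their angles:
    the straight-line chord (Euclidean) vs. the arc along the circle."""
    x1, y1 = radius * math.cos(theta1), radius * math.sin(theta1)
    x2, y2 = radius * math.cos(theta2), radius * math.sin(theta2)
    chord = math.hypot(x2 - x1, y2 - y1)
    # Arc length: shorter way around the circle
    dtheta = abs(theta2 - theta1) % (2 * math.pi)
    arc = radius * min(dtheta, 2 * math.pi - dtheta)
    return chord, arc

# Diametrically opposite points: the chord cuts straight across,
# while the on-circle path has to go the long way around.
chord, arc = chord_and_arc(0.0, math.pi)
print(chord, arc)  # 2.0 vs pi ≈ 3.1416
```

For nearby points the two distances nearly agree – that is the “locally Euclidean” property of a manifold – but for distant points they diverge, which is exactly why swapping the metric changes how a sampler moves.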
If you want to get more into the math and physics behind this, the reading material on my pymc3 GSoC page will let you know what’s up.
Let’s sum it up – Monte Carlo simulations let us sample from probability distributions and data, Hamiltonians let us avoid some pitfalls of MCMC by assigning energy states and momenta, and Riemannian manifolds give HMC even more power through more flexible distance metrics. Phew!
Not so nasty after all, eh? The first bit I’m working on is implementing the SoftAbs metric, which seems like it should be a load of fun. I’ll be attempting to explain my summer work over a bunch of blog posts – and hope to see you around!