A post by Dan Ward, PhD student on the Compass programme.

Normalising flows are black-box approximators of continuous probability distributions that support both efficient density evaluation and efficient sampling. They work by learning a bijective transformation that maps between a complex target distribution and a simple distribution of matching dimension, such as a standard multivariate Gaussian.

# Transforming distributions

Before introducing normalising flows, it is useful to introduce the idea of transforming distributions more generally. Let's say we have two uniform random variables, $u \sim \text{Uniform}(0, 1)$ and $x \sim \text{Uniform}(0, 2)$. In this case, it is straightforward to define the bijective transformation that maps between these two distributions: $x = T(u) = 2u$, with inverse $u = T^{-1}(x) = x/2$.

If we wished to sample $x$, but could not do so directly, we could instead sample $u$ and then apply the transformation $x = T(u) = 2u$. If we wished to evaluate the density of $x$, but could not do so directly, we could rewrite $p_x(x)$ in terms of $p_u$:

$$p_x(x) = \frac{p_u\left(T^{-1}(x)\right)}{2},$$

where $p_u\left(T^{-1}(x)\right)$ is the density of the corresponding point in the $u$ space, and dividing by 2 accounts for the fact that the transformation stretches the space by a factor of 2, “diluting” the probability mass. The key thing to notice here is that we can describe the sampling and density evaluation operations of one distribution, $p_x$, based on a bijective transformation of another, potentially easier to work with distribution, $p_u$.
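The two operations described above can be sketched in a few lines of Python. This is only an illustrative sketch, and it assumes a base variable distributed as Uniform(0, 1), a target distributed as Uniform(0, 2), and the transformation "multiply by 2". The function names (`forward`, `inverse`, `base_density`, `target_density`, `sample_target`) are hypothetical and chosen purely for this example.

```python
import random


def forward(u):
    """Map a point from the base space to the target space (assumed: x = 2u)."""
    return 2.0 * u


def inverse(x):
    """Map a point from the target space back to the base space (u = x / 2)."""
    return x / 2.0


def base_density(u):
    """Density of the assumed base distribution, Uniform(0, 1)."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def target_density(x):
    """Density of the target, computed only through the base density.

    Dividing by 2 accounts for the transformation stretching the space
    by a factor of 2.
    """
    return base_density(inverse(x)) / 2.0


def sample_target(n, seed=0):
    """Sample the target by sampling the base and applying the transformation."""
    rng = random.Random(seed)
    return [forward(rng.random()) for _ in range(n)]


samples = sample_target(100_000)
sample_mean = sum(samples) / len(samples)  # should be close to 1, the mean of Uniform(0, 2)

print(target_density(0.5))  # 0.5
print(target_density(3.0))  # 0.0
```

Note that `target_density` never needs to know the target distribution directly: it only uses the inverse transformation, the base density, and the stretch factor.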