Student perspectives: ensemble modelling for probabilistic volcanic ash hazard forecasting

A post by Shannon Williams, PhD student on the Compass programme.

My PhD focuses on the application of statistical methods to volcanic hazard forecasting. This research is jointly supervised by Professor Jeremy Philips (School of Earth Sciences) and Professor Anthony Lee.

Introduction to probabilistic volcanic hazard forecasting

When an explosive volcanic eruption happens, large amounts of volcanic ash can be injected into the atmosphere and remain airborne for many hours to days from the onset of the eruption. Driven by atmospheric wind, this ash can travel up to thousands of kilometres downwind of the volcano and cause significant impacts on agriculture, infrastructure, and human health. Notably, a series of explosive eruptions from Eyjafjallajökull, Iceland between March and June 2010 resulted in the highest level of air travel disruption in Northern Europe since the Second World War, with the 108,000 flights cancelled and 1.7 billion USD in revenue lost.

The challenging task of forecasting volcanic ash dispersion is complicated by the chaotic nature of atmospheric wind fields and the lack of direct observations of ash concentration obtained during eruptive events.

Ash plume at the vent of Eyjafjallajökull, taken by Susanna Jenkins, Earth Observatory of Singapore.

Ensemble modelling

In order to construct probabilistic volcanic ash hazard assessments, volcanologists typically opt for an ensemble modelling approach in order to filter out the uncertainty in the meteorological conditions. This approach requires a model for the simulation of atmospheric ash dispersion, given eruption source conditions and meteorological inputs (referred to as forcing data). Typically a deterministic simulator such as FALL3D [1] is used, and so the uncertainty in the output is a direct consequence of the uncertainty in the inputs. To capture the uncertainty in the meteorological data, volcanologists construct an ensemble of simulations where the eruption conditions remain constant but the forcing data is altered, and average over the ensemble to produce an ash dispersion forecast [2].

Whilst this is a statistical approach by its construction and the objective is to provide probabilistic forecasts, predictions are often stated without providing a measure of the uncertainty in these predictions; furthermore, the ensemble size is typically decided on a pragmatic basis without consideration given to the magnitude of the probabilities of interest and the desired confidence in these probabilities. Much of my research since the beginning of my PhD has been on ways to enumerate these probabilities and their uncertainty, and on methods of deciding the ensemble size and reducing the uncertainty (variance) of the probability estimates.

Ash concentration exceedance probabilities

A common exercise in volcanic ash hazard assessment is the computation of ash concentration exceedance probabilities [3]. Given an eruption scenario, which we provide to the model by defining inputs such as the volcanic plume height and the mass eruption rate, what is the probability that the ash concentration at some location exceeds a pre-defined threshold? For example, how likely is it that the ash concentration at some flight level near Heathrow airport exceeds 500 \mu \text{g m}^{-3} for 24 hours or more? This is the sort of question that aviation authorities seek answers to, in order to ensure that flights can take off safely in the event of an eruption similar to that in 2010.

Exceedance probability estimation

To construct an ensemble for exceedance probability estimation, we draw n inputs Z_1, \dots, Z_n from a dataset of historical meteorological data. For each i=1, \dots, n, the output of the simulator is obtained by applying some function \phi to Z_i. Then, for the location of interest, the maximum ash concentration that persists for more than 24 hours is \psi \circ \phi (Z_i), the result of applying some other function \psi to \phi(Z_i). For a threshold c \, \mu \text{g m}^{-3} , we represent the event of exceedance by

X_i := \mathbb{I} \{\psi \circ \phi (Z_i) \geq c\},

where \mathbb{I} denotes the indicator function. X_i takes value 1 if the threshold is exceeded, and 0 otherwise. X_i is a Bernoulli random variable which depends implicitly on c, \phi and \psi, with success parameter

p := \mathbb{P}(X_i = 1) = \mathbb{P}(\psi \circ \phi (Z_i) \geq c),

representing the exceedance probability. As the X_i are independent and identically distributed random variables with mutual mean \mathbb{E}[X_i]=p and variance \text{Var}(X_i) = p (1-p), we estimate p via the sample mean of X_1, \dots, X_n,

\hat{p}_n := \sum_{i=1}^n X_i,

which provides an unbiased estimate of p with variance

\text{Var}(\hat{p}_n ) = \frac{1}{n} p (1-p).

Quantifying the uncertainty

This variance can be estimated by replacing p with \hat{p}_n and used to obtain confidence intervals for the value of p. Since the variance is inversely proportional to n, increasing the ensemble size will naturally reduce the variance and decrease the width of this confidence interval. For example, an approximate 95% confidence interval for p would be

\hat{p}_n \pm 1.96 \sqrt{\frac{1}{n} \hat{p}_n (1-\hat{p}_n)}.

The figure below compares 95% confidence intervals for the probability of exceeding 500 \mu \text{g m}^{-3} for 24 hours or more given a VEI 4 eruption from Iceland, for ensemble sizes of 50 and 100. The middle maps are the exceedance probability estimates, with the lower and upper ends of the confidence interval on the left and right, respectively.

Commonly we would want to visualise estimates of probabilities of different magnitudes, corresponding to different thresholds, alongside each other. In this situation it is beneficial to view them on a logarithmic scale and provide confidence intervals for \log p. Given \hat{p}_n, a 95% confidence interval for \log p is

\log (\hat{p}_n) \pm 1.96 \sqrt{\frac{1-\hat{p}_n}{n \hat{p}_n}}.

The figure below illustrates exceedance probability estimates with 95% confidence intervals versus threshold for a number of locations in Europe on a double logarithmic scale, based on an ensemble size of 6000.

Choosing the ensemble size

Clearly, as the ash concentration threshold increases the probability of exceeding that threshold gets smaller. Fewer exceedance events will be seen in the ensemble than for lower thresholds, and the size of the confidence interval will be larger when compared with the magnitude of the probability itself.

It seems sensible to set the ensemble size based on the magnitude of the lowest probability of interest, and a natural way to do this is to consider the expected number of exceedances given n:

\mathbb{E} \left[ \sum_{i=1}^n X_i \right] = \sum_{i=1}^n \mathbb{E} [X_i] = np.

Then, if we want to observe m or more exceedances, we should set n \geq m/p. If the lowest probability of interest is 10^{-4}, we need n \geq 10,000 to expect at least one exceedance event.

However, this approach does not guarantee that estimates of low-probability events will have low variances. Noting that the variance p(1-p)/n will be small for small p but large compared to p if n is small, we instead consider the relative variance,

\text{Var} \left( \frac{\hat{p}_n}{p} \right) = \frac{\text{Var} \left(\hat{p}_n\right)}{p^2} = \frac{1-p}{np},

which is the variance of the ratio of p and \hat{p}_n. Choosing n such that the relative variance is bounded above by 1 for the lowest value of p that is of interest thus ensures that the squared deviation from the mean of the estimate is no greater than the squared probability itself. So, for some \varepsilon, we would set (1-p) / np \leq \varepsilon to obtain a minimum sample size of

n \geq \frac{1-p}{\varepsilon p}.

Furthermore, since (1-p)/np is exactly the quantity in the square-root term of the confidence interval for \log p, this approach has the additional benefit of directly reducing the width of the confidence interval for \log p.

Further work

Further work I have carried out on this topic so far has been an investigation into alternative sampling schemes for reducing the computational cost of constructing an ensemble, as well as variance reduction methods.


[1] Folch, A., Costa, A., and Macedonio, G. FALL3D: A computational model for transport and deposition of volcanic ash. In: Computers and Geosciences 35,6, 2009, pp. 1334-1342.

[2] Bonadonna, C. et al. Probabilistic modeling of tephra dispersal: Hazard assessment of a multiphase rhyolitic eruption at Tarawera, New Zealand. In: Journal of Geophysical Research: Solid Earth 110.B3, 2005.

[3] Jenkins, S. et al. Regional ash fall hazard I: a probabilistic assessment methodology. In: Bulletin of volcanology 74.7, 2012, pp. 1699-1712.

Student Perspectives: An introduction to normalising flows

A post by Dan Ward, PhD student on the Compass programme.

Normalising flows are black-box approximators of continuous probability distributions, that can facilitate both efficient density evaluation and sampling. They function by learning a bijective transformation that maps between a complex target distribution and a simple distribution with matching dimension, such as a standard multivariate Gaussian distribution.

Transforming distributions

Before introducing normalising flows, it is useful to introduce the idea of transforming distributions more generally. Lets say we have two uniform random variables, u \sim \text{Uniform}(0, 1), and x \sim \text{Uniform}(0, 2). In this case, it is straight forward to define the bijective transformation T that maps between these two distributions, as shown below.

If we wished to sample p(x), but could not do so directly, we could instead sample u \sim p(u), and then apply the transformation x=T(u). If we wished to evaluate the density of p(x), but could not do so directly, we can rewrite p(x) in terms of p(u)

p(x)= p(u=T^{-1}(x)) \cdot 2^{-1},

where p(u=T^{-1}(x)) is the density of the corresponding point in the u space, and dividing by 2 accounts for the fact that the transformation T stretches the space by a factor of 2, “diluting” the probability mass. The key thing to notice here, is that we can describe the sampling and density evaluation operations of one distribution, p(x), based on a bijective transformation of another, potentially easier to work with distribution, p(u). (more…)

Student perspectives: Neural Point Processes for Statistical Seismology

A post by Sam Stockman, PhD student on the Compass programme.


Throughout my PhD I aim to bridge a gap between advances made in the machine learning community and the age-old problem of earthquake forecasting. In this cross-disciplinary work with Max Werner from the School of Earth Sciences and Dan Lawson from the School of Mathematics, I hope to create more powerful, efficient and robust models for forecasting, that can make earthquake prone areas safer for their inhabitants.

For years seismologists have sought to model the structure and dynamics of the earth in order to make predictions about earthquakes. They have mapped out the structure of fault lines and conducted experiments in the lab where they submit rock to great amounts of force in order to simulate plate tectonics on a small scale. Yet when trying to forecast earthquakes on a short time scale (that’s hours and days, not tens of years), these models based on the knowledge of the underlying physics are regularly outperformed by models that are statistically motivated. In statistical seismology we seek to make predictions through looking at distributions of the times, locations and magnitudes of earthquakes and use them to forecast the future.




Student Perspectives: Application of Density Ratio Estimation to Likelihood-Free problems

A post by Jack Simons, PhD student on the Compass programme.


I began my PhD with my supervisors, Dr Song Liu and Professor Mark Beaumont with the intention of combining their respective fields of research; Density Ratio Estimation (DRE), and Simulation Based Inference (SBI):

  • DRE is a rapidly growing paradigm in machine learning which (broadly) provides efficient methods of comparing densities without the need to compute each density individually. For a comprehensive yet accessible overview of DRE in Machine Learning see [1].
  • SBI is a group of methods which seek to solve Bayesian inference problems when the likelihood function is intractable. If you wish for a concise overview of the current work, as well as motivation then I recommend [2].

Last year we released a paper, Variational Likelihood-Free Gradient Descent [3] which combined these fields. This blog post seeks to condense, and make more accessible, the contents of the paper.

Motivation: Likelihood-Free Inference

Let’s begin by introducing likelihood-free inference. We wish to do inference on the posterior distribution of parameters \theta for a specific observation x=x_{\mathrm{obs}}, i.e. we wish to infer p(\theta|x_{\mathrm{obs}}) which can be decomposed via Bayes’ rule as

p(\theta|x_{\mathrm{obs}}) = \frac{p(x_{\mathrm{obs}}|\theta)p(\theta)}{\int p(x_{\mathrm{obs}}|\theta)p(\theta) \mathrm{d}\theta}.

The likelihood-free setting is that, additional to the usual intractability of the normalising constant in the denominator, the likelihood, p(x|\theta), is also intractable. In lieu of this, we require an implicit likelihood which describes the relation between data x and parameters \theta in the form of a forward model/simulator (hence simulation based inference!). (more…)

Student Perspectives: Contemporary Ideas in Statistical Philosophy

A post by Alessio Zakaria, PhD student on the Compass programme.


Probability theory is a branch of mathematics centred around the abstract manipulation and quantification of uncertainty and variability. It forms a basic unit of the theory and practice of statistics, enabling us to tame the complex nature of observable phenomena into meaningful information. It is through this reliance that the debate over the true (or more correct) underlying nature of probability theory has profound effects on how statisticians do their work. The current opposing sides of the debate in question are the Frequentists and the Bayesians. Frequentists believe that probability is intrinsically linked to the numeric regularity with which events occur, i.e. their frequency. Bayesians, however, believe that probability is an expression of someones degree of belief or confidence in a certain claim. In everyday parlance we use both of these concepts interchangeably: I estimate one in five of people have Covid; I was 50% confident that the football was coming home. It should be noted that the latter of the two is not a repeatable event per se. We cannot roll back time to check what the repeatable sequence would result in.


Skip to toolbar