## Student perspectives: Compass Conference 2022

A post by Dominic Broadbent and Dom Owens, PhD students on the Compass CDT, and Compass conference co-organisers.

## Introduction

September saw the first annual Compass Conference, hosted in the newly refurbished Fry Building, home to the School of Mathematics. The conference was a fantastic opportunity for PhD students across Compass to showcase their research, to meet with industrial partners and to celebrate their achievements. The event also welcomed the new cohort of PhD students, as well as prospective PhD students taking part in the Access to Data Science programme.

## Lightning talks and poster session

The day started with a series of fast-paced 3-minute lightning talks, in which students pitched their research and invited attendees to visit them at the poster session that followed. More than 20 posters were on display, capturing the wide range of topics Compass students currently work on. The session gave everyone a chance to dive into the finer details and see results visualised in impressive and inventive ways. For academic and industry attendees, this also served as a forum to propose potential applications and collaborations.

The lightning talks were a great exercise in boiling down your work into just the essentials, knowing you only have a short time to try to attract the audience’s attention and interest. The poster session then provided an opportunity to follow up on that interest with details and results, as well as the chance to see what exactly everyone else in the office is working on!

## Research talks

After lunch, we saw 6 in-depth talks from students on late-stage or published research ideas, starting with Ettore Fincato’s work “Spectral analysis of the Gibbs sampler with the concept of conductance” and ending with Ed Davis’ “Universal Dynamic Network Embedding – How to Comprehend Changes in 20,000 Dimensions”. Our speakers took advantage of the longer format to give detailed, motivated presentations, concentrating on the most interesting mathematical details while still making sure their talks were accessible for everyone in the audience. Having a chaired discussion after each talk allowed for audience participation and gave more focus to the most interesting and challenging details. The format is intended to be similar to the talks students give at external conferences, giving the speakers a chance to perfect their technique in front of friendly faces.

## Special guest lecture

To cap off a successful event, the attendees moved to the great hall of the Wills Memorial Building which opened its doors to the public for a special guest lecture from John Burn-Murdoch, the Chief Data Reporter at the Financial Times. John is a pioneering data journalist whose work came to particular prominence when he led the FT’s coverage of the Covid-19 pandemic. He also writes a weekly column, Data Points, where he leverages the power of data visualisation done well to inform his readers on topics such as economics, sport and health.

The lecture, titled Making charts that make an impact, was an exploration of what makes data visualisation effective as a means of communication, drawing on the latest scientific research and his experience of visualising the pandemic. It provided an excellent opportunity to learn how we can carefully present data to have the largest impact, and get across the ideas we want. Certainly, it sparked a discussion about the nature of data visualisation in mathematical research, what we can learn from the latest advances in data journalism, and when it may and may not be appropriate to tell a story.

## Thoughts from students

We surveyed the Compass students afterwards looking for feedback on their experiences. Here are a select few comments:

“Giving a talk on my work in an encouraging environment, with support from my friends and advisers, as well as a wide range of attendees with diverse technical knowledge, was an invaluable experience for me.”

“It’s really useful to learn how to communicate your research to people in industry. They tend to be less concerned with detail, and more concerned with the big idea, and how it can be used in practice. I’m looking forward to continuing some of the conversations I started while at the conference and seeing where they lead.”

“I had a very rewarding time at the Compass conference, the great mix of people, ideas, and catering made it so fun and interesting. Really looking forward to next year’s!”

“I found John’s talk fascinating, it was very interesting to peek behind the curtain and gain insight into the way a data journalist makes decisions on how to present data most effectively, from the minutiae to the big picture.”

## Student perspectives: ensemble modelling for probabilistic volcanic ash hazard forecasting

A post by Shannon Williams, PhD student on the Compass programme.

My PhD focuses on the application of statistical methods to volcanic hazard forecasting. This research is jointly supervised by Professor Jeremy Philips (School of Earth Sciences) and Professor Anthony Lee.

## Introduction to probabilistic volcanic hazard forecasting

When an explosive volcanic eruption happens, large amounts of volcanic ash can be injected into the atmosphere and remain airborne for many hours to days from the onset of the eruption. Driven by atmospheric wind, this ash can travel up to thousands of kilometres downwind of the volcano and cause significant impacts on agriculture, infrastructure, and human health. Notably, a series of explosive eruptions from Eyjafjallajökull, Iceland between March and June 2010 resulted in the highest level of air travel disruption in Northern Europe since the Second World War, with 108,000 flights cancelled and 1.7 billion USD in revenue lost.

The challenging task of forecasting volcanic ash dispersion is complicated by the chaotic nature of atmospheric wind fields and the lack of direct observations of ash concentration obtained during eruptive events.

## Ensemble modelling

To construct probabilistic volcanic ash hazard assessments, volcanologists typically opt for an ensemble modelling approach in order to account for the uncertainty in the meteorological conditions. This approach requires a model for the simulation of atmospheric ash dispersion, given eruption source conditions and meteorological inputs (referred to as forcing data). Typically a deterministic simulator such as FALL3D [1] is used, so the uncertainty in the output is a direct consequence of the uncertainty in the inputs. To capture the uncertainty in the meteorological data, volcanologists construct an ensemble of simulations in which the eruption conditions remain constant but the forcing data is altered, and average over the ensemble to produce an ash dispersion forecast [2].

Whilst this is a statistical approach by construction and the objective is to provide probabilistic forecasts, predictions are often stated without a measure of their uncertainty; furthermore, the ensemble size is typically decided on a pragmatic basis, without consideration given to the magnitude of the probabilities of interest or the desired confidence in these probabilities. Much of my research since the beginning of my PhD has been on ways to quantify these probabilities and their uncertainty, and on methods for choosing the ensemble size and reducing the uncertainty (variance) of the probability estimates.

## Ash concentration exceedance probabilities

A common exercise in volcanic ash hazard assessment is the computation of ash concentration exceedance probabilities [3]. Given an eruption scenario, which we provide to the model by defining inputs such as the volcanic plume height and the mass eruption rate, what is the probability that the ash concentration at some location exceeds a pre-defined threshold? For example, how likely is it that the ash concentration at some flight level near Heathrow airport exceeds 500 $\mu \text{g m}^{-3}$ for 24 hours or more? This is the sort of question that aviation authorities seek answers to, in order to ensure that flights can take off safely in the event of an eruption similar to that in 2010.

## Exceedance probability estimation

To construct an ensemble for exceedance probability estimation, we draw $n$ inputs $Z_1, \dots, Z_n$ from a dataset of historical meteorological data. For each $i=1, \dots, n$, the output of the simulator is obtained by applying some function $\phi$ to $Z_i$. Then, for the location of interest, the maximum ash concentration that persists for more than 24 hours is $\psi \circ \phi (Z_i)$, the result of applying some other function $\psi$ to $\phi(Z_i)$. For a threshold $c \, \mu \text{g m}^{-3}$, we represent the event of exceedance by

$X_i := \mathbb{I} \{\psi \circ \phi (Z_i) \geq c\}$,

where $\mathbb{I}$ denotes the indicator function. $X_i$ takes value 1 if the threshold is exceeded, and 0 otherwise. $X_i$ is a Bernoulli random variable which depends implicitly on $c$, $\phi$ and $\psi$, with success parameter

$p := \mathbb{P}(X_i = 1) = \mathbb{P}(\psi \circ \phi (Z_i) \geq c)$,

representing the exceedance probability. As the $X_i$ are independent and identically distributed random variables with common mean $\mathbb{E}[X_i]=p$ and variance $\text{Var}(X_i) = p (1-p)$, we estimate $p$ via the sample mean of $X_1, \dots, X_n$,

$\hat{p}_n := \frac{1}{n} \sum_{i=1}^n X_i$,

which provides an unbiased estimate of $p$ with variance

$\text{Var}(\hat{p}_n ) = \frac{1}{n} p (1-p)$.

## Quantifying the uncertainty

This variance can be estimated by replacing $p$ with $\hat{p}_n$ and used to obtain confidence intervals for the value of $p$. Since the variance is inversely proportional to $n$, increasing the ensemble size will naturally reduce the variance and decrease the width of this confidence interval. For example, an approximate 95% confidence interval for $p$ would be

$\hat{p}_n \pm 1.96 \sqrt{\frac{1}{n} \hat{p}_n (1-\hat{p}_n)}$.

The figure below compares 95% confidence intervals for the probability of exceeding 500 $\mu \text{g m}^{-3}$ for 24 hours or more given a VEI 4 eruption from Iceland, for ensemble sizes of 50 and 100. The middle maps are the exceedance probability estimates, with the lower and upper ends of the confidence interval on the left and right, respectively.
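As a toy sketch of the estimator and interval above, the Bernoulli indicators below stand in for thresholded simulator runs; `p_true` and `n` are illustrative values, not figures from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the ensemble: in practice each indicator X_i comes from
# running the dispersion simulator on a sampled meteorological input Z_i
# and thresholding the output; here we draw Bernoulli variables directly.
p_true = 0.08   # hypothetical exceedance probability
n = 100         # ensemble size
X = rng.binomial(1, p_true, size=n)            # indicators X_1, ..., X_n

p_hat = X.mean()                               # sample-mean estimate of p
se = np.sqrt(p_hat * (1 - p_hat) / n)          # estimated standard error
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)    # approximate 95% CI for p
print(p_hat, ci)
```

Repeating this with a larger `n` shrinks the interval in line with the $1/n$ variance scaling.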

Commonly we would want to visualise estimates of probabilities of different magnitudes, corresponding to different thresholds, alongside each other. In this situation it is beneficial to view them on a logarithmic scale and provide confidence intervals for $\log p$. Given $\hat{p}_n$, a 95% confidence interval for $\log p$ is

$\log (\hat{p}_n) \pm 1.96 \sqrt{\frac{1-\hat{p}_n}{n \hat{p}_n}}$.

The figure below illustrates exceedance probability estimates with 95% confidence intervals versus threshold for a number of locations in Europe on a double logarithmic scale, based on an ensemble size of 6000.
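The log-scale interval is equally direct to compute; a minimal sketch, with illustrative values of `p_hat` and `n` rather than those behind the figure:

```python
import numpy as np

def log_ci(p_hat, n, z=1.96):
    # 95% CI for log p: log(p_hat) +/- z * sqrt((1 - p_hat) / (n * p_hat))
    half_width = z * np.sqrt((1 - p_hat) / (n * p_hat))
    return np.log(p_hat) - half_width, np.log(p_hat) + half_width

lo, hi = log_ci(p_hat=0.01, n=6000)
print(lo, hi)                    # interval on the log scale
print(np.exp(lo), np.exp(hi))    # mapped back to the probability scale
```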

## Choosing the ensemble size

Clearly, as the ash concentration threshold increases the probability of exceeding that threshold gets smaller. Fewer exceedance events will be seen in the ensemble than for lower thresholds, and the size of the confidence interval will be larger when compared with the magnitude of the probability itself.

It seems sensible to set the ensemble size based on the magnitude of the lowest probability of interest, and a natural way to do this is to consider the expected number of exceedances given $n$:

$\mathbb{E} \left[ \sum_{i=1}^n X_i \right] = \sum_{i=1}^n \mathbb{E} [X_i] = np$.

Then, if we want to observe $m$ or more exceedances, we should set $n \geq m/p$. If the lowest probability of interest is $10^{-4}$, we need $n \geq 10,000$ to expect at least one exceedance event.

However, this approach does not guarantee that estimates of low-probability events will be precise relative to the size of the probability itself. Noting that the variance $p(1-p)/n$ is small in absolute terms for small $p$, but large compared to $p^2$ unless $n$ is very large, we instead consider the relative variance,

$\text{Var} \left( \frac{\hat{p}_n}{p} \right) = \frac{\text{Var} \left(\hat{p}_n\right)}{p^2} = \frac{1-p}{np}$,

which is the variance of the ratio of $\hat{p}_n$ to $p$. Choosing $n$ such that the relative variance is bounded above by 1 for the lowest value of $p$ that is of interest ensures that the expected squared deviation of the estimate from its mean is no greater than the squared probability itself. More generally, for some $\varepsilon > 0$, we would set $(1-p) / np \leq \varepsilon$ to obtain a minimum sample size of

$n \geq \frac{1-p}{\varepsilon p}$.
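Both ensemble-size rules reduce to one-line calculations; a minimal sketch (the function names are mine, not from the post):

```python
import math

def n_for_expected_exceedances(p, m=1):
    # Smallest n with E[sum of X_i] = n * p >= m
    return math.ceil(m / p)

def n_for_relative_variance(p, eps=1.0):
    # Smallest n with relative variance (1 - p) / (n * p) <= eps
    return math.ceil((1 - p) / (eps * p))

p = 1e-4  # lowest probability of interest
print(n_for_expected_exceedances(p))   # around 10,000, matching the text
print(n_for_relative_variance(p))
```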

Furthermore, since $(1-p)/np$ is exactly the quantity in the square-root term of the confidence interval for $\log p$, this approach has the additional benefit of directly reducing the width of the confidence interval for $\log p$.

## Further work

Beyond this, I have been investigating alternative sampling schemes for reducing the computational cost of constructing an ensemble, as well as variance reduction methods.

## References

[1] Folch, A., Costa, A., and Macedonio, G. FALL3D: A computational model for transport and deposition of volcanic ash. In: Computers and Geosciences 35.6, 2009, pp. 1334-1342.

[2] Bonadonna, C. et al. Probabilistic modeling of tephra dispersal: Hazard assessment of a multiphase rhyolitic eruption at Tarawera, New Zealand. In: Journal of Geophysical Research: Solid Earth 110.B3, 2005.

[3] Jenkins, S. et al. Regional ash fall hazard I: a probabilistic assessment methodology. In: Bulletin of Volcanology 74.7, 2012, pp. 1699-1712.

## Cohort 3 research projects confirmed

Our third Cohort of Compass students have confirmed their PhD projects for the next 3 years and are establishing the direction of their own research within the CDT.

Supervised by the Institute for Statistical Science:

Ettore Fincato is working on the project: Functional Analysis of Markov Chain Monte Carlo Algorithms with supervisors Christophe Andrieu and Mathieu Gerber

Dominic Broadbent is working on the project: Data Reduction and Large-Scale Inference – Bayesian Coresets with supervisors Nick Whiteley and Robert Allison. This project is co-funded and co-supervised by the National Cyber Security Centre.

Ben Griffiths is working on the project: Faster model fitting for quantile additive models with supervisor Matteo Fasiolo. This project is co-funded and co-supervised by EDF Energy.

Josh Givens is working on the project: Differential Model Inference with Imperfect Information with supervisors Song Liu and  Henry Reeve.

Tennessee Hickling is working on the project: Flexible Tails for Normalising Flows with supervisor Dennis Prangle

Hannah Sansford is working on the project: Some new results in graph representation learning with supervisors Nick Whiteley and Patrick Rubin-Delanchy.

The following projects are supervised in collaboration with the Institute for Statistical Science (IfSS) and our other internal partners at the University of Bristol:

Emerald Dilworth is working on the project:  Using Web Data and Network Science to Detect Spatial Relationships with supervisors Emmanouil Tranos (Geographical Sciences) and Dan Lawson

Ed Milsom is working on the project:  Deep kernel machines with supervisors Laurence Aitchison (Computer Science) and Song Liu

Harry Tata is working on the project: Novel semi-supervised Bayesian learning to rapidly screen new oligonucleotide drugs for impurities with supervisor Andrew Dowsey (Population Health Sciences). This project is funded and co-supervised by AstraZeneca.

Daniel Milner is working on the project: Spatial and Temporal Assessment of Vulnerability and Resilience in Semi-Arid Landscapes with supervisors Andrew Dowsey (Population Health Sciences), Levi Wolf (Geographical Sciences) and Kate Robson Brown (Engineering Mathematics). This project is co-funded and co-supervised by the International Livestock Research Institute.

## Compass Conference 2022

Our first Compass Conference was held on Tuesday 13th September 2022, hosted in the newly refurbished Fry Building, home to the School of Mathematics.

The conference was a celebratory showcase of the achievements of our students, supervisory teams, and collaborations with industrial partners. Attendees were invited from a diverse range of organisations outside of academia as well as academic colleagues from across the University of Bristol.

## Programme

### Lightning talks: 3 min presentations from Compass PhD students

• Mauro Camara Escudero: Approximate Manifold Sampling
• Doug Corbin: Partitioned Polynomial Thompson Sampling for Contextual Multi-Armed Bandits.
• Dom Owens: FNETS: An R Package for Network Analysis and Forecasting of High-Dimensional Time Series with Factor-Adjusted Vector Autoregressive Models
• Jake Spiteri: A non-parametric method for state-space models
• Daniel Williams: Kernelised Stein Discrepancies for Truncated Probability Density Estimation
• Conor Crilly: Efficient Emulation of a Radionuclide Transport Model
• Annie Gray: Discovering latent topology and geometry in data: a law of large dimension
• Conor Newton: Multi-Agent Multi-Armed Bandits
• Jack Simons: Variational Likelihood-Free Gradient Descent
• Anthony Stephenson: Provably Reliable Large-Scale Sampling from Gaussian Processes
• Dan Ward: Robust Neural Posterior Estimation
• Shannon Williams: Sampling Schemes for Volcanic Ash Dispersion Hazard Assessment
• Dominic Broadbent: Bayesian Coresets Versus the Laplace Approximation
• Emerald Dilworth: Using Web Data and Network Embedding to Detect Spatial Relationships
• Ettore Fincato: Markov state modelling and Gibbs sampling
• Josh Givens: DRE and NP Classification with Missing Data
• Ben Griffiths: Faster Model Fitting for Quantile Additive Models
• Tennessee Hickling: Flexible Tails for Normalising Flows
• Daniel Milner: When Does Market Access Improve Smallholder Nutrition? A Multilevel Analysis
• Edward Milsom: Deep Kernel Machines

### Research talks

• Ed Davis – Universal Dynamic Network Embedding – How to Comprehend Changes in 20,000 Dimensions
• Ettore Fincato – Spectral analysis of the Gibbs sampler with the concept of conductance
• Alexander Modell – Network community detection under degree heterogeneity: spectral clustering with the random walk Laplacian
• Hannah Sansford – Implications of sparsity and high triangle density for graph representation learning
• Michael Whitehouse – Consistent and fast inference in compartmental models of epidemics via Poisson Approximate Likelihoods

### Special guest lecture

John Burn-Murdoch, Chief Data Reporter at the Financial Times.

Making charts that make an impact: An exploration of what makes data visualisation effective as a means of communication, drawing on the latest scientific research, plus John’s experiences from visualising the pandemic.

## Attendees

Attendees included academics associated with Compass from across the University of Bristol. Our external attendees were invited from the following partner organisations.

Alan Turing Institute
Allianz Personal
AWE
British Telecom
CheckRisk LLP
COVID-19 Actuaries Response Group
Financial Times
GSK
Improbable
Infinitesima
International Livestock Research Institute
LV= General Insurance
Met Office
NVIDIA
TGE Data Science
Trilateral Research
UK Health Security Agency

## Student Perspectives: An introduction to normalising flows

A post by Dan Ward, PhD student on the Compass programme.

Normalising flows are black-box approximators of continuous probability distributions that can facilitate both efficient density evaluation and sampling. They function by learning a bijective transformation that maps between a complex target distribution and a simple distribution with matching dimension, such as a standard multivariate Gaussian distribution.

## Transforming distributions

Before introducing normalising flows, it is useful to introduce the idea of transforming distributions more generally. Let’s say we have two uniform random variables, $u \sim \text{Uniform}(0, 1)$, and $x \sim \text{Uniform}(0, 2)$. In this case, it is straightforward to define the bijective transformation $T$ that maps between these two distributions, as shown below.

If we wished to sample from $p(x)$, but could not do so directly, we could instead sample $u \sim p(u)$ and then apply the transformation $x=T(u)$. Similarly, if we wished to evaluate the density of $p(x)$, but could not do so directly, we could rewrite $p(x)$ in terms of $p(u)$

$p(x)= p(u=T^{-1}(x)) \cdot 2^{-1},$

where $p(u=T^{-1}(x))$ is the density of the corresponding point in the $u$ space, and dividing by 2 accounts for the fact that the transformation $T$ stretches the space by a factor of 2, “diluting” the probability mass. The key thing to notice here is that we can describe the sampling and density evaluation operations of one distribution, $p(x)$, in terms of a bijective transformation of another, potentially easier to work with, distribution, $p(u)$.
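This uniform example is easy to check numerically; a minimal sketch, where `T`, `T_inv` and `p_x` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)

def T(u):
    return 2.0 * u        # maps Uniform(0, 1) onto Uniform(0, 2)

def T_inv(x):
    return x / 2.0        # inverse transformation

def p_x(x):
    # p(x) = p(u = T^{-1}(x)) * 1/2: the factor 1/2 accounts for the
    # stretching of the space by T
    u = T_inv(x)
    return np.where((0.0 <= u) & (u <= 1.0), 1.0, 0.0) / 2.0

u = rng.uniform(0.0, 1.0, size=100_000)
x = T(u)                  # samples from Uniform(0, 2)
print(x.min(), x.max())   # all samples lie in [0, 2]
print(p_x(np.array([1.0, 3.0])))  # density 0.5 inside the support, 0 outside
```

A normalising flow replaces this fixed `T` with a learned, parameterised bijection, and the constant 1/2 with the Jacobian determinant of the learned inverse map.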

## Student perspectives: Neural Point Processes for Statistical Seismology

A post by Sam Stockman, PhD student on the Compass programme.

## Introduction

Throughout my PhD I aim to bridge a gap between advances made in the machine learning community and the age-old problem of earthquake forecasting. In this cross-disciplinary work with Max Werner from the School of Earth Sciences and Dan Lawson from the School of Mathematics, I hope to create more powerful, efficient and robust models for forecasting, that can make earthquake prone areas safer for their inhabitants.

For years seismologists have sought to model the structure and dynamics of the earth in order to make predictions about earthquakes. They have mapped out the structure of fault lines and conducted experiments in the lab where they submit rock to great amounts of force in order to simulate plate tectonics on a small scale. Yet when trying to forecast earthquakes on a short time scale (that’s hours and days, not tens of years), these models based on the knowledge of the underlying physics are regularly outperformed by models that are statistically motivated. In statistical seismology we seek to make predictions by studying the distributions of the times, locations and magnitudes of earthquakes, and using them to forecast the future.

## Ed Davis wins poster competition

Congratulations to Ed Davis who won a poster award as part of the Jean Golding Institute’s Beauty of Data competition.

This visualisation, entitled “The World Stage”, gives a new way of representing the positions of countries. Instead of placing them based on their geographical position, they have been placed based on their geopolitical alliances. Countries have been placed so as to minimise the distance to their allies and maximise the distance to their non-allies, based on 40 different alliances involving 161 countries. This representation was achieved by embedding the alliance network using Node2Vec, followed by principal component analysis (PCA) to reduce it to 2D.

## Student perspectives: Sampling from Gaussian Processes

A post by Anthony Stephenson, PhD student on the Compass programme.

## Introduction

The general focus of my PhD research is in some sense to produce models with the following characteristics:

1. Well-calibrated (uncertainty estimates from the predictive process reflect the true variance of the target values)
2. Non-linear
3. Scalable (i.e. we can run it on large datasets)

At a high level, we can have two out of three of those requirements without too much difficulty, but including the third causes trouble. For example, Bayesian linear models satisfy good calibration and scalability but (as the name suggests) fail at modelling non-linear functions. Similarly, neural networks are famously good at modelling non-linear functions, and much work has been spent on improving their efficiency and scalability, but producing well-calibrated predictions is a complex additional feature. I am approaching the problem from the angle of Gaussian Processes, which provide well-calibrated non-linear models at the expense of scalability.

### Gaussian Processes (GPs)

See Conor’s blog post for a more detailed introduction to GPs; here I will provide a basic summary of the key facts we need for the rest of the post.

The functional view of GPs is that we define a distribution over functions:

$f(\cdot) \sim \mathcal{GP}(m(\cdot),k(\cdot, \cdot))$

where $m$ and $k$ are the mean function and kernel function respectively, which play analogous roles to the usual mean and covariance of a Gaussian distribution.

In practice, we only ever observe some finite collection of points, corrupted by noise, which we can hence view as a draw from some multivariate normal distribution:

$y_n \sim \mathcal{N}(0, \underbrace{K_n + \sigma^2I_n}_{K_\epsilon})$

where

$y_n = f_n(x) + \epsilon_n$ with $\epsilon \sim \mathcal{N}(0,\sigma^2)$.

(Here subscript $n$ denotes dimensionality of the vector or matrix).
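A draw of $y_n$ from this multivariate normal can be sketched via a Cholesky factorisation; the squared-exponential kernel below is an illustrative choice, not one fixed by the post:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2))
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * lengthscale**2))

n, sigma2 = 200, 0.05
X = np.linspace(0.0, 10.0, n)
K_eps = rbf(X, X) + sigma2 * np.eye(n)   # K_n + sigma^2 I_n

L = np.linalg.cholesky(K_eps)            # K_eps = L L^T
y = L @ rng.standard_normal(n)           # one draw of y_n ~ N(0, K_eps)
print(y.shape)
```

The noise term $\sigma^2 I_n$ also keeps `K_eps` comfortably positive definite, which the factorisation requires.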

When we use GPs to generate predictions at some new test point $x_\star \notin X$, we use the following equations, which I will not derive here (see [1]), for the predictive mean and variance respectively:

$\mu(x_\star) = k(x_\star,X)K_\epsilon^{-1}y_n$

$V(x_\star) = k(x_\star, x_\star) - k(x_\star, X)K_\epsilon^{-1}k(X,x_\star)$

The key point here is that both predictive functions involve the inversion of an $n\times n$ matrix at a cost of $\mathcal{O}(n^3)$.
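The two predictive equations can be written almost verbatim in code; a sketch with toy data and an assumed squared-exponential kernel, where `np.linalg.solve` performs the $\mathcal{O}(n^3)$ work:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel on 1-D inputs
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * lengthscale**2))

# Toy data: noisy observations of a sine curve (illustrative only)
n, sigma2 = 50, 0.1
X = np.linspace(0.0, 5.0, n)
y = np.sin(X) + rng.normal(0.0, np.sqrt(sigma2), n)

K_eps = rbf(X, X) + sigma2 * np.eye(n)        # K_n + sigma^2 I_n

x_star = np.array([2.5])
k_star = rbf(x_star, X)                       # k(x_*, X), shape (1, n)

mu = k_star @ np.linalg.solve(K_eps, y)       # predictive mean
V = rbf(x_star, x_star) - k_star @ np.linalg.solve(K_eps, k_star.T)  # predictive variance
print(mu, V)
```

Each `solve` against the dense $n \times n$ matrix is the cubic-cost step that makes naive GPs impractical at large $n$.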

## Student Perspectives: Application of Density Ratio Estimation to Likelihood-Free problems

A post by Jack Simons, PhD student on the Compass programme.

## Introduction

I began my PhD with my supervisors, Dr Song Liu and Professor Mark Beaumont, with the intention of combining their respective fields of research, Density Ratio Estimation (DRE) and Simulation Based Inference (SBI):

• DRE is a rapidly growing paradigm in machine learning which (broadly) provides efficient methods of comparing densities without the need to compute each density individually. For a comprehensive yet accessible overview of DRE in Machine Learning see [1].
• SBI is a group of methods which seek to solve Bayesian inference problems when the likelihood function is intractable. If you wish for a concise overview of the current work, as well as motivation then I recommend [2].

Last year we released a paper, Variational Likelihood-Free Gradient Descent [3], which combined these fields. This blog post seeks to condense, and make more accessible, the contents of the paper.

## Motivation: Likelihood-Free Inference

Let’s begin by introducing likelihood-free inference. We wish to do inference on the posterior distribution of parameters $\theta$ for a specific observation $x=x_{\mathrm{obs}}$, i.e. we wish to infer $p(\theta|x_{\mathrm{obs}})$ which can be decomposed via Bayes’ rule as

$p(\theta|x_{\mathrm{obs}}) = \frac{p(x_{\mathrm{obs}}|\theta)p(\theta)}{\int p(x_{\mathrm{obs}}|\theta)p(\theta) \mathrm{d}\theta}.$

The likelihood-free setting is that, in addition to the usual intractability of the normalising constant in the denominator, the likelihood, $p(x|\theta)$, is also intractable. In lieu of a tractable likelihood, we work with an implicit likelihood which describes the relation between data $x$ and parameters $\theta$ in the form of a forward model/simulator (hence simulation based inference!).