## Student Perspectives: An Introduction to Deep Kernel Machines

A post by Edward Milsom, PhD student on the Compass programme.

This blog post provides a simple introduction to Deep Kernel Machines[1] (DKMs), a novel supervised learning method that combines the advantages of both deep learning and kernel methods. This work provides the foundation of my current research on convolutional DKMs, which is supervised by Dr Laurence Aitchison.

# Why aren’t kernels cool anymore?

Kernel methods were once top-dog in machine learning due to their ability to implicitly map data to complicated feature spaces, where the problem usually becomes simpler, without ever explicitly computing the transformation. However, in the past decade deep learning has become the new king for complicated tasks like computer vision and natural language processing.

## Neural networks are flexible when learning representations

The reason is twofold: First, neural networks have millions of tunable parameters that allow them to learn their feature mappings automatically from the data, which is crucial for domains like images which are too complex for us to specify good, useful features by hand. Second, their layer-wise structure means these mappings can be built up to increasingly more abstract representations, while each layer itself is relatively simple[2]. For example, trying to learn a single function that takes in pixels from pictures of animals and outputs their species is difficult; it is easier to map pixels to corners and edges, then shapes, then body parts, and so on.

## Kernel methods are rigid when learning representations

It is therefore notable that classical kernel methods lack these characteristics: most kernels have a very small number of tunable hyperparameters, meaning their mappings cannot flexibly adapt to the task at hand, leaving us stuck with a feature space that, while complex, might be ill-suited to our problem. (more…)

## Student perspectives: ensemble modelling for probabilistic volcanic ash hazard forecasting

A post by Shannon Williams, PhD student on the Compass programme.

My PhD focuses on the application of statistical methods to volcanic hazard forecasting. This research is jointly supervised by Professor Jeremy Philips (School of Earth Sciences) and Professor Anthony Lee. (more…)

## Student perspectives: Neural Point Processes for Statistical Seismology

A post by Sam Stockman, PhD student on the Compass programme.

# Introduction

Throughout my PhD I aim to bridge a gap between advances made in the machine learning community and the age-old problem of earthquake forecasting. In this cross-disciplinary work with Max Werner from the School of Earth Sciences and Dan Lawson from the School of Mathematics, I hope to create more powerful, efficient and robust models for forecasting, that can make earthquake prone areas safer for their inhabitants.

For years seismologists have sought to model the structure and dynamics of the earth in order to make predictions about earthquakes. They have mapped out the structure of fault lines and conducted experiments in the lab where they submit rock to great amounts of force in order to simulate plate tectonics on a small scale. Yet when trying to forecast earthquakes on a short time scale (that’s hours and days, not tens of years), these models based on the knowledge of the underlying physics are regularly outperformed by models that are statistically motivated. In statistical seismology we seek to make predictions through looking at distributions of the times, locations and magnitudes of earthquakes and use them to forecast the future.

## Student Perspectives: Application of Density Ratio Estimation to Likelihood-Free problems

A post by Jack Simons, PhD student on the Compass programme.

# Introduction

I began my PhD with my supervisors, Dr Song Liu and Professor Mark Beaumont with the intention of combining their respective fields of research; Density Ratio Estimation (DRE), and Simulation Based Inference (SBI):

• DRE is a rapidly growing paradigm in machine learning which (broadly) provides efficient methods of comparing densities without the need to compute each density individually. For a comprehensive yet accessible overview of DRE in Machine Learning see [1].
• SBI is a group of methods which seek to solve Bayesian inference problems when the likelihood function is intractable. If you wish for a concise overview of the current work, as well as motivation then I recommend [2].

Last year we released a paper, Variational Likelihood-Free Gradient Descent [3] which combined these fields. This blog post seeks to condense, and make more accessible, the contents of the paper.

# Motivation: Likelihood-Free Inference

Let’s begin by introducing likelihood-free inference. We wish to do inference on the posterior distribution of parameters $\theta$ for a specific observation $x=x_{\mathrm{obs}}$, i.e. we wish to infer $p(\theta|x_{\mathrm{obs}})$ which can be decomposed via Bayes’ rule as

$p(\theta|x_{\mathrm{obs}}) = \frac{p(x_{\mathrm{obs}}|\theta)p(\theta)}{\int p(x_{\mathrm{obs}}|\theta)p(\theta) \mathrm{d}\theta}.$

The likelihood-free setting is that, additional to the usual intractability of the normalising constant in the denominator, the likelihood, $p(x|\theta)$, is also intractable. In lieu of this, we require an implicit likelihood which describes the relation between data $x$ and parameters $\theta$ in the form of a forward model/simulator (hence simulation based inference!). (more…)

## Student Perspectives: Contemporary Ideas in Statistical Philosophy

A post by Alessio Zakaria, PhD student on the Compass programme.

## Introduction

Probability theory is a branch of mathematics centred around the abstract manipulation and quantification of uncertainty and variability. It forms a basic unit of the theory and practice of statistics, enabling us to tame the complex nature of observable phenomena into meaningful information. It is through this reliance that the debate over the true (or more correct) underlying nature of probability theory has profound effects on how statisticians do their work. The current opposing sides of the debate in question are the Frequentists and the Bayesians. Frequentists believe that probability is intrinsically linked to the numeric regularity with which events occur, i.e. their frequency. Bayesians, however, believe that probability is an expression of someones degree of belief or confidence in a certain claim. In everyday parlance we use both of these concepts interchangeably: I estimate one in five of people have Covid; I was 50% confident that the football was coming home. It should be noted that the latter of the two is not a repeatable event per se. We cannot roll back time to check what the repeatable sequence would result in.