Summer applications deadline for Compass CDT: 12 June 2023

We are happy to announce our upcoming applications deadline is 12 June 2023, 23:59 (London UK time zone) for the final few fully funded places to start September 2023. For international applications there are limited scholarship funded places available. Early applications advised.

Compass is offering specific projects for PhD students to study from Sept 2023. We are pleased to announce that there are 4 new project opportunities to study. The full list of the projects/ supervisors has been updated. All the supervisors listed are open to discussion on the projects provided and can also talk to applicants about other project ideas. Please provide a ranked list of 3 projects of interest: 1 = project of highest interest. Project supervisors will be happy to respond to specific questions you have after reading the proposals. Applicants should contact them by email if they wish beforehand.

New PhD Projects available

The full list of the projects/ supervisors.

PhD Project Allocation Process

Application forms will be reviewed based on the 3 ranked projects specified or other proposed topic. Successful applicants will be invited to attend an interview with the Compass admissions tutors and the specific project supervisor. If you are made an offer of PhD study it will be published through the online application system. You will then have 2 weeks to consider the offer before deciding whether to accept or decline.

We welcome applications from all members of our community and particularly encourage applications from diverse groups, such as members of the LGBT+ and black, Asian and minority ethnic communities.

APPLICATIONS DEADLINE

12 June 2023, 23:59 (London UK time zone)

APPLY NOW

Advantages of being a Compass Student

  • Stipend – a generous stipend of £22,622 pa tax free, paid in monthly payments. Plus your own expense budget of £1,000 pa towards travel and research activity.
  • No fees – all tuition fees are covered by the EPSRC and University of Bristol.
  • Bespoke training – first year units are designed specifically for the academic needs of each Compass student, which enables students to develop knowledge and capability to pursue cross-disciplinary PhD research.
  • Supervisors – supervisors from across academic disciplines offer a range of research projects.
  • Cohort – Compass students benefit from dedicated offices and collaboration spaces, enabling strong cohort links and opportunities for shared learning and research.

About Compass CDT

A 4-year bespoke PhD training programme in the statistical and computational techniques of data science, with partners from across the University of Bristol, industry and government agencies.

The cross-disciplinary programme offers exciting collaborations across medicine, computer science, geography, economics, life and earth sciences, as well as with our external partners who range from government organisations such as the Office for National Statistics, NCSC and the AWE, to industrial partners such as LV, Improbable, IBM Research, EDF, and AstraZeneca.

Students are co-located with the Institute for Statistical Science in the School of Mathematics, which occupies the Fry Building.

Hear from our students about their experience with the programme

  • Compass has allowed me to advance my statistical knowledge and apply it to a range of exciting applied projects, as well as develop skills that I’m confident will be highly useful for a future career in data science. – Shannon, Cohort 2

  • With the Compass CDT I feel part of a friendly, interactive environment that is preparing me for whatever I move on to next, whether it be in Academia or Industry. – Sam, Cohort 2

  • An incredible opportunity to learn the ever-expanding field of data science, statistics and machine learning amongst amazing people. – Danny, Cohort 1

Compass CDT Video

Find out more about what it means to be a part of the Compass programme from our students in this short video.

Student Perspectives: Intro to Recommendation Systems

A post by Hannah Sansford, PhD student on the Compass programme.

Introduction

Like many others, I interact with recommendation systems on a daily basis; from which toaster to buy on Amazon, to which hotel to book on booking.com, to which song to add to a playlist on Spotify. They are everywhere. But what is really going on behind the scenes?

Recommendation systems broadly fit into two main categories:

1) Content-based filtering. This approach uses the similarity between items to recommend items similar to what the user already likes. For instance, if Ed watches two hair tutorial videos, the system can recommend more hair tutorials to Ed.

2) Collaborative filtering. This approach uses the similarity between users’ past behaviour to provide recommendations. So, if Ed has watched similar videos to Ben in the past, and Ben likes a cute cat video, then the system can recommend the cute cat video to Ed (even if Ed hasn’t seen any cute cat videos).

Both systems aim to map each item and each user to an embedding vector in a common low-dimensional embedding space E = \mathbb{R}^d. That is, the dimension of the embeddings (d) is much smaller than the number of items or users. The hope is that the position of these embeddings captures some of the latent (hidden) structure of the items/users, and so similar items end up ‘close together’ in the embedding space. What is meant by being ‘close’ may be specified by some similarity measure.

Collaborative filtering

In this blog post we will focus on the collaborative filtering system. We can break it down further depending on the type of data we have:

1) Explicit feedback data: aims to model relationships using explicit data such as user-item (numerical) ratings.

2) Implicit feedback data: analyses relationships using implicit signals such as clicks, page views, purchases, or music streaming play counts. This approach makes the assumption that: if a user listens to a song, for example, they must like it.

The majority of the data on the web is implicit feedback data, hence there is a strong demand for recommendation systems that take this form of data as input. Furthermore, this form of data can be collected at a much larger scale and without the need for users to provide any extra input. The rest of this blog post will assume we are working with implicit feedback data.

Problem Setup

Suppose we have a group of n users U = (u_1, \ldots, u_n) and a group of m items I = (i_1, \ldots, i_m). Then we let \mathbf{R} \in \mathbb{R}^{n \times m} be the ratings matrix where position R_{ui} represents whether user u interacts with item i. Note that, in most cases the matrix \mathbf{R} is very sparse, since most users only interact with a small subset of the full item set I. For any items i that user u does not interact with, we set R_{ui} equal to zero. To be clear, a value of zero does not imply the user does not like the item, but that they have not interacted with it. The final goal of the recommendation system is to find the best recommendations for each user of items they have not yet interacted with.
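As a small illustration of the setup above, the sketch below builds a binary ratings matrix from a list of (user, item) interaction pairs. The function name, the toy interaction log and the sizes n = 3, m = 5 are assumptions made up for this example.

```python
def build_ratings_matrix(n_users, n_items, interactions):
    """Return an n_users x n_items 0/1 matrix R with R[u][i] = 1
    if user u interacted with item i, and 0 otherwise."""
    R = [[0] * n_items for _ in range(n_users)]
    for u, i in interactions:
        R[u][i] = 1
    return R


# Hypothetical interaction log: (user index, item index) pairs.
interactions = [(0, 1), (0, 3), (1, 0), (2, 3)]
R = build_ratings_matrix(3, 5, interactions)

# Fraction of non-zero entries; a small value means R is sparse.
density = sum(map(sum, R)) / (3 * 5)
```

In practice n and m are large and a dense list of lists would be replaced by a sparse format, but the zero entries keep the same meaning: no recorded interaction, not dislike.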

Matrix Factorisation (MF)

A simple model for finding user embeddings, \mathbf{X} \in \mathbb{R}^{n \times d}, and item embeddings, \mathbf{Y} \in \mathbb{R}^{m \times d}, is Matrix Factorisation. The idea is to find low-rank embeddings such that the product \mathbf{XY}^\top is a good approximation to the ratings matrix \mathbf{R} by minimising some loss function on the known ratings.

A natural loss function to use would be the squared loss, i.e.

L(\mathbf{X}, \mathbf{Y}) = \sum_{u, i} \left(R_{ui} - \langle X_u, Y_i \rangle \right)^2.

This corresponds to minimising the squared Frobenius norm of the difference between \mathbf{R} and its approximation \mathbf{XY}^\top, and can be solved easily using the singular value decomposition \mathbf{R} = \mathbf{U S V}^\top, truncated to the d largest singular values.

Once we have our embeddings \mathbf{X} and \mathbf{Y}, we can look at the row of \mathbf{XY}^\top corresponding to user u and recommend the items corresponding to the highest values (that they haven’t already interacted with).
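The recommendation step can be sketched in a few lines. The embeddings below are made-up numbers with d = 2 rather than the output of a real factorisation, and the function name recommend is an assumption for this example.

```python
def recommend(X, Y, R, u, k=2):
    """Return the indices of the k highest-scoring items that
    user u has not yet interacted with."""
    # Row u of X Y^T: one inner product per item.
    scores = [sum(x * y for x, y in zip(X[u], Y_i)) for Y_i in Y]
    # Only consider items with no recorded interaction.
    candidates = [i for i in range(len(Y)) if R[u][i] == 0]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical embeddings: n = 2 users, m = 4 items, d = 2.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.5, 0.5]]
R = [[1, 0, 0, 0], [0, 0, 1, 0]]

top_for_user_0 = recommend(X, Y, R, u=0)
top_for_user_1 = recommend(X, Y, R, u=1)
```

User 0 has already interacted with item 0, so it is excluded even though it has the highest score for that user.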

Logistic MF

Minimising the loss function in the previous section is equivalent to modelling the probability that user u interacts with item i as the inner product \langle X_u, Y_i \rangle, i.e.

R_{ui} \sim \text{Bernoulli}(\langle X_u, Y_i \rangle),

and maximising the likelihood over \mathbf{X} and \mathbf{Y}.

In a research paper from Spotify [3], this relationship is instead modelled according to a logistic function parameterised by the sum of the inner product above and user and item bias terms, \beta_u and \beta_i,

R_{ui} \sim \text{Bernoulli} \left( \frac{\exp(\langle X_u, Y_i \rangle + \beta_u + \beta_i)}{1 + \exp(\langle X_u, Y_i \rangle + \beta_u + \beta_i)} \right).
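To make the logistic model concrete, here is a minimal sketch of the interaction probability and the corresponding Bernoulli log-likelihood, written directly from the formula above. The function names and the toy inputs are assumptions for this example, and the sketch leaves out any weighting or regularisation that a production system might add.

```python
import math


def interaction_prob(x_u, y_i, beta_u, beta_i):
    """Logistic function of <x_u, y_i> + beta_u + beta_i."""
    s = sum(a * b for a, b in zip(x_u, y_i)) + beta_u + beta_i
    return 1.0 / (1.0 + math.exp(-s))  # equals exp(s) / (1 + exp(s))


def log_likelihood(R, X, Y, beta_user, beta_item):
    """Bernoulli log-likelihood of the 0/1 matrix R under the logistic model."""
    total = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            p = interaction_prob(X[u], Y[i], beta_user[u], beta_item[i])
            total += r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return total


# Toy check: with all-zero embeddings and biases every probability is 0.5.
R = [[1, 0], [0, 1]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
ll = log_likelihood(R, zeros, zeros, [0.0, 0.0], [0.0, 0.0])
```

Fitting the model then amounts to maximising this quantity over the embeddings and bias terms, for example by gradient ascent.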

Relation to my research

A recent influential paper [1] proved an impossibility result for modelling certain properties of networks using a low-dimensional inner product model. In my 2023 AISTATS publication [2] we show that using a kernel, such as the logistic one in the previous section, to model probabilities we can capture these properties with embeddings lying on a low-dimensional manifold embedded in infinite-dimensional space. This has various implications, and could explain part of the success of Spotify’s logistic kernel in producing good recommendations.

References

[1] Seshadhri, C., Sharma, A., Stolman, A., and Goel, A. (2020). The impossibility of low-rank representations for triangle-rich complex networks. Proceedings of the National Academy of Sciences, 117(11):5631–5637.

[2] Sansford, H., Modell, A., Whiteley, N., and Rubin-Delanchy, P. (2023). Implications of sparsity and high triangle density for graph representation learning. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5449-5473.

[3] Johnson, C. C. (2014). Logistic matrix factorization for implicit feedback data. Advances in Neural Information Processing Systems, 27(78):1–9.

 

Compass students at AISTATS 2023

Congratulations to Compass students Josh Givens, Hannah Sansford and Alex Modell, who, along with their supervisors, have had their papers accepted for publication at AISTATS 2023.

 

‘Implications of sparsity and high triangle density for graph representation learning’

Hannah Sansford, Alexander Modell, Nick Whiteley, Patrick Rubin-Delanchy

Hannah: In this paper we explore the implications of two common characteristics of real-world networks, sparsity and triangle-density, for graph representation learning. An example of where these properties arise in the real world is in social networks, where, although the number of connections each individual has is small compared to the size of the network (sparsity), often a friend of a friend is also a friend (triangle-density). Our result counters a recent influential paper that shows the impossibility of simultaneously recovering these properties with finite-dimensional representations of the nodes, when the probability of connection is modelled by the inner product. We, by contrast, show that it is possible to recover these properties using an infinite-dimensional inner-product model, where representations lie on a low-dimensional manifold. One of the implications of this work is that we can ‘zoom in’ to local neighbourhoods of the network, where a lower-dimensional representation is possible.

The paper has been selected for oral presentation at the conference in Valencia (<2% of submissions). 

 

Density Ratio Estimation and Neyman Pearson Classification with Missing Data

Josh Givens, Song Liu, Henry W J Reeve

Josh: In our paper we adapt the popular density ratio estimation procedure KLIEP to make it robust to missing not at random (MNAR) data and demonstrate its efficacy in Neyman-Pearson (NP) classification. Density ratio estimation (DRE) aims to characterise the difference between two classes of data by estimating the ratio between their probability densities. The density ratio is a fundamental quantity in statistics, appearing in many settings such as classification, GANs, and covariate shift, which makes its estimation a valuable goal. To our knowledge there is no prior research into DRE with MNAR data, a missing data paradigm where the likelihood of an observation being missing depends on its underlying value. We propose the estimator M-KLIEP and provide finite sample bounds on its accuracy, which we show to be minimax optimal for MNAR data. To demonstrate the utility of this estimator we apply it to the field of NP classification. In NP classification we aim to create a classifier which strictly controls the probability of incorrectly classifying points from one class. This is useful in any setting where misclassification for one class is much worse than for the other, such as fault detection on a production line, where you would want to strictly control the probability of classifying a faulty item as non-faulty. In addition to showing the efficacy of our new estimator in this setting, we also provide an adaptation to NP classification which allows it to still control this misclassification probability even when fit using MNAR data.

Student Perspectives: An Introduction to Stochastic Gradient Methods

A post by Ettore Fincato, PhD student on the Compass programme.

This post provides an introduction to Gradient Methods in Stochastic Optimisation. This class of algorithms is the foundation of my current research work with Prof. Christophe Andrieu and Dr. Mathieu Gerber, and finds applications in a great variety of topics, such as regression estimation, support vector machines and convolutional neural networks.

Below is a simulation by Emilien Dupont (https://emiliendupont.github.io/) which shows two trajectories of an optimisation process for a time-varying function. It illustrates well the main idea behind the algorithms we will be looking at, that is, using the (stochastic) gradient of a (random) function to iteratively reach the optimum.

Stochastic Optimisation

Stochastic optimisation was introduced by [1], and its aim is to find a scheme for solving equations of the form \nabla_w g(w)=0 given “noisy” measurements of g [2].

In the simplest deterministic framework, one can fully determine the analytical form of g(w) and knows that it is differentiable and admits a unique minimum, so the problem

w_*=\underset{w}{\text{argmin}}\quad g(w)

is well defined and solved by \nabla_w g(w)=0.

On the other hand, one may not be able to fully determine g(w) because the experiment is corrupted by random noise. In such cases, it is common to identify this noise with a random variable, say V, to consider an unbiased estimator \eta(w,V) such that \mathbb{E}_V[\eta(w,V)]=g(w), and to rewrite the problem as

w_*=\underset{w}{\text{argmin}}\quad\mathbb{E}_V[\eta(w,V)].
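As a toy illustration of this formulation, the sketch below runs a basic stochastic gradient iteration. Everything specific here is an assumption chosen for the example: the objective g(w) = (w - 3)^2, the noisy estimator \eta(w, V) = (w - 3)^2 + Vw with V standard normal (so that \mathbb{E}_V[\eta(w,V)] = g(w)), and the step sizes 0.5/t.

```python
import random


def noisy_gradient(w, rng):
    """Gradient in w of eta(w, V) = (w - 3)^2 + V * w, with V ~ N(0, 1).
    Its expectation over V is the true gradient 2 * (w - 3)."""
    v = rng.gauss(0.0, 1.0)
    return 2.0 * (w - 3.0) + v


def stochastic_gradient_descent(w0, n_steps, seed=0):
    """Iterate w <- w - a_t * (noisy gradient) with decreasing steps a_t = 0.5 / t."""
    rng = random.Random(seed)
    w = w0
    for t in range(1, n_steps + 1):
        w -= (0.5 / t) * noisy_gradient(w, rng)
    return w


w_hat = stochastic_gradient_descent(w0=0.0, n_steps=10_000)
```

Although each individual gradient evaluation is corrupted by noise, the decreasing step sizes average the noise out, and the iterate settles close to the minimiser w_* = 3.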

(more…)

Compass CDT is recruiting for its final few fully funded places to start September 2023

We are happy to announce our upcoming applications deadline of 16 March 2023 for the Compass CDT programme. For international applications there are limited scholarship funded places available for this final recruitment round. Early applications advised.

Compass is offering specific projects for PhD students to study from Sept 2023. The projects are listed in the research section of our website. All the supervisors listed are open to discussion on the projects provided and can also talk to applicants about other project ideas. Please provide a ranked list of 3 projects of interest: 1 = project of highest interest. Project supervisors will be happy to respond to specific questions you have after reading the proposals. Applicants should contact them by email if they wish beforehand.

Also, we are pleased to announce that a new project, Genetic Similarity Based Cohort Building, has been added to the list for a September 2023 start. It is funded by Roche, one of the world’s largest biotech companies, as well as a leading provider of in-vitro diagnostics and a global supplier of transformative innovative solutions across major disease areas.

PhD Project Allocation Process

Application forms will be reviewed based on the 3 ranked projects specified or other proposed topic. Successful applicants will be invited to attend an interview with the Compass admissions tutors and the specific project supervisor. If you are made an offer of PhD study it will be published through the online application system. You will then have 2 weeks to consider the offer before deciding whether to accept or decline.

We welcome applications from all members of our community and particularly encourage applications from diverse groups, such as members of the LGBT+ and black, Asian and minority ethnic communities.

APPLICATIONS DEADLINE

16 March 2023, 23:59 (London UK time zone)

APPLY NOW

Advantages of being a Compass Student

  • Stipend – a generous stipend of £21,668 pa tax free, paid in monthly payments. Plus your own expense budget of £1,000 pa towards travel and research activity.
  • No fees – all tuition fees are covered by the EPSRC and University of Bristol.
  • Bespoke training – first year units are designed specifically for the academic needs of each Compass student, which enables students to develop knowledge and capability to pursue cross-disciplinary PhD research.
  • Supervisors – supervisors from across academic disciplines offer a range of research projects.
  • Cohort – Compass students benefit from dedicated offices and collaboration spaces, enabling strong cohort links and opportunities for shared learning and research.

Compass at NeurIPS 2022

A post by Anthony Stephenson, Jack Simons, and Dan Ward, PhD students on the Compass programme.

Introduction

Ant Stephenson, Jack Simons, and I (Dan Ward) had the pleasure of attending the 2022 Conference on Neural Information Processing Systems (NeurIPS), one of the largest machine learning conferences in the world. The conference was held in New Orleans, which gave us an opportunity to explore a lively city full of culture with delicious local cuisine. We thought we’d collaborate on a blog post together covering some of the highlights.

Memorable Talks

The conference had a broad range of talks including technical presentations of research, applied projects, and discussions of the philosophical and ethical questions that arise in AI. To give a taste of some of the talks, we picked out some of our favourites below.

Noam Brown: Human Modelling and Strategic Reasoning in the Game of Diplomacy.

The Game of Diplomacy is a strategic board game invented in 1954. Its unique feature, and one of crucial importance to the game, is that players interact via natural language to form allegiances. Whilst AI has been successful in beating humans in many purely adversarial games (e.g. Chess, Go), this collaborative element poses unique challenges. Firstly, it isn’t obvious how to evaluate/devise strategies for collaboration/betrayal, especially in the self-play-based reinforcement learning paradigm. Secondly, as communication happens via natural language, the AI must be able to translate its strategic plan into text. This strange combination of problems led to interesting and innovative solutions. Paper link here.

Geoffrey Hinton: The Forward-Forward Algorithm for Training Deep Neural Networks.

Among the great line-up of speakers was Professor Geoffrey Hinton, known for popularising backpropagation for deep neural networks. Motivated by the search for a more biologically plausible learning algorithm, he has proposed the ‘Forward-Forward’ algorithm, which he claims can also explain the phenomenon of sleep! He then went on to express his belief that using biologically-inspired hardware, so-called neuromorphic computing, may play a key role in advancing AI. The talk was certainly unconventional, but nevertheless entertaining. Paper link here.

David Chalmers: Could a Large Language Model be Conscious?

Amongst all the machine-learning experts was David Chalmers, a philosopher! There are important questions regarding the possibility that language models might be conscious. David Chalmers aimed to educate the machine-learning audience on how we can better think about these problems and re-phrase the questions that we’re asking. We concluded that these questions are, unsurprisingly, best left to philosophers!

Poster Sessions


Jack and Dan:

I (Dan) presented a poster of my work at the conference, on Robust Neural Posterior Estimation (paper link here). I was definitely surprised by the scale of the poster sessions, and the broad scope of all the work taking place. Below are some of the posters that Jack and I found interesting:

Contrastive Neural Ratio Estimation
Benjamin K. Miller · Christoph Weniger · Patrick Forré
The authors propose NRE-C, which aims to generalise NRE-A (Hermans et al. (2019)) and NRE-B (Durkan et al. (2020)) into one method. NRE-C can recover both methods by taking its two newly introduced hyperparameters to certain limits. Paper link here.

Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation
Arnaud Delaunoy · Joeri Hermans · François Rozet · Antoine Wehenkel · Gilles Louppe
These authors also make a contribution to the field of neural ratio estimation in the simulation-based inference context. They propose the notion of a “balanced” classifier: one for which the average output over the positive data class plus the average output over the negative data class equals 1. The authors argue that a balanced classifier leads to more conservative posterior estimates, which is something practitioners seek. To integrate this into an algorithm, they suggest adding a penalisation term to the standard logistic loss which punishes classifiers as they become less balanced. Paper link here.
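To make the balance idea concrete, here is a rough sketch of how such a quantity could be computed from classifier outputs on a batch. The function names, the squared form of the penalty and the weight lam are assumptions for illustration and are not taken from the paper's code.

```python
def balance(d_pos, d_neg):
    """Average classifier output on the positive class plus the
    average output on the negative class; equals 1 when balanced."""
    return sum(d_pos) / len(d_pos) + sum(d_neg) / len(d_neg)


def balance_penalty(d_pos, d_neg, lam=1.0):
    """Assumed squared penalty, to be added to the logistic loss;
    it grows as the classifier moves away from balance."""
    return lam * (balance(d_pos, d_neg) - 1.0) ** 2


# A balanced batch of outputs, and one that leans towards the positive class.
balanced = balance_penalty([0.9, 0.7], [0.1, 0.3])
unbalanced = balance_penalty([0.9, 0.9], [0.5, 0.5])
```

The first batch incurs essentially no penalty, while the second is penalised because the two class averages sum to 1.4 rather than 1.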

Training and Inference on Any-Order Autoregressive Models the Right Way
Andy Shih · Dorsa Sadigh · Stefano Ermon
A joint distribution can be decomposed into its univariate conditionals by the chain rule, although by doing so we implicitly choose an ordering in a model, which prevents arbitrary conditional inference. Any-order autoregressive models circumvent this generally by being trained such that all possible univariate conditionals are considered, but this leads to learning redundant information. The paper proposes a new method to train autoregressive models, using a subset of univariate conditionals that still supports arbitrary conditional inference. This research was also presented as a talk, but sadly we missed it!  Paper link here.
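For reference, the decomposition mentioned above can be written out as follows. The notation (D variables and an ordering \sigma) is introduced here for illustration and is not taken from the paper.

```latex
% Chain rule for a joint distribution over x_1, ..., x_D under an ordering
% (permutation) \sigma of the indices; a standard autoregressive model fixes
% one such \sigma, whereas an any-order model is trained across orderings.
p(x_1, \ldots, x_D) = \prod_{t=1}^{D} p\left(x_{\sigma(t)} \mid x_{\sigma(1)}, \ldots, x_{\sigma(t-1)}\right)
```

The identity holds for every ordering \sigma, which is why a model trained on all orderings learns the same joint distribution many times over.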

Anthony:

The poster sessions formed the bulk of the conference timetable, with two 2-hour sessions per day, on Tuesday, Wednesday and Thursday. These were very busy, with many posters on a wide range of topics and a large congregation of attendees. As a result, it was sometimes difficult to track down the subset of posters on material of particular interest and, when this feat was achieved, on occasion it was still hard work to actually have a detailed conversation with the author(s). Nonetheless, it was interesting to see how varied the subjects of the posters were and, in addition, to get a feeling for “themes” of the conference: recurring, clearly in-vogue topics. Amongst the sea of posters, I did manage to find a number relating to GPs; of these, those I found most interesting were:

Posterior and Computational Uncertainty in Gaussian Processes:
Jonathan Wenger · Geoff Pleiss · Marvin Pförtner · Philipp Hennig · John Cunningham
Here the authors propose a way to naturally incorporate uncertainty introduced from the use of (iterative) GP approximation methods. Paper link here.

Sparse Gaussian Process Hyperparameters: Optimize or Integrate?
Vidhi Lalchand · Wessel Bruinsma · David Burt · Carl Edward Rasmussen
The authors attempt to integrate a fully-Bayesian inference procedure for sparse GPs, as an alternative to the commonly adopted approach of optimising the kernel hyperparameters by maximum likelihood estimation. Paper link here.

Log-Linear-Time Gaussian Processes Using Binary Tree Kernels:
Michael K. Cohen · Samuel Daulton · Michael A Osborne
The idea here feels a bit unorthodox; they use a “binary-tree” kernel which discretises the space, with quantization error determined by the number of leaves. This would seem to lose interpretability on the properties of the function prior (e.g. smoothness), but does appear to give empirical benefits in their experiments. Paper link here.

Workshops

In addition to the main conference, on the Friday and Saturday at the end of the week there was a selection of workshops on a variety of sub-fields within machine learning. If you are fortunate enough for there to be a workshop dedicated to your research area, it provides a space to gather people with research directly relevant to your own and facilitates helpful discussions and networking opportunities.

Anthony:

For me, the “Gaussian processes, spatiotemporal modeling and decision-making systems” workshop was the most useful part of the conference. It gave me the chance to speak to people working on interesting problems related to my own; discover the kind of directions they are heading in and lines of work they are contemplating. Additionally, I presented a poster during this workshop which allowed me to discuss my work with an audience well-versed on the topic and its possible significance.

The Big Easy

In addition to the actual conference, attending NeurIPS also gave us the opportunity to explore the city of New Orleans, aka The Big Easy. Upon arrival, we were immediately greeted in the airport by the sound of Louis Armstrong, a strong theme in the city, which features a park named after him. New ‘Awlins’ is well known for its jazz, but awareness of this fact does not necessarily prepare you for the sheer quantity, especially in the streets of the French Quarter, that awaits you. The real epicentre of jazz in the city is situated on Frenchmen Street, home to a swathe of bars hosting nearly-nightly live music. We spent several evenings there, including one of particular note, where French president Emmanuel Macron suddenly appeared, trailed by an extensive retinue of blue-suited aides and bodyguards. Another street in New Orleans infamous for its nightlife is Bourbon Street. Where Frenchmen Street is focused on jazz, Bourbon Street contains all manner of rowdy madness, assaulting your senses with noise, smells and sights as soon as you arrive. Both are necessary experiences when visiting The Big Easy.

Conclusion

All in all, the conference was a great opportunity to get a taste of the massive array of research that occurs in machine learning. We were all surprised by the scope of the research topics and talks, and enjoyed the opportunity to explore a new culture and city.

Student Perspectives: An Introduction to Deep Kernel Machines

A post by Edward Milsom, PhD student on the Compass programme.

This blog post provides a simple introduction to Deep Kernel Machines[1] (DKMs), a novel supervised learning method that combines the advantages of both deep learning and kernel methods. This work provides the foundation of my current research on convolutional DKMs, which is supervised by Dr Laurence Aitchison.

Why aren’t kernels cool anymore?

Kernel methods were once top-dog in machine learning due to their ability to implicitly map data to complicated feature spaces, where the problem usually becomes simpler, without ever explicitly computing the transformation. However, in the past decade deep learning has become the new king for complicated tasks like computer vision and natural language processing.
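The idea of implicitly mapping data to a feature space can be seen in a tiny example. The sketch below uses a degree-2 polynomial kernel on 2-dimensional inputs, chosen here purely for illustration (it is not specific to DKMs): evaluating the kernel gives the same number as an inner product between explicitly computed features.

```python
import math


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def feature_map(x):
    """Explicit feature map for 2-d inputs such that
    feature_map(x) . feature_map(y) = (x . y)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, y = [1.0, 2.0], [3.0, -1.0]
implicit = poly_kernel(x, y)  # never builds the features
explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
```

Both routes give the same value, but the kernel route never constructs the feature vectors, which matters when the feature space is very high-dimensional or infinite-dimensional.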

Neural networks are flexible when learning representations

The reason is twofold: First, neural networks have millions of tunable parameters that allow them to learn their feature mappings automatically from the data, which is crucial for domains like images which are too complex for us to specify good, useful features by hand. Second, their layer-wise structure means these mappings can be built up to increasingly more abstract representations, while each layer itself is relatively simple[2]. For example, trying to learn a single function that takes in pixels from pictures of animals and outputs their species is difficult; it is easier to map pixels to corners and edges, then shapes, then body parts, and so on.

Kernel methods are rigid when learning representations

It is therefore notable that classical kernel methods lack these characteristics: most kernels have a very small number of tunable hyperparameters, meaning their mappings cannot flexibly adapt to the task at hand, leaving us stuck with a feature space that, while complex, might be ill-suited to our problem. (more…)

Student Perspectives: Spectral Clustering for Rapid Identification of Farm Strategies

A post by Dan Milner, PhD student on the Compass programme.

Image 1: Smallholder Farm – Yebelo, southern Ethiopia

Introduction

This blog describes an approach being developed to deliver rapid classification of farmer strategies. The data comes from a survey conducted with two groups of smallholder farmers (see image 2), one group living in the Taita Hills area of southern Kenya and the other in Yebelo, southern Ethiopia. This work would not have been possible without the support of my supervisors James Hammond, from the International Livestock Research Institute (ILRI) (and developer of the Rural Household Multi Indicator Survey, RHoMIS, used in this research), as well as Andrew Dowsey, Levi Wolf and Kate Robson Brown from the University of Bristol.

Image 2: Measuring a Cow’s Heart Girth as Part of the Farm Surveys

Aims of the project

The goal of my PhD is to contribute a landscape approach to analysing agricultural systems. On-farm practices are an important part of an agricultural system and are one of the three components that make up what Rizzo et al (2022) call ‘agricultural landscape dynamics’; the other two components are Natural Resources and Landscape Patterns. To understand how a farm interacts with and responds to Natural Resources and Landscape Patterns it seems sensible to try and understand not just each farm’s inputs and outputs but its overall strategy and component practices. (more…)

Student Perspectives: An Introduction to QGAMs

A post by Ben Griffiths, PhD student on the Compass programme.

My area of research is studying Quantile Generalised Additive Models (QGAMs), with my main application lying in energy demand forecasting. In particular, my focus is on developing faster and more stable fitting methods and model selection techniques. This blog post aims to briefly explain what QGAMs are, how to fit them, and a short illustrative example applying these techniques to data on extreme rainfall in Switzerland. I am supervised by Matteo Fasiolo and my research is sponsored by Électricité de France (EDF).

Quantile Generalised Additive Models

QGAMs are essentially the result of combining quantile regression (QR; performing regression on a specific quantile of the response) with a generalised additive model (GAM; fitting a model assuming additive smooth effects). Here we are in the regression setting, so let F(y| \boldsymbol{x}) be the conditional c.d.f. of a response, y, given a p-dimensional vector of covariates, \boldsymbol{x}. In QR we model the \tau-th quantile, that is, \mu_\tau(\boldsymbol{x}) = \inf \{y : F(y|\boldsymbol{x}) \geq \tau\}.

Examples of true quantiles of SHASH distribution.
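The infimum in the definition of \mu_\tau can be made concrete by swapping the true conditional c.d.f. for an empirical one. The following is a minimal sketch in plain Python; the rainfall-like numbers and the function names are made up for illustration and are not taken from the post's dataset.

```python
def empirical_cdf(sample, y):
    """Fraction of observations less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def empirical_quantile(sample, tau):
    """Smallest observed y whose empirical c.d.f. is at least tau.

    The empirical c.d.f. only jumps at observed values, so the infimum in
    the definition is always attained at one of the data points.
    """
    for y in sorted(sample):
        if empirical_cdf(sample, y) >= tau:
            return y
    raise ValueError("tau must lie in (0, 1]")


# Hypothetical observations, for illustration only.
rain = [12.0, 30.5, 18.2, 25.1, 40.3, 22.7, 15.9, 33.4, 28.8, 19.6]
median = empirical_quantile(rain, 0.5)
q95 = empirical_quantile(rain, 0.95)
print(median, q95)
```

With ten observations the 0.95 quantile is simply the largest value, which illustrates why extreme quantiles are hard to estimate from small samples without a model.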

This might be useful in cases where we do not need to model the full distribution of y| \boldsymbol{x} and only need one particular quantile of interest (for example, urban planners might only be interested in estimates of extreme rainfall, e.g. \tau = 0.95). It also allows us to make no assumptions about the underlying true distribution; instead, we can model the distribution empirically using multiple quantiles.

We can define the \tau-th quantile as the minimiser of expected loss

L(\mu| \boldsymbol{x}) = \mathbb{E} \left\{\rho_\tau (y - \mu)| \boldsymbol{x} \right \} = \int \rho_\tau(y - \mu) d F(y|\boldsymbol{x}),

w.r.t. \mu = \mu_\tau(\boldsymbol{x}), where

\rho_\tau (z) = (\tau - 1) z \boldsymbol{1}(z<0) + \tau z \boldsymbol{1}(z \geq 0),

is known as the pinball loss (Koenker, 2005).

Pinball loss for quantiles 0.5, 0.8, 0.95.
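To see why minimising the pinball loss recovers a quantile, note that when F is continuous the derivative of the expected loss with respect to \mu is F(\mu|\boldsymbol{x}) - \tau, which vanishes exactly at the \tau-th quantile. The sketch below checks the empirical analogue on a toy sample; the data and helper names are illustrative assumptions, not part of the original analysis.

```python
def pinball_loss(z, tau):
    """rho_tau(z) = (tau - 1) * z if z < 0, else tau * z."""
    return (tau - 1.0) * z if z < 0 else tau * z


def mean_pinball_loss(sample, mu, tau):
    """Empirical version of the expected loss L(mu) for a candidate mu."""
    return sum(pinball_loss(y - mu, tau) for y in sample) / len(sample)


# Hypothetical data: 1, 2, ..., 21.
sample = [float(k) for k in range(1, 22)]
tau = 0.8

# The empirical loss is piecewise linear in mu with kinks at the data points,
# so it is enough to search over the observed values.
best_mu = min(sample, key=lambda mu: mean_pinball_loss(sample, mu, tau))
print(best_mu)
```

Here 17 of the 21 observations are at or below the selected value, which is the smallest observation whose empirical c.d.f. reaches 0.8.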

We can approximate the above expression empirically given a sample of size n, which gives the quantile estimator, \hat{\mu}_\tau(\boldsymbol{x}) = \boldsymbol{x}^\mathsf{T} \hat{\boldsymbol{\beta}} where

\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg \min} \frac{1}{n} \sum_{i=1}^n \rho_\tau \left\{y_i - \boldsymbol{x}_i^\mathsf{T} \boldsymbol{\beta}\right\},

where \boldsymbol{x}_i is the i-th vector of covariates, and \boldsymbol{\beta} is the vector of regression coefficients.
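As a rough illustration of the estimator \hat{\boldsymbol{\beta}}, the sketch below fits a straight-line quantile regression (intercept plus one slope) by brute force. It relies on the linear-programming fact that some minimiser of the empirical pinball loss passes through two data points, so only lines through pairs of points need to be scored. This is a teaching device for tiny datasets, not how quantile regression software works in practice, and the data are invented.

```python
from itertools import combinations


def pinball_loss(z, tau):
    """rho_tau(z) = (tau - 1) * z if z < 0, else tau * z."""
    return (tau - 1.0) * z if z < 0 else tau * z


def fit_line_quantile(xs, ys, tau):
    """Return (intercept, slope) minimising the mean pinball loss.

    Scores every line through two data points; cost is O(n^3), so this is
    only sensible for very small n.
    """
    n = len(xs)
    best = None
    for i, j in combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid regression fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys)) / n
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: y = 1 + 2x, with one large outlier at x = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 3.0, 5.0, 17.0, 9.0, 11.0, 13.0]
intercept, slope = fit_line_quantile(xs, ys, 0.5)
print(intercept, slope)
```

The median fit (\tau = 0.5) ignores the outlier and recovers the line through the remaining points, which is the usual robustness argument for quantile regression over least squares.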

So far we have described QR, so to turn this into a QGAM we assume \mu_\tau(\boldsymbol{x}) has additive structure, that is, we can write the \tau-th conditional quantile as

\mu_\tau(\boldsymbol{x}) = \sum_{j=1}^m f_j(\boldsymbol{x}),

where the m additive terms are defined in terms of basis functions (e.g. spline bases). A marginal smooth effect could be, for example

f_j(\boldsymbol{x}) = \sum_{k=1}^{r_j} \beta_{jk} b_{jk}(x_j),

where \beta_{jk} are unknown coefficients, b_{jk}(x_j) are known spline basis functions and r_j is the basis dimension.
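The basis expansion can be illustrated with the simplest possible spline basis: piecewise-linear "hat" functions on equally spaced knots. This choice, the knot positions and the coefficient values are all assumptions made for the sketch; GAM software typically uses smoother bases such as cubic or thin plate regression splines.

```python
def hat_basis(x, knots):
    """Evaluate piecewise-linear 'hat' basis functions b_1, ..., b_r at x.

    Basis k peaks at knots[k] and falls linearly to zero at its neighbours.
    Knots are assumed sorted and equally spaced, with x inside their range.
    """
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]


def smooth_effect(x, knots, beta):
    """f_j(x) = sum_k beta_jk * b_jk(x)."""
    return sum(b * c for b, c in zip(hat_basis(x, knots), beta))


knots = [0.0, 1.0, 2.0, 3.0]   # basis dimension r_j = 4
beta = [0.0, 2.0, 1.0, 4.0]    # hypothetical coefficients
value = smooth_effect(1.5, knots, beta)
print(hat_basis(1.5, knots), value)
```

With this basis each coefficient is the value of f_j at a knot and the function interpolates linearly in between, so changing one \beta_{jk} only affects the curve locally.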

Denote by \boldsymbol{\mathrm{x}}_i the vector of basis functions evaluated at \boldsymbol{x}_i. The n \times d design matrix \boldsymbol{\mathrm{X}} is then defined as having i-th row \boldsymbol{\mathrm{x}}_i, for i = 1, \dots, n, where d = r_1+\dots +r_m is the total basis dimension over all f_j. Now the quantile estimate is defined as \mu_\tau(\boldsymbol{x}_i) = \boldsymbol{\mathrm{x}}_i^\mathsf{T} \boldsymbol{\beta}. When estimating the regression coefficients, we put a ridge penalty on \boldsymbol{\beta}_{j} to control the complexity of f_j; thus we seek to minimise the penalised pinball loss

V(\boldsymbol{\beta},\boldsymbol{\gamma},\sigma) = \sum_{i=1}^n \frac{1}{\sigma} \rho_\tau \left\{y_i - \mu(\boldsymbol{x}_i)\right\} + \frac{1}{2} \sum_{j=1}^m \gamma_j \boldsymbol{\beta}^\mathsf{T} \boldsymbol{\mathrm{S}}_j \boldsymbol{\beta},

where \boldsymbol{\gamma} = (\gamma_1,\dots,\gamma_m) is a vector of positive smoothing parameters, 1/\sigma>0 is the learning rate and the \boldsymbol{\mathrm{S}}_j are positive semi-definite matrices which penalise the wiggliness of the corresponding effects f_j. Minimising V with respect to \boldsymbol{\beta} given fixed \sigma and \boldsymbol{\gamma} leads to the maximum a posteriori (MAP) estimator \hat{\boldsymbol{\beta}}.
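The trade-off encoded in V can be seen by evaluating it directly. The sketch below assumes a single smooth (m = 1), an identity design matrix and a second-difference penalty matrix; these are illustrative choices rather than the penalties used in the post, and all numbers are invented.

```python
def pinball_loss(z, tau):
    """rho_tau(z) = (tau - 1) * z if z < 0, else tau * z."""
    return (tau - 1.0) * z if z < 0 else tau * z


def quad_form(beta, S):
    """beta^T S beta for S given as a list of rows."""
    d = len(beta)
    return sum(beta[a] * S[a][b] * beta[b] for a in range(d) for b in range(d))


def penalised_pinball(beta, X, y, tau, sigma, gammas, S_list):
    """V(beta, gamma, sigma): scaled pinball loss plus quadratic penalties."""
    fit = sum(pinball_loss(yi - sum(xk * bk for xk, bk in zip(row, beta)), tau)
              for row, yi in zip(X, y)) / sigma
    penalty = 0.5 * sum(g * quad_form(beta, S) for g, S in zip(gammas, S_list))
    return fit + penalty


# Hypothetical setup: three observations, three coefficients, one smooth.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1.0, 2.5, 2.0]
# Second-difference penalty D^T D with D = [1, -2, 1] (an assumed choice).
S = [[1.0, -2.0, 1.0], [-2.0, 4.0, -2.0], [1.0, -2.0, 1.0]]

v_straight = penalised_pinball([1.0, 2.0, 3.0], X, y, 0.5, 1.0, [2.0], [S])
v_interp = penalised_pinball([1.0, 2.5, 2.0], X, y, 0.5, 1.0, [2.0], [S])
print(v_straight, v_interp)
```

The straight-line coefficients pay no penalty but miss two observations, while the interpolating coefficients fit perfectly and pay the full wiggliness penalty; with \gamma_1 = 2 the smoother candidate has the lower objective.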

There are a number of methods to tune the smoothing parameters and learning rate. The framework from Fasiolo et al. (2021) consists in:

  1. calibrating \sigma by Integrated Kullback–Leibler minimisation
  2. selecting \boldsymbol{\gamma}|\sigma by Laplace Approximate Marginal Loss minimisation
  3. estimating \boldsymbol{\beta}|\boldsymbol{\gamma},\sigma by minimising penalised Extended Log-F loss (note that this loss is simply a smoothed version of the pinball loss introduced above)

For more detail on what each of these steps means, I refer the reader to Fasiolo et al. (2021). Clearly this three-layered nested optimisation can take a long time to converge, especially for large datasets, which are common in energy demand forecasting. My project therefore aims to adapt this framework to make it less computationally expensive.
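Step 3 above describes the Extended Log-F loss as a smoothed version of the pinball loss. One smooth approximation with that property replaces the kink at zero with a scaled softplus term, as sketched below. The exact functional form and the extra smoothing parameter `lam` are assumptions made here for illustration; consult Fasiolo et al. (2021) for the definition actually used.

```python
import math


def pinball_loss(z, tau):
    """rho_tau(z) = (tau - 1) * z if z < 0, else tau * z."""
    return (tau - 1.0) * z if z < 0 else tau * z


def smooth_pinball(z, tau, sigma=1.0, lam=0.1):
    """(tau - 1) * z / sigma + lam * log(1 + exp(z / (lam * sigma))).

    As lam tends to zero this approaches pinball_loss(z, tau) / sigma, but
    for lam > 0 it is differentiable everywhere, including at z = 0.
    """
    u = z / (lam * sigma)
    # Numerically stable evaluation of log(1 + exp(u)).
    softplus = max(u, 0.0) + math.log1p(math.exp(-abs(u)))
    return (tau - 1.0) * z / sigma + lam * softplus


for z in (-2.0, 0.0, 2.0):
    print(z, pinball_loss(z, 0.95), smooth_pinball(z, 0.95, lam=0.01))
```

A differentiable loss allows standard Newton-type optimisers to be used for the coefficients, at the cost of a small bias near zero that is controlled by `lam`.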

Application to Swiss Extreme Rainfall

Here I will briefly discuss one potential application of QGAMs, where we analyse a dataset consisting of observations of the most extreme 12-hourly total rainfall each year for 65 Swiss weather stations between 1981 and 2015. This dataset can be found in the R package gamair, and for model fitting I used the package mgcViz.

A basic QGAM for the 50% quantile (i.e. \tau = 0.5) can be fitted using the following formula

\mu_i = \beta + \psi(\mathrm{reg}_i) + f_1(\mathrm{nao}_i) + f_2(\mathrm{el}_i) + f_3(\mathrm{Y}_i) + f_4(\mathrm{E}_i,\mathrm{N}_i),

where \beta is the intercept term, \psi(\mathrm{reg}_i) is a parametric factor for climate region, f_1, \dots, f_4 are smooth effects, \mathrm{nao}_i is the Annual North Atlantic Oscillation index, \mathrm{el}_i is the metres above sea level, \mathrm{Y}_i is the year of observation, and \mathrm{E}_i and \mathrm{N}_i are the degrees east and north respectively.

After fitting in mgcViz, we can plot the smooth effects and see how these affect the extreme yearly rainfall in Switzerland.

Fitted smooth effects for North Atlantic Oscillation index, elevation, degrees east and north and year of observation.

From the plots we observe the following: as we increase the NAO index we see a somewhat oscillatory effect on extreme rainfall; when increasing elevation we see a steady increase in extreme rainfall before a sharp drop above an elevation of around 2500 metres; as years increase we see a relatively flat effect on extreme rainfall, indicating that extreme rainfall patterns might not change much over time (hopefully the reader won’t regard this as evidence against climate change); and from the spatial plot we see that the south-east of Switzerland appears to be more prone to heavy extreme rainfall.

We could also look into fitting a 3D spatio-temporal tensor product effect, using the following formula

\mu_i = \beta + \psi(\mathrm{reg}_i) + f_1(\mathrm{nao}_i) + f_2(\mathrm{el}_i) + t(\mathrm{E}_i,\mathrm{N}_i,\mathrm{Y}_i),

where t is the tensor product effect between \mathrm{E}_i, \mathrm{N}_i and \mathrm{Y}_i. We can examine the spatial effect on extreme rainfall over time by plotting the smooths.

3D spatio-temporal tensor smooths for years 1985, 1995, 2005 and 2015.

There does not seem to be a significant interaction between the location and year, since we see little change between the plots, except for perhaps a slight decrease in the south-east.

Finally, we can make the most of the QGAM framework by fitting multiple quantiles at once. Here we fit the first formula for quantiles \tau = 0.1, 0.2, \dots, 0.9, and we can examine the fitted smooths for each quantile on the spatial effect.

Spatial smooths for quantiles 0.1, 0.2, …, 0.9.

Interestingly, the spatial effect is much stronger at the higher quantiles than at the lower ones: the effect is relatively weak at the 0.1 quantile and very strong at the 0.9 quantile, where it ranges between around -30 and +60.

The example discussed here is but one of many potential applications of QGAMs. As mentioned in the introduction, my research area is motivated by energy demand forecasting. My current/future research is focused on adapting the QGAM fitting framework to obtain faster fitting.

References

Fasiolo, M., S. N. Wood, M. Zaffran, R. Nedellec, and Y. Goude (2021). Fast calibrated additive quantile regression. Journal of the American Statistical Association 116 (535), 1402–1412.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.

 

Applications now open for PhD in Computational Statistics and Data Science

Start your PhD in Data Science now

Compass CDT is now recruiting for its fully funded places to start September 2023.

We are happy to announce that the University of Bristol online application system is open, and we are receiving applications for the Compass CDT programme for a September 2023 start. Early application is advised.

For 2023/24 entry, applicants must review the projects on offer. The projects are listed in the research section of our website. You will need to provide a Research Statement in your application documents with a ranked list of 3 projects of interest to you: 1 being the project of highest interest.

PhD Project Allocation Process

Application forms will be reviewed based on the 3 ranked projects specified. Successful applicants will be invited to attend an interview with the Compass admissions tutors and the specific project supervisor. If you are made an offer of PhD study it will be published through the online application system. You will then have 2 weeks to consider the offer before deciding whether to accept or decline.

The next review of applications for 2023 funded places will take place after

4 January 2023.

APPLY NOW

We welcome applications from all members of our community and are particularly encouraging those from diverse groups, such as members of the LGBT+ and black, Asian and minority ethnic communities, to join us.

Advantages of being a Compass Student

  • Stipend – a generous stipend of £21,668 pa tax free, paid in monthly payments. Plus your own expense budget of £1,000 pa towards travel and research activity.
  • No fees – all tuition fees are covered by the EPSRC and University of Bristol.
  • Bespoke training – first year units are designed specifically for the academic needs of each Compass student, which enables students to develop knowledge and capability to pursue cross-disciplinary PhD research.
  • Supervisors – supervisors from across academic disciplines offer a range of research projects.
  • Cohort – Compass students benefit from dedicated offices and collaboration spaces, enabling strong cohort links and opportunities for shared learning and research.

About Compass CDT

A 4-year bespoke PhD training programme in the statistical and computational techniques of data science, with partners from across the University of Bristol, industry and government agencies.

The cross-disciplinary programme offers exciting collaborations across medicine, computer science, geography, economics, life and earth sciences, as well as with our external partners who range from government organisations such as the Office for National Statistics, NCSC and the AWE, to industrial partners such as LV, Improbable, IBM Research, EDF, and AstraZeneca.

Students are co-located with the Institute for Statistical Science in the School of Mathematics, which occupies the Fry Building.

Hear from our students about their experience with the programme

  • Compass has allowed me to advance my statistical knowledge and apply it to a range of exciting applied projects, as well as develop skills that I’m confident will be highly useful for a future career in data science. – Shannon, Cohort 2

  • With the Compass CDT I feel part of a friendly, interactive environment that is preparing me for whatever I move on to next, whether it be in Academia or Industry. – Sam, Cohort 2

  • An incredible opportunity to learn the ever-expanding field of data science, statistics and machine learning amongst amazing people. – Danny, Cohort 1

APPLY BEFORE: 

Wednesday 4 January 2023, 5pm (London, UK time zone)

APPLY NOW
