Student Perspectives: Spectral Clustering for Rapid Identification of Farm Strategies

A post by Dan Milner, PhD student on the Compass programme.

Image 1: Smallholder Farm – Yebelo, southern Ethiopia

Introduction

This blog post describes an approach being developed to deliver rapid classification of farmer strategies. The data comes from a survey conducted with two groups of smallholder farmers (see Image 2), one group living in the Taita Hills area of southern Kenya and the other in Yebelo, southern Ethiopia. This work would not have been possible without the support of my supervisors: James Hammond, from the International Livestock Research Institute (ILRI) and developer of the Rural Household Multi-Indicator Survey (RHoMIS) used in this research, and Andrew Dowsey, Levi Wolf and Kate Robson Brown from the University of Bristol.

Image 2: Measuring a Cow's Heart Girth as Part of the Farm Surveys

Aims of the project

The goal of my PhD is to contribute a landscape approach to analysing agricultural systems. On-farm practices are an important part of an agricultural system and are one of the trio of components that make up what Rizzo et al. (2022) call ‘agricultural landscape dynamics’, the other two being Natural Resources and Landscape Patterns. To understand how a farm interacts with and responds to Natural Resources and Landscape Patterns, it seems sensible to try to understand not just each farm's inputs and outputs but its overall strategy and component practices.

Student Perspectives: An Introduction to QGAMs

A post by Ben Griffiths, PhD student on the Compass programme.

My area of research is studying Quantile Generalised Additive Models (QGAMs), with my main application lying in energy demand forecasting. In particular, my focus is on developing faster and more stable fitting methods and model selection techniques. This blog post aims to briefly explain what QGAMs are, how to fit them, and a short illustrative example applying these techniques to data on extreme rainfall in Switzerland. I am supervised by Matteo Fasiolo and my research is sponsored by Électricité de France (EDF).

Quantile Generalised Additive Models

QGAMs are essentially the result of combining quantile regression (QR; performing regression on a specific quantile of the response) with a generalised additive model (GAM; fitting a model assuming additive smooth effects). Here we are in the regression setting, so let F(y| \boldsymbol{x}) be the conditional c.d.f. of a response, y, given a p-dimensional vector of covariates, \boldsymbol{x}. In QR we model the \tau-th quantile, that is, \mu_\tau(\boldsymbol{x}) = \inf \{y : F(y|\boldsymbol{x}) \geq \tau\}.

Examples of true quantiles of SHASH distribution.

This might be useful in cases where we do not need to model the full distribution of y| \boldsymbol{x} and only need one particular quantile of interest (for example, urban planners might only be interested in estimates of extreme rainfall, e.g. \tau = 0.95). It also allows us to make no assumptions about the underlying true distribution; instead, we can model the distribution empirically using multiple quantiles.

We can define the \tau-th quantile as the minimiser of the expected loss

L(\mu| \boldsymbol{x}) = \mathbb{E} \left\{\rho_\tau (y - \mu)| \boldsymbol{x} \right \} = \int \rho_\tau(y - \mu) d F(y|\boldsymbol{x}),

w.r.t. \mu = \mu_\tau(\boldsymbol{x}), where

\rho_\tau (z) = (\tau - 1) z \boldsymbol{1}(z<0) + \tau z \boldsymbol{1}(z \geq 0),

is known as the pinball loss (Koenker, 2005).
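To make the loss concrete, here is a minimal R sketch (the function name pinball is mine, not from any package):

    # Pinball (check) loss: rho_tau(z) = (tau - 1) z for z < 0 and tau z for z >= 0
    pinball <- function(z, tau) {
      ifelse(z < 0, (tau - 1) * z, tau * z)
    }

    # At tau = 0.95, under-prediction (positive residuals) costs 19 times more
    # than over-prediction, which pulls the fit towards the upper tail
    pinball(c(-2, 2), tau = 0.95)  # 0.10 and 1.90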

Pinball loss for quantiles 0.5, 0.8, 0.95.

Given a sample of size n, we can approximate the expected loss empirically; in the linear case this gives the quantile estimator \hat{\mu}_\tau(\boldsymbol{x}) = \boldsymbol{x}^\mathsf{T} \hat{\boldsymbol{\beta}}, where

\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg \min} \frac{1}{n} \sum_{i=1}^n \rho_\tau \left\{y_i - \boldsymbol{x}_i^\mathsf{T} \boldsymbol{\beta}\right\},

where \boldsymbol{x}_i is the ith vector of covariates, and \boldsymbol{\beta} is the vector of regression coefficients.
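In R, linear QR of this kind is implemented by rq() in the quantreg package; a minimal sketch on simulated data (the data are illustrative, not from this post):

    library(quantreg)

    # Simulated heteroscedastic data: the upper quantiles fan out as x grows
    set.seed(1)
    dat <- data.frame(x = runif(200))
    dat$y <- 1 + 2 * dat$x + rnorm(200, sd = 0.5 * (1 + dat$x))

    # Estimate the 0.95 conditional quantile by minimising the empirical pinball loss
    fit <- rq(y ~ x, tau = 0.95, data = dat)
    coef(fit)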

So far we have described QR; to turn this into a QGAM, we assume \mu_\tau(\boldsymbol{x}) has an additive structure, that is, we can write the \tau-th conditional quantile as

\mu_\tau(\boldsymbol{x}) = \sum_{j=1}^m f_j(\boldsymbol{x}),

where the m additive terms are defined in terms of basis functions (e.g. spline bases). A marginal smooth effect could be, for example

f_j(\boldsymbol{x}) = \sum_{k=1}^{r_j} \beta_{jk} b_{jk}(x_j),

where \beta_{jk} are unknown coefficients, b_{jk}(x_j) are known spline basis functions and r_j is the basis dimension.
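In R, such a basis can be constructed explicitly with mgcv's smoothCon(); a sketch with an illustrative covariate x and basis dimension r_j = 10:

    library(mgcv)

    grid <- data.frame(x = seq(0, 1, length.out = 100))

    # Cubic regression spline basis with r_j = 10 basis functions b_jk
    sm <- smoothCon(s(x, bs = "cr", k = 10), data = grid)[[1]]
    dim(sm$X)    # 100 x 10: each column is one basis function evaluated at x
    sm$S[[1]]    # the penalty matrix S_j used in the ridge penalty below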

Let \boldsymbol{\mathrm{x}}_i denote the vector of basis functions evaluated at \boldsymbol{x}_i; then the n \times d design matrix \boldsymbol{\mathrm{X}} is defined as having ith row \boldsymbol{\mathrm{x}}_i, for i = 1, \dots, n, where d = r_1+\dots +r_m is the total basis dimension over all the f_j. Now the quantile estimate is defined as \mu_\tau(\boldsymbol{x}_i) = \boldsymbol{\mathrm{x}}_i^\mathsf{T} \boldsymbol{\beta}. When estimating the regression coefficients, we put a ridge penalty on \boldsymbol{\beta}_{j} to control the complexity of f_j, thus we seek to minimise the penalised pinball loss

V(\boldsymbol{\beta},\boldsymbol{\gamma},\sigma) = \sum_{i=1}^n \frac{1}{\sigma} \rho_\tau \left\{y_i - \mu(\boldsymbol{x}_i)\right\} + \frac{1}{2} \sum_{j=1}^m \gamma_j \boldsymbol{\beta}^\mathsf{T} \boldsymbol{\mathrm{S}}_j \boldsymbol{\beta},

where \boldsymbol{\gamma} = (\gamma_1,\dots,\gamma_m) is a vector of positive smoothing parameters, 1/\sigma>0 is the learning rate, and the \boldsymbol{\mathrm{S}}_j's are positive semi-definite matrices which penalise the wiggliness of the corresponding effects f_j. Minimising V with respect to \boldsymbol{\beta}, given fixed \sigma and \boldsymbol{\gamma}, leads to the maximum a posteriori (MAP) estimator \hat{\boldsymbol{\beta}}.

There are a number of methods to tune the smoothing parameters and learning rate. The framework from Fasiolo et al. (2021) consists of:

  1. calibrating \sigma by Integrated Kullback–Leibler minimisation
  2. selecting \boldsymbol{\gamma}|\sigma by Laplace Approximate Marginal Loss minimisation
  3. estimating \boldsymbol{\beta}|\boldsymbol{\gamma},\sigma by minimising penalised Extended Log-F loss (note that this loss is simply a smoothed version of the pinball loss introduced above)

For more detail on what each of these steps means, I refer the reader to Fasiolo et al. (2021). Clearly this three-layered nested optimisation can take a long time to converge, especially for large datasets, which are common in energy demand forecasting. My project therefore aims to adapt this framework to make it less computationally expensive.
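In practice these steps are implemented in the qgam R package: by default qgam() calibrates the learning rate internally, but the calibration can also be run explicitly via tuneLearnFast(). A minimal sketch on simulated data:

    library(qgam)

    set.seed(1)
    dat <- data.frame(x = runif(500))
    dat$y <- 1 + 2 * dat$x + rnorm(500, sd = 0.5 * (1 + dat$x))

    # Steps 1-2: calibrate sigma (and select the smoothing parameters)
    calib <- tuneLearnFast(y ~ s(x), data = dat, qu = 0.95)

    # Step 3: fit the QGAM at the calibrated log learning rate
    fit <- qgam(y ~ s(x), data = dat, qu = 0.95, lsig = calib$lsig)
    summary(fit)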

Application to Swiss Extreme Rainfall

Here I will briefly discuss one potential application of QGAMs, in which we analyse a dataset consisting of observations of the most extreme 12-hourly total rainfall each year at 65 Swiss weather stations between 1981 and 2015. This dataset can be found in the R package gamair, and for model fitting I used the package mgcViz.

A basic QGAM for the 50% quantile (i.e. \tau = 0.5) can be fitted using the following formula

\mu_i = \beta + \psi(\mathrm{reg}_i) + f_1(\mathrm{nao}_i) + f_2(\mathrm{el}_i) + f_3(\mathrm{Y}_i) + f_4(\mathrm{E}_i,\mathrm{N}_i),

where \beta is the intercept term, \psi(\mathrm{reg}_i) is a parametric factor for climate region, f_1, \dots, f_4 are smooth effects, \mathrm{nao}_i is the annual North Atlantic Oscillation index, \mathrm{el}_i is the elevation in metres above sea level, \mathrm{Y}_i is the year of observation, and \mathrm{E}_i and \mathrm{N}_i are the degrees east and north respectively.
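In mgcViz this model can be fitted with qgamV(); a sketch assuming the variable names of the swer dataset in gamair (exra is the extreme rainfall response, and climate.region, nao, elevation, year, E, N are as above):

    library(mgcViz)   # loads mgcv and qgam
    library(gamair)
    data(swer)

    # Median (tau = 0.5) QGAM for extreme 12-hourly rainfall
    fit <- qgamV(exra ~ climate.region + s(nao) + s(elevation) +
                   s(year) + s(E, N),
                 data = swer, qu = 0.5)

    print(plot(fit), pages = 1)   # the fitted smooth effects shown below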

After fitting in mgcViz, we can plot the smooth effects and see how these affect the extreme yearly rainfall in Switzerland.

Fitted smooth effects for North Atlantic Oscillation index, elevation, degrees east and north and year of observation.

From the plots we observe the following: as we increase the NAO index we observe a somewhat oscillatory effect on extreme rainfall; increasing elevation gives a steady increase in extreme rainfall before a sharp drop above an elevation of around 2500 metres; as years increase we see a relatively flat effect, indicating that extreme rainfall patterns might not change much over time (hopefully the reader won't regard this as evidence against climate change); and from the spatial plot we see that the south-east of Switzerland appears to be more prone to heavy extreme rainfall.

We could also look into fitting a 3D spatio-temporal tensor product effect, using the following formula

\mu_i = \beta + \psi(\mathrm{reg}_i) + f_1(\mathrm{nao}_i) + f_2(\mathrm{el}_i) + t(\mathrm{E}_i,\mathrm{N}_i,\mathrm{Y}_i),

where t is the tensor product effect between \mathrm{E}_i, \mathrm{N}_i and \mathrm{Y}_i. We can examine the spatial effect on extreme rainfall over time by plotting the smooths.
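In mgcv syntax the tensor product effect corresponds to te(); a sketch, again assuming the swer variable names, with spatial slices of the smooth extracted at fixed years via mgcViz:

    library(mgcViz)
    library(gamair)
    data(swer)

    # Replace the separate year and spatial effects with a 3D tensor product
    fit_te <- qgamV(exra ~ climate.region + s(nao) + s(elevation) +
                      te(E, N, year),
                    data = swer, qu = 0.5)

    # Spatial slices of the tensor smooth (the third smooth term) at fixed years
    plotSlice(sm(fit_te, 3), fix = list("year" = c(1985, 1995, 2005, 2015)))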

3D spatio-temporal tensor smooths for years 1985, 1995, 2005 and 2015.

There does not seem to be a significant interaction between the location and year, since we see little change between the plots, except for perhaps a slight decrease in the south-east.

Finally, we can make the most of the QGAM framework by fitting multiple quantiles at once. Here we fit the first formula for quantiles \tau = 0.1, 0.2, \dots, 0.9, and examine the fitted spatial smooth for each quantile.
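Fitting all nine quantiles jointly can be done with mqgam() from the qgam package (or mqgamV() in mgcViz); a sketch using qdo() to inspect one of the fits:

    library(qgam)
    library(gamair)
    data(swer)

    # Fit quantiles 0.1, 0.2, ..., 0.9 in one call
    fits <- mqgam(exra ~ climate.region + s(nao) + s(elevation) +
                    s(year) + s(E, N),
                  data = swer, qu = seq(0.1, 0.9, by = 0.1))

    # qdo() applies a function to the fit for a single quantile,
    # e.g. plot the smooths of the 0.9-quantile model
    qdo(fits, 0.9, plot, pages = 1)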

Spatial smooths for quantiles 0.1, 0.2, …, 0.9.

Interestingly, the spatial effect is much stronger at the higher quantiles than at the lower ones: we see a relatively weak effect at the 0.1 quantile and a very strong effect at the 0.9 quantile, ranging between around -30 and +60.

The example discussed here is but one of many potential applications of QGAMs. As mentioned in the introduction, my research area is motivated by energy demand forecasting. My current/future research is focused on adapting the QGAM fitting framework to obtain faster fitting.

References

Fasiolo, M., S. N. Wood, M. Zaffran, R. Nedellec, and Y. Goude (2021). Fast calibrated additive quantile regression. Journal of the American Statistical Association 116 (535), 1402–1412.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.


Applications now open for PhD in Computational Statistics and Data Science

Start your PhD in Data Science now

Compass CDT is now recruiting for its fully funded places to start September 2023.

We are happy to announce that the University of Bristol online application system is open and receiving applications for the Compass CDT programme for a September 2023 start. Early application is advised.

For 2023/24 entry, applicants must review the projects on offer. The projects are listed in the research section of our website. You will need to provide a Research Statement in your application documents with a ranked list of 3 projects of interest to you, with 1 being the project of highest interest.

PhD Project Allocation Process

Application forms will be reviewed based on the 3 ranked projects specified. Successful applicants will be invited to attend an interview with the Compass admissions tutors and the specific project supervisor. If you are made an offer of PhD study, it will be published through the online application system. You will then have 2 weeks to consider the offer before deciding whether to accept or decline.

The next review of applications for 2023 funded places will take place after 4 January 2023.

APPLY NOW

We welcome applications from all members of our community and particularly encourage those from under-represented groups, such as members of the LGBT+ and Black, Asian and minority ethnic communities, to join us.

Advantages of being a Compass Student

  • Stipend – a generous stipend of £21,668 pa tax free, paid monthly, plus your own expense budget of £1,000 pa towards travel and research activity.
  • No fees – all tuition fees are covered by the EPSRC and University of Bristol.
  • Bespoke training – first year units are designed specifically for the academic needs of each Compass student, which enables students to develop knowledge and capability to pursue cross-disciplinary PhD research.
  • Supervisors – supervisors from across academic disciplines offer a range of research projects.
  • Cohort – Compass students benefit from dedicated offices and collaboration spaces, enabling strong cohort links and opportunities for shared learning and research.

About Compass CDT

A 4-year bespoke PhD training programme in the statistical and computational techniques of data science, with partners from across the University of Bristol, industry and government agencies.

The cross-disciplinary programme offers exciting collaborations across medicine, computer science, geography, economics, life and earth sciences, as well as with our external partners who range from government organisations such as the Office for National Statistics, NCSC and the AWE, to industrial partners such as LV, Improbable, IBM Research, EDF, and AstraZeneca.

Students are co-located with the Institute for Statistical Science in the School of Mathematics, which occupies the Fry Building.

Hear from our students about their experience with the programme

  • Compass has allowed me to advance my statistical knowledge and apply it to a range of exciting applied projects, as well as develop skills that I’m confident will be highly useful for a future career in data science. – Shannon, Cohort 2

  • With the Compass CDT I feel part of a friendly, interactive environment that is preparing me for whatever I move on to next, whether it be in Academia or Industry. – Sam, Cohort 2

  • An incredible opportunity to learn the ever-expanding field of data science, statistics and machine learning amongst amazing people. – Danny, Cohort 1

APPLY BEFORE: Wednesday 4 January 2023, 5pm (London, UK time zone)

APPLY NOW

Student perspectives: Neural Point Processes for Statistical Seismology

A post by Sam Stockman, PhD student on the Compass programme.

Introduction

Throughout my PhD I aim to bridge the gap between advances made in the machine learning community and the age-old problem of earthquake forecasting. In this cross-disciplinary work with Max Werner from the School of Earth Sciences and Dan Lawson from the School of Mathematics, I hope to create more powerful, efficient and robust forecasting models that can make earthquake-prone areas safer for their inhabitants.

For years seismologists have sought to model the structure and dynamics of the Earth in order to make predictions about earthquakes. They have mapped out the structure of fault lines and conducted laboratory experiments in which they subject rock to great forces in order to simulate plate tectonics on a small scale. Yet when trying to forecast earthquakes on a short time scale (that is, hours and days, not tens of years), these models based on knowledge of the underlying physics are regularly outperformed by statistically motivated models. In statistical seismology we seek to make predictions by looking at the distributions of the times, locations and magnitudes of past earthquakes, and using them to forecast the future.

