Compass student and supervisors win ICML Outstanding Paper Award

Success at ICML 2025

A team of Compass CDT researchers – comprising PhD student Josh Givens, and his supervisors Dr Song Liu and Dr Henry Reeve – have won an Outstanding Paper Award at the International Conference on Machine Learning (ICML).

The prestigious award is presented to the highest-quality accepted papers at one of the world’s top AI conferences. Of the 12,176 papers submitted this year, only six received this recognition.

This achievement further enhances what was already a successful conference season for Compass, with several of its students having papers accepted at ICML, including an oral presentation and spotlight paper, as well as at other leading UK and international events.

Groundbreaking research

The ICML award-winning team’s work on handling missing data in machine learning tasks was outlined in their paper, ‘Score Matching with Missing Data’. Their groundbreaking research introduced two innovative methods of score matching — a key machine learning technique — to a setting where data are partially missing.

Score matching learns a ‘score model’ from data and underpins many modern AI applications, most notably generative models.

Image: Song et al., Score-Based Generative Modeling through Stochastic Differential Equations, 2021

Traditional score matching, however, requires fully observed data, which is often not realistic in applications where samples are corrupted. The Compass research team overcomes this issue by proposing ‘marginal score matching’, where the method actively ‘imputes’ data from observed samples when training the score model.

The research opens a wide range of opportunities for real-world applications where samples are partially missing, such as images collected from a noisy environment, or genomics samples with degraded DNA/RNA.

“This award is a fantastic recognition of the University of Bristol’s research strength in cutting-edge, foundational AI research on a competitive global stage,” said Dr Liu. “Our research enables machine learning methods to work reliably in the imperfect data settings we often face in the real world.”

Paper authors

  • Josh Givens (lead author) is a fourth-year Compass PhD student at the University of Bristol.
  • Dr Song Liu is Associate Professor in Data Sciences and AI at the University of Bristol.
  • Dr Henry Reeve is an Honorary Senior Lecturer at the University of Bristol, and Associate Professor in the School of AI, Nanjing University (China).

This post is adapted from an article, originally published on the University of Bristol’s School of Mathematics news site.

Student Perspectives: Tensors, Decomposed: Seeing Beyond Matrices



A post by Yuqi Zhang, PhD student on the Compass programme.

Introduction

Multiway data arise in many fields, from macroeconomics to neuroscience. In many modern applications, observations are naturally represented as higher-order arrays (tensors) rather than vectors or matrices.

While principal component analysis (PCA) and singular value decomposition (SVD) are powerful tools for analysing matrices, they are insufficient for such multi-dimensional data because they cannot capture interactions across multiple modes. A common but naive approach is to vectorise or matricise the data; however, this destroys the inherent array structure and ignores important dependencies present in the data.

Tensor methods are specifically designed to exploit and preserve the multi-dimensional relationships within data arrays. By working directly with tensors, we are able to uncover latent patterns that are not accessible via standard matrix-based techniques.

 

From SVD and PCA to Factor Models

Given observed data $\{X_t\}_{t=1}^T \subset \mathbb{R}^{p}$ (assumed mean-centred), principal component analysis (PCA) seeks the main directions of variation by studying the sample covariance matrix,
\[
\widehat{\Sigma} = \frac{1}{T}\sum_{t=1}^T X_tX_t^\intercal,
\]
and extracting its largest eigenvalues and eigenvectors. The SVD underpins this approach, but statistical interest focuses on modelling the population second moment,
\[
\Sigma = \mathbb{E}(X_tX_t^\intercal).
\]

The factor model assumes each observation is a linear combination of a small number of latent factors, plus idiosyncratic noise:
\[
X_t = \Lambda F_t + E_t,\quad \text{or }\; X_{it} = \sum_{j=1}^r \Lambda_{ij} F_{jt} + e_{it},
\]
where $\Lambda$ is the loading matrix, $F_t$ is the vector of latent factors, and $E_t$ is idiosyncratic noise. The covariance structure is then
\[
\Sigma = \Lambda \Sigma_F \Lambda^\intercal + \Sigma_E,
\]
where $\Sigma_F$ and $\Sigma_E$ are the covariances of the factors and the idiosyncratic components. Standard assumptions are: $r$ is small (so $\Sigma$ is approximately low rank); the loadings are pervasive ($\Lambda^\intercal \Lambda / p \to \Sigma_\Lambda$ is positive definite); and $E_t$ is uncorrelated with $F_t$, with $\Sigma_E$ usually diagonal or weakly dependent. Under these conditions, PCA can consistently estimate the factor structure.
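
The mechanics of this estimation are easy to demonstrate. Below is a minimal Python sketch (illustrative only; the dimensions, number of factors and noise level are made up) that simulates data from a small factor model and recovers the loading space from the leading eigenvectors of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, r = 50, 500, 3                       # dimension, sample size, number of factors

Lambda = rng.normal(size=(p, r))           # loading matrix
F = rng.normal(size=(r, T))                # latent factors
E = 0.5 * rng.normal(size=(p, T))          # idiosyncratic noise
X = Lambda @ F + E                         # observations, one column per time point

# PCA: eigendecomposition of the sample covariance matrix
Sigma_hat = (X @ X.T) / T
_, eigvecs = np.linalg.eigh(Sigma_hat)     # eigenvalues in ascending order
Lambda_hat = eigvecs[:, -r:] * np.sqrt(p)  # top-r eigenvectors, scaled so Lambda_hat'Lambda_hat/p = I

# Loadings are only identified up to an invertible transformation, so compare
# column spaces via the cosines of the principal angles between them.
Q_true, _ = np.linalg.qr(Lambda)
Q_est, _ = np.linalg.qr(Lambda_hat)
cosines = np.linalg.svd(Q_true.T @ Q_est, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))   # near 1 => spaces agree
```

Because the factors and loadings are only identified up to rotation, the check compares column spaces rather than the matrices themselves.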

However, classical vector or matrix factor models are only suited to data with two modes (e.g.\ variable $\times$ time or variable $\times$ individual). They cannot capture the higher-order dependencies and interactions present in modern multiway datasets, such as country $\times$ variable $\times$ time panels. This limitation motivates the development and use of \textit{tensor} methods, which explicitly retain and model the multi-dimensional array structure.

 

What is a Tensor?

Tensors extend familiar data structures such as vectors (order 1) and matrices (order 2) to arrays of three or more modes. For example, macroeconomic indicators measured across countries and variables over time form a third-order tensor; see Figure 1 below.

Figure 1. A real example of matrix-variate observations: macroeconomic variables for several countries over different quarters. Each slice is a country-variable matrix; stacking these over time yields a third-order tensor. Adapted from [4].
Tensors provide a natural framework for representing complex data in many fields, including colour images (height $\times$ width $\times$ channel), genomics (gene $\times$ patient $\times$ time), and recommender systems (user $\times$ item $\times$ context). Figure 2 shows this hierarchy.

Figure 2. Tensors generalise familiar objects: vectors (order 1), matrices (order 2), and higher-order arrays (order 3 or more). Adapted from [3].
Each entry in a tensor is indexed by one coordinate per mode, preserving multi-dimensional relationships. Figure 3 shows how each entry in a third-order tensor is specified by three indices.

Figure 3. A third-order tensor: (left) the overall cube, (middle) all its entries, and (right) how each entry is indexed by three coordinates $(i,j,k)$. Adapted from [3].
Flattening tensors into vectors or matrices is possible, but typically destroys the underlying structure and dependencies. Analysing tensors directly preserves multi-dimensional patterns.

 

Tensor decomposition

Tensor decompositions help represent high-dimensional data in a compact, interpretable form. Two of the most important decompositions are CP (CANDECOMP/PARAFAC) and Tucker decompositions.

CP (CANDECOMP/PARAFAC) decomposition. The CP decomposition expresses a tensor as a sum of rank-one components:
\[
\mathcal{X} \approx \sum_{j=1}^r \mathbf{a}_j^{(1)} \circ \mathbf{a}_j^{(2)} \circ \cdots \circ \mathbf{a}_j^{(N)},
\]
where each $\mathbf{a}_j^{(n)}$ is a vector in mode $n$, and $\circ$ is the outer product. CP decomposition gives a highly interpretable and often unique representation under mild conditions (see Figure 4).

Figure 4. CP (CANDECOMP/PARAFAC) decomposition of a third-order tensor: $\mathcal{X} \approx \sum_{j=1}^r \mathbf{p}_j^1 \circ \mathbf{p}_j^2 \circ \mathbf{p}_j^3$, where each term is a rank-one tensor formed by the outer product of vectors along each mode. Adapted from [1].
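
As a concrete illustration of CP, here is a small alternating least squares (ALS) sketch for a third-order tensor written in plain numpy. It is a didactic illustration under simplifying assumptions (random initialisation, fixed iteration count), not a production implementation, and all sizes and ranks are arbitrary.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding of a third-order tensor into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x r) and B (J x r), giving an (I*J x r) matrix."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def cp_als(X, r, n_iter=200, seed=0):
    """Rank-r CP decomposition of a third-order tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(X.shape[k], r)) for k in range(3))
    for _ in range(n_iter):
        # Update each factor matrix in turn, holding the other two fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Build a noisy rank-2 tensor and check the fit.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.normal(size=(10, 2)), rng.normal(size=(12, 2)), rng.normal(size=(8, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0) + 0.01 * rng.normal(size=(10, 12, 8))

A, B, C = cp_als(X, r=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

In practice, dedicated libraries (for example TensorLy) provide more robust and efficient CP implementations, including stopping rules and normalisation of the factors.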

Tucker decomposition. Tucker decomposition generalises the matrix SVD to higher-order tensors by factorising the data into a core tensor and a set of factor matrices:
\[
\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
\]
where $\mathcal{G}$ is the core tensor and $A^{(n)}$ are the loading matrices for each mode. Tucker allows different numbers of components per mode and a flexible core tensor (see Figure 5).

Figure 5. Tucker decomposition for a third-order tensor $\mathcal{X}$: $\mathcal{C} \in \mathbb{R}^{m_1 \times m_2 \times m_3}$ is the core tensor, and $Q^{k} \in \mathbb{R}^{n_k \times m_k}$ ($k = 1, 2, 3$) are the loadings along each mode. Adapted from [1].
The CP decomposition is a special case of the Tucker decomposition, where the core tensor $\mathcal{G}$ is \textit{super-diagonal} (nonzero only when all mode indices are equal, i.e., $g_{i_1,\dots,i_K} \neq 0$ only if $i_1 = \dots = i_K$) and the number of components is the same along each mode. CP is unique and interpretable, but may be unstable for high-rank or noisy data. Tucker is more flexible, permitting different ranks for each mode and a full core tensor, which is useful for modelling real data.

Algorithms and extensions. Popular algorithms for tensor decompositions include alternating least squares (ALS), higher-order SVD (HOSVD), and structured variants for nonnegative, sparse, or smooth decompositions. Unlike the matrix SVD, tensor decompositions usually need iterative algorithms and are sensitive to initialisation and rank selection. For a full review, see [2].
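
As a sketch of the simplest of these, the snippet below implements a truncated higher-order SVD (HOSVD) for a third-order tensor in plain numpy. The sizes and multilinear ranks are illustrative, and a production implementation would typically refine this initial estimate with further alternating iterations.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, M, mode):
    """Multiply tensor X by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)

def hosvd(X, ranks):
    """Truncated higher-order SVD: returns a core tensor G and one loading matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])            # leading left singular vectors of each unfolding
    G = X
    for mode, A in enumerate(factors):
        G = mode_product(G, A.T, mode)      # project onto each mode's subspace
    return G, factors

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 12, 8))
G, factors = hosvd(X, ranks=(3, 4, 2))

# Reconstruct from the core and factors and measure the approximation error.
X_hat = G
for mode, A in enumerate(factors):
    X_hat = mode_product(X_hat, A, mode)
print("core shape:", G.shape, " relative error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```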

 

Tensor Factor Models

The tensor factor model extends classical factor analysis to multiway data, building on the Tucker and CP decompositions. For observed data $\{\mathcal{X}_t\} \subset \mathbb{R}^{p_1 \times \ldots \times p_K}$, the model is
\[
\mathcal{X}_t = \mathcal{G}_t \times_1 \Lambda_1 \times_2 \Lambda_2 \cdots \times_K \Lambda_K + \mathcal{E}_t,
\]
where $\mathcal{G}_t$ is a low-dimensional core tensor, $\Lambda_k$ are mode-specific loading matrices, and $\mathcal{E}_t$ is idiosyncratic noise. This structure matches the Tucker decomposition, with CP recovered as a special case when the core tensor is super-diagonal.

This framework generalises the matrix factor model — where a matrix is expressed as low-rank signal plus noise — to higher-order arrays. Typical assumptions are: the signal has low multilinear rank (most variation is explained by a few factors per mode); each loading matrix is full rank and well-conditioned (ensuring identifiability and pervasiveness); and the noise tensor is mean zero, weakly dependent, and uncorrelated with the factors.
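
To illustrate how such a model might be estimated, the sketch below simulates data from a small tensor factor model and recovers each loading space by a simple mode-wise PCA of the unfolded second moments. This is an illustrative moment-based heuristic consistent with the assumptions above, not the specific estimator of any of the cited papers, and all sizes, ranks and noise levels are made up.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

rng = np.random.default_rng(0)
dims, ranks, T = (10, 12, 8), (2, 3, 2), 400

# Simulate X_t = G_t x_1 L1 x_2 L2 x_3 L3 + E_t with random core tensors.
loadings = [rng.normal(size=(p, r)) for p, r in zip(dims, ranks)]
X = np.zeros((T, *dims))
for t in range(T):
    G_t = rng.normal(size=ranks)
    signal = np.einsum('abc,ia,jb,kc->ijk', G_t, *loadings)
    X[t] = signal + 0.3 * rng.normal(size=dims)

# Mode-wise PCA: aggregate second moments of the mode-k unfoldings over time and
# take the leading eigenvectors as an estimate of each loading space.
for mode, (Lam, r) in enumerate(zip(loadings, ranks)):
    M = sum(unfold(X[t], mode) @ unfold(X[t], mode).T for t in range(T)) / T
    est = np.linalg.eigh(M)[1][:, -r:]          # top-r eigenvectors (ascending order)
    Q_true, _ = np.linalg.qr(Lam)
    cosines = np.linalg.svd(Q_true.T @ est, compute_uv=False)
    print(f"mode {mode + 1}: cosines of principal angles", np.round(cosines, 3))
```

When the signal dominates the noise, the printed cosines should be close to one, indicating that the estimated and true loading spaces agree.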

Tensor factor models are widely used for high-dimensional multiway data, such as neuroimaging, recommender systems, macroeconomic panels, and genomics. They capture shared patterns and interactions across multiple modes, offering more flexible and interpretable dimension reduction than matrix models. See [1,2] for reviews.

Their popularity comes from theoretical flexibility, practical effectiveness, and their close link to tensor decompositions, which allows use of established tools for statistical inference.

 

Applications and Outlook

Tensor methods are increasingly central to statistics, data science, and machine learning. In neuroscience, they reveal interpretable spatio-temporal patterns. In recommender systems, they capture user-item-context interactions. In macroeconomics, they help extract trends and shocks in high-dimensional panels. The link to latent factor models supports applications in clustering, anomaly detection, and other multivariate analyses.

Current research focuses on scalable algorithms, robust inference procedures, and methods that use domain constraints such as sparsity or smoothness. As multiway data become more common, tensor decompositions offer a principled and versatile toolkit for statistical modelling and analysis.

 

References

[1] Xuan Bi, Xiwei Tang, Yubai Yuan, Yanqing Zhang, and Annie Qu. Tensors in statistics. Annual Review of Statistics and Its Application, 8(1):345–368, 2021.

[2] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[3] Tatsuya Yokota. Very basics of tensors with graphical notations: unfolding, calculations, and decompositions. arXiv preprint arXiv:2411.16094, 2024.

[4] Long Yu, Yong He, Xinbing Kong, and Xinsheng Zhang. Projected estimation for large-dimensional matrix factor models. Journal of Econometrics, 229(1):201–217, 2022.

Student perspectives: Compass Away Day 2025

A post by Daniel Gardner, PhD student on the Compass programme.

With some of last year’s attendees having passed their vivas, this year’s Away Day brought remaining students to the lovely Aldwick Estate for activities and workshops. The former pig farm turned vineyard, conference and wedding venue, made a wonderful stage for the event.

Interviews and clustering

The day began (after a small adventure navigating our coach past some tractors and down country roads) with two post-graduation preparation activities. The first of these, run by Cecina Babich Morrow, focused on mock interview questions that we may face in the real world.

Initially, they used their experience working in recruitment prior to their PhD to draft relevant questions for the rest of us, and then gave us all tips and tricks about what they and their company would have looked for in a response. Students moved around the room trialling these mock interviews in a quickfire ‘speed dating’ format.

Cecina’s activity was intended as the ‘industry’ section, and for those of us looking for academic applications, we turned to Edward Milsom. He was able to draw from his personal experience in academic interviews, and gave a similar talk to Cecina about what to look out for. Others in the room also contributed their own experiences and suggestions.

Students benefited by taking away first-hand advice from previous interviewers and interviewees, in industry and academia respectively. The activity provided an excellent format, through which we came face-to-face with realistic questions, and had the advantage of immediate feedback from colleagues – something not always granted in a real interview.

After this, we attempted a quick ‘clustering’ activity, with the aim of finding out which students shared the closest areas of research, grouping them using keywords from project abstracts.

Once ‘clustered’, we repeated the speed dating format, not only to learn about research similar to our own (you can think of this as the intra-cluster variance!) but also about research as different from ours as possible (the inter-cluster variance).

This section had to be cut a little short, so the full potential remains to be seen. What was clear is that the diversity of research within the Compass CDT means that, for a given student, even fellow course-mates’ work may differ greatly from their own. We aimed to show this physically by using topic keyword clustering to seat like-minded students next to each other.

This activity provided students with opportunities to explore the value of collaboration, by learning whose research was most similar to their own, but also by having opportunities to hear about the research furthest from their own area of focus.

Talks and activities

Here we took a break and had a fantastic lunch provided by the venue whilst exploring the gardens. In the afternoon, we were presented with talks by Ollie Baker and Kieran Morris, about the highs and lows of polynomials, and the mathematical ‘rules’ of urinals respectively.

Daniel Gardner then concluded the talks with a ‘year in review’ quiz, containing data science(ish) related questions about both famous events and viral moments of the past 12 months.

This took us to the two activities of the day, the first being archery: a popular event from last year’s Away Day. The competition was even more intense this time, with the blue team clinching the group victory, and after a tense face off between Dan Milner, Kieran Morris, and Ben Griffiths, Ben was crowned the ‘top gun’ of the day.

Afterwards, we were ready for the main attraction of Aldwick Estate: the vineyard tour and wine tasting, kindly funded by our industrial partners. The experience began with a very in-depth talk on the kind of English wines the estate produces, and their exclusivity.

This was explained further as we toured the vast, sloping vineyards with our tour guide. Finally, we were able to literally get our noses stuck into their signature white, rosé and red (or the Bacchus, Mary’s Rose, and Flying Pig). The day then ended with an excellent tapas dinner.

All in all we had a wonderful time learning, laughing, and enjoying the property, not just due to the wine consumed during the final couple of hours. Many thanks to the admin team for booking the venue and activities, and we all look forward to a similar experience next year.

Student Perspectives: Careers in Data Science

A post by Sherman Khoo, PhD student on the Compass programme.

Introduction

On 6 June 2025, the Compass CDT, in collaboration with the Jean Golding Institute, hosted ‘Careers in Data Science’ – an engaging Data Science@Work seminar as part of Bristol Data Week 2025. The event, held in the University of Bristol’s Merchant Venturers Building, offered participants valuable insights into the realities of a career in this important field.

Moderated by Professor Daniel Lawson, Compass Co-Director, and Helen Mawdsley, Industrial Partnerships Manager for Compass and the School of Mathematics, the event featured a diverse panel of data professionals, introduced below.

 

Panel Discussion

Elahe Naserian, a Senior Data Scientist, holds a Master’s and a PhD in Computer Science, awarded by the University of Tehran and the University of the West of Scotland respectively.

She shared her interdisciplinary career path, through Postdoctoral Researcher and Research Fellow positions at the University of Exeter – where she applied data science and AI solutions to psychology and identity related challenges – culminating in her role at Trilateral Research, an organisation developing ethical AI solutions and addressing complex societal issues.

Elahe also highlighted key differences between novelty-focused, often unstructured, research environments in academia and the pragmatic, outcome-driven demands of industry.

Tom O’Connor, a Lead Data Scientist at the Ministry of Justice, graduated from Cardiff University with a BSc in Physics and an MSc in Data Science & Analytics – successfully moving between the two disciplines.

He discussed his experience of navigating roles in both the public and private sectors, and of transitioning from an individual contributor to a leadership role. Tom emphasised the importance of focusing on outcomes, rather than overly complex processes, and the challenge of translating technical methods into practical business applications.

Sarah Taylor-Knight, a Data Science Manager and University of Bristol alumna, graduated with a BEng in Engineering Mathematics. She provided valuable insights into managing a data science team at one of the largest general insurers in the UK, highlighting the enduring importance of fundamental skills and noting that maintaining and strengthening these core competencies is crucial at every stage of a data science career.

Her successful journey from internship and graduate roles, to senior data science positions at Allianz Personal, demonstrated the value of these opportunities.

Danny Williams is a Compass CDT alumnus, having been awarded his PhD last year, and now works as a Machine Learning Engineer at the AI-focused start-up Weaviate, which has developed an open-source vector database. He recounted his journey from theoretical academic research to a practical, applied industry role.

Danny encouraged the students in attendance to remain flexible and open to diverse opportunities in industry, rather than being constrained by academic expectations. Discussing how his current role involves not only coding projects but also disseminating the latest AI developments and engaging with social media, he highlighted the diverse, and sometimes unexpected, opportunities that data science roles can present.

Panel Q&A

Following the panel discussion, the audience participated in an interactive Q&A session, which surfaced several recurring themes. A notable discussion centred on the contrasting nature of academia and industry, with a participant memorably likening the former to the passion-driven life of a musician, highlighting the dedication and artistry often found in academic research.

The panellists also underscored the importance of soft skills in industry roles, noting that collaboration, stakeholder management and communication become increasingly critical in the larger, diverse teams that can be typical of industry settings.

Another key theme addressed the relevance of a PhD for industry careers (especially important for potential PhD aspirants)! There was general agreement that, while specific PhD research may sometimes not directly transfer to industry applications, the skills developed – such as problem-solving, independent research and critical thinking – are often invaluable.

Overall, the event provided a comprehensive and insightful overview of data science careers, highlighting the diverse pathways and key considerations for aspiring data professionals navigating their journeys across both academia and industry.

Student Perspectives: Unconstrained Reparameterisation of Covariance Matrix Modelling via Modified Cholesky Decomposition

A post by Xinyue Guan, PhD student on the Compass programme.

Introduction

Effective management of renewable energy sources depends on accurate forecasts, particularly in a probabilistic format for decision-making under uncertainty. When dealing with multiple forecasts across different locations or times, it’s crucial to assess whether errors will compound or offset each other. Let \boldsymbol{Y} = (y_1, y_2, \ldots, y_p) be a sequence of forecast errors at lead times l_1, l_2, \ldots, l_p. To understand these dependencies, we model the covariance matrix of the forecast errors.

One major difficulty in modelling the covariance matrix is guaranteeing the positive definiteness of the estimator, which is the matrix equivalent of the positivity constraint in variance modelling. There are several unconstrained matrix reparameterisations available, such as the Cholesky decomposition, the spherical decomposition, and the matrix logarithm [2].

In this post, we will focus on the modified Cholesky decomposition and demonstrate how this decomposition allows us to reframe the complex problem of covariance modelling into a more intuitive series of linear regressions, revealing the underlying structure of forecast errors.

Modified Cholesky Decomposition

Instead of modelling the covariance matrix \Sigma directly, we model its inverse, the precision matrix \Sigma^{-1}. This allows us to reframe the complex problem of ensuring positive definiteness into a simpler, unconstrained estimation task.

The decomposition is expressed as:
\Sigma^{-1} = T^T D^{-2} T,

where T is a unit lower triangular matrix and D is a diagonal matrix.

A key advantage of the modified Cholesky decomposition is its direct and intuitive link to linear regression [3]. Consider the component of \boldsymbol{Y}, y_t, regressed on its predecessors y_1, \ldots, y_{t-1}:

y_t = \sum_{k=1}^{t-1} \phi_{tk} y_k + \epsilon_t, \quad t=1, \ldots, p.

The components of T and D have a direct statistical meaning in the context of this sequence of regressions:

1. The off-diagonal entries of the matrix T are the negative regression coefficients:
T_{tk} = -\phi_{tk}, \quad k = 1, \ldots, t-1.

2. The diagonal entries are the variances of the residuals, \epsilon_t:
D^2=\text{diag}(\text{var}(\epsilon_1),\ldots,\text{var}(\epsilon_p))

By modelling the unconstrained entries of T and the logarithm of the diagonal entries of D^2 separately, the positive-definiteness of the final covariance matrix is automatically guaranteed. The logarithm transformation on entries of D simply ensures the positivity of these values.
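
The regression interpretation translates directly into a short algorithm. The Python sketch below (an illustrative implementation written for this post, not the SCM R-package's mcd() function used later) computes T and D^2 from a covariance matrix by running the sequence of regressions above, and then checks the identity \Sigma^{-1} = T^T D^{-2} T numerically.

```python
import numpy as np

def modified_cholesky(Sigma):
    """Return (T, d2) such that Sigma^{-1} = T' diag(1/d2) T, via sequential regressions."""
    p = Sigma.shape[0]
    T = np.eye(p)
    d2 = np.empty(p)
    d2[0] = Sigma[0, 0]                     # unconditional variance of y_1
    for t in range(1, p):
        S11 = Sigma[:t, :t]                 # covariance of the predecessors y_1, ..., y_{t-1}
        s12 = Sigma[:t, t]                  # their covariance with y_t
        phi = np.linalg.solve(S11, s12)     # regression coefficients of y_t on its predecessors
        T[t, :t] = -phi                     # off-diagonal entries are minus the coefficients
        d2[t] = Sigma[t, t] - s12 @ phi     # residual (conditional) variance of y_t
    return T, d2

# Quick check on a random positive-definite covariance matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + np.eye(5)
T, d2 = modified_cholesky(Sigma)
precision = T.T @ np.diag(1.0 / d2) @ T
print(np.allclose(precision, np.linalg.inv(Sigma)))   # True
```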

Real-life Dataset and Sparsity of Modified Cholesky Factor

We analysed a wind power dataset from Scotland, which contains 351 daily observations. For each day, the data consists of 24 hourly forecast errors derived from a quantile regression model [1].

The figure below shows a heatmap of the sample covariance matrix for this dataset, visualizing the dependence structure between the hourly errors.

The left plot in the figure above shows the matrix resulting from the modified Cholesky decomposition, computed by the mcd() function in the SCM R-package. In this matrix, the diagonal entries of \log(D^2) are combined with the off-diagonal entries of the matrix T. To better visualize the structure of T, we can set the diagonal elements to zero, as shown in the right plot; we then observe that only the entries on the first sub-diagonal are significantly different from zero.

The plot of diagonal entries of \log(D^2) in the figure above shows that the first entry is distinctly different from the others. This observation is directly explained by the linear regression interpretation of the modified Cholesky decomposition. The first entry, \log(d_1^2), represents the logarithm of the unconditional variance of the first variable y_1. In contrast, all subsequent entries, \log(d_h^2) for h>1, represent the logarithm of the conditional variances of the regression errors \epsilon_h, after accounting for all preceding variables y_1,\dots,y_{h-1}. Therefore, it is expected that the first term, which carries the full variance of the initial variable, would differ from the subsequent residual variances.

The figure above plots the last row of the matrix T, clearly showing that only the values closest to the main diagonal are significantly non-zero, with the remaining entries oscillating around zero. This observation aligns with the linear regression interpretation. In a high-order autoregression, it is expected that the coefficients for the most recent variables will be the most significant, while the influence of distant past variables diminishes.

This emergent sparsity is a highly desirable feature which allows our model to be parsimonious, meaning it has far fewer parameters to estimate. This significantly reduces computational cost and, more importantly, lowers the risk of over-fitting the model, leading to better predictive performance.

Conclusion

The modified Cholesky decomposition is a powerful tool for covariance modelling for two key reasons. It not only solves the technical challenge of ensuring positive definiteness but, more importantly, its inherent link to linear regression provides a clear, interpretable insight into the data’s underlying structure.

Our analysis used this property to reveal a sparse, autoregressive dependency within wind power forecast errors. By uncovering this simple structure, the decomposition allows for the development of more parsimonious and robust statistical models that are easier to estimate and less prone to over-fitting.

References

[1] Gilbert, C. (2021). Topics in High-dimensional Energy Forecasting. PhD dissertation, University of Strathclyde, Glasgow.
[2] Pinheiro, J. and Bates, D. (1996). Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6:289–296.
[3] Pourahmadi, M. (2011). Covariance Estimation: The GLM and Regularization Perspectives. Statistical Science, 26(3):369–387.

Compass students at ICML 2025

The Compass CDT will be well-represented at this year’s International Conference on Machine Learning (ICML), with several of its students having papers accepted by the prestigious event.

One of these has been selected for oral presentation and one as a ‘spotlight position paper’, while another was co-authored by more than one Compass student – indicative of the strength and depth of the Centre’s machine learning and AI research.

This strong presence at ICML, which takes place in Vancouver next week, forms part of a packed conference summer, with Compass students presenting at other important UK and international events, including UAI 2025 in Brazil, and the ERSA Congress in Greece. (See full list).

ICML 2025 will take place in Vancouver, Canada from 13 to 19 July 2025.

“It’s fantastic to see our students making valuable contributions to one of the leading conferences in this field, and especially rewarding to have an oral presentation and spotlight paper as part of that,” said Professor Nick Whiteley, Director of the Compass CDT.

“Alongside NeurIPS and ICLR, both of which Compass students and alumni have contributed to in recent years, ICML provides an opportunity to showcase our work to machine learning researchers and practitioners from across academia and industry.

“We are proud of the cutting-edge projects being undertaken by students and colleagues in the Centre, and across the wider Institute for Statistical Science here in the School of Mathematics, which span data science, statistics, machine learning and AI.

“We’ve had some positive outcomes already this year, with our first Compass graduates finding employment with a range of organisations in the private and public sectors, as well as in academia, and current students securing some exciting internships.

“Having papers accepted at high impact conferences, including ICML, and being involved in well-regarded events at home and overseas, is a fantastic way to round off the academic year.”

With Compass student Sherman Khoo winning an Early Career Researcher Poster Award in June at Bayes Comp 2025 – a meeting of the Bayesian Computation Section of the International Society for Bayesian Analysis – the conference season is already off to a successful start.

Compass CDT student Sherman Khoo (furthest left) being announced as a winner of an Early Career Researcher Poster Award at Bayes Comp 2025, held in Singapore in June.

The fact that some papers accepted at conferences this year, including one at ICML, were co-authored by more than one Compass student has been particularly rewarding, as one of the Centre’s aims is to foster collaboration, and to build a community of researchers.

Emerald Dilworth will present a poster at Uncertainty in Artificial Intelligence (UAI), summarising a paper for which she was lead author, and Compass graduate Ed Davis was one of the co-authors, as was Compass Co-director Professor Daniel Lawson. “I reached out to Ed with my initial ideas, and we grew them together,” says Emerald.

“Some of the work belongs to each of us as parts we contributed individually, and other parts we worked on together, which is something we enjoyed. It’s a valuable, positive experience to collaborate on research, when a lot of what we do as PhD students can be solo work.”

This collaborative culture, alongside awards, presentations, spotlight slots, and a strong presence at ICML and other events, makes for a positive 2025 conference season, and suggests this trend will continue in future years, as students and alumni continue to gain experience.

“Travelling to Vancouver in July, and representing the University of Bristol, and the UK research community, at a forum as important as ICML, is a fantastic opportunity,” says Professor Whiteley. “The insights students gain will help to shape their future research.

“The fact that students from both the 2021 and 2022 Compass cohorts will contribute to that conference, while all cohorts will be represented at a diverse range of leading events this summer, shows the breadth of Compass expertise. The research our students showcase, the knowledge they gain, and the networks they build, will have benefits for a long time to come.”

Compass student Cecina Babich Morrow (third from right) was a panellist at a GW4 AI and Data Science event focused on climate and health, held in Bristol in June.

 

Compass at ICML 2025

July 2025 – Vancouver, Canada

Student: Josh Givens (lead author)
Co-authors: Song Liu; Henry W J Reeve
Contribution: Oral presentation
Overview: ‘Score Matching with Missing Data’. Score matching is a technique used to learn intractable data distributions and serves as a foundational component of state-of-the-art image generation methods. In this paper, we adapt score matching (and its various extensions) to enable the learning of the original data distribution using only corrupted samples, where parts of each observation are missing or contain NaN values.

Student: Sam Bowyer (lead author)
Co-authors: Laurence Aitchison; Desi R. Ivanova
Contribution: Spotlight Position Paper
Overview: ‘Position: Don’t Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints’. We argue that AI researchers need to improve the techniques they use to calculate error bars (i.e. confidence/credible intervals) over model-performance on benchmark test-sets (‘evals’). We examine the failure of common approaches that rely on the Central Limit Theorem (CLT) and suggest Bayesian and frequentist alternatives in a variety of eval settings, such as questions organised in subtasks/clusters, and comparisons between two models.

Student: Ed Milsom (lead author) and Ben Anson (co-author)
Co-author: Laurence Aitchison
Contribution: Poster
Overview: ‘Function-Space Learning Rates’. This will help AI researchers better understand why current AI systems work well and how to improve them in the future. We utilise our method to tune very large AI systems by tuning a small, cheap model and then copying the settings to the large model (which would be very computationally expensive to tune by itself). This is not usually possible, because the optimal settings change between small and large AI systems, but with our method, we can predict and therefore correct these changes.

Student: Tennessee Hickling (lead author)
Co-authors: Dennis Prangle
Contribution: Poster
Overview: ‘Flexible Tails for Normalizing Flows’. Modern machine learning methods model uncertainty by transforming simple random inputs into data-like outputs, but they often underestimate extreme events. While some recent approaches inject extreme inputs directly, this can cause models to behave poorly. We propose a different solution: adding a final step that enables models to generate extreme outcomes from non-extreme inputs, improving performance on data with heavy tails.

‘Paired question model comparison setting’ – from Sam Bowyer’s ICML Spotlight Position Paper.

 

Compass at UAI 2025

July 2025 – Rio de Janeiro, Brazil

Student: Emerald Dilworth (lead author) and Ed Davis (co-author)
Co-authors: Daniel Lawson (Compass Co-director)
Contribution: Poster
Overview: ‘Valid Bootstraps for Network Embeddings with Applications to Network Visualisation’. Quantifying uncertainty in networks is an important step in modelling relationships and interactions between entities. Under certain assumptions on the network, we utilise an exchangeable network test that can empirically validate bootstrap samples generated by any method, by considering whether embeddings of the observed and bootstrapped networks are statistically indistinguishable. Existing methods fail this test, so we propose a principled, distribution-free network bootstrap using k-nearest neighbour smoothing, which can pass this exchangeable network test in many synthetic and real-data scenarios. We demonstrate the utility of this work in combination with the popular data visualisation method t-SNE, where uncertainty estimates from bootstrapping are used to explain whether visible structures represent real, statistically sound structures.

Student: Emma Ceccherini (lead author)
Co-authors: Ian Gallagher; Andrew Jones; Daniel Lawson (Compass Co-director)
Contribution: Poster
Overview: ‘Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees’. Most AI/ML methods have exchangeability or stability properties: loosely speaking, for the same input they produce the same output up to noise. However, this is not always true for dynamic graph embedding. We propose a dynamic attributed embedding that collectively embeds attributes and network information in the same low-dimensional space, and we prove uniform convergence, which establishes stability. As a result, our method performs better than competitors on downstream tasks.

A simulated results figure from Emerald Dilworth’s paper, co-authored with Ed Davis.

 

Compass at GOFCP 2025

(Goodness-of-fit, Change-point and Related Problems)

August 2025 – Charles University, Prague, Czech Republic

Student: Dylan Dijk
Contribution: Poster
Overview: This poster presents work on generalising the assumptions behind a widely used econometric model (the model is partly described in Dylan’s blog from July 2024). In real-world applications – especially in finance – large datasets often contain extreme values, particularly during periods of crisis. The goal is to adapt the model’s theoretical foundations to ensure reliable performance even when such heavy-tailed data are present.

Student: Yuqi Zhang
Contribution: Poster
Overview: Presents a method developed jointly with Dr Haeran Cho (Compass Projects Coordinator). We introduce a multiscale, bandwidth-free procedure for detecting multiple change points in large approximate factor models. The method combines the Narrowest-over-Threshold (NOT) principle with Seeded Binary Segmentation (SBS) to efficiently identify structural changes in high-dimensional time series without requiring prior bandwidth selection.

 

Compass contributions to other events:

Random Networks Workshop
Date/location: May 2025 – University of Sheffield, UK
Student: Ollie Baker (lead author)
Co-authors: Carl P. Dettmann
Contribution: Contributed talk
Overview: ‘Entropy of Random Geometric Graphs in High and Low Dimensions’. The paper uses information theory to discuss whether or not we can detect the geometric embedding of a spatial network when the dimension of the embedding gets very large. We then use this to derive a bound on the amount of information saved by embedding a network in a lower dimensional space.

Bayes Comp 2025
Date/location: June 2025 – National University of Singapore
Student: Sherman Khoo (lead author)
Contribution: Poster
Overview: ‘Approximate Maximum Likelihood Estimation with Local Score Matching’. We study the problem of likelihood maximization when the likelihood function is intractable, but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate.

UK AI Conference 2025
Date/location: June 2025 – London, UK
Student: Rachel Wood (lead author)
Contribution: Poster
Overview: We first consider how to define an anomaly in a network, then explore using existing reliable and efficient embedding methods for this purpose. Most methods only perform well for the correct (unknown) latent dimension $d$, and performance may deteriorate for other choices of $d$, so we propose an approach which retains accuracy when the dimension choice is mis-specified, offering a more robust solution for anomaly detection in dynamic and uncertain network environments.

GW4 AI and Data Science: AI, Climate and Health
Date/location: June 2025 – University of Bristol, UK
Student: Cecina Babich Morrow
Contribution: Lightning talk, panel member and poster
Overview: ‘From risk to action: Climate decision-making under deep uncertainty’. How can we make robust climate adaptation decisions despite our uncertainty about both the level of climate risk and the characteristics of our adaptations? I showed how uncertainty and sensitivity analysis might be helpful in addressing these issues.

46th Annual Conference of the International Society for Clinical Biostatistics (ISCB)
Date/location: August 2025 – University of Basel, Switzerland
Student: Vera Hudak (lead author)
Co-authors: Hayley Jones; Nicky J. Welton; Efthymia Derezea
Contribution: Poster
Overview: ‘Evaluating Diagnostic Tests Against Composite Reference Standards: Quantifying and Adjusting for Bias’. Our research focuses on diagnostic test accuracy studies where gold standard testing is only performed on a subset of study participants, specifically those with certain results from some initial imperfect reference test. We have quantified the bias that can arise in these scenarios and proposed a method to adjust for it, which was evaluated using a simulation study.

64th European Regional Science Association (ERSA) Congress: ‘Regional Science in Turbulent Times. In search of a resilient, sustainable and inclusive future’
Date/location: August 2025 – Panteion University, Athens, Greece
Student: Emerald Dilworth (presenter)
Co-authors: Emmanouil Tranos; Daniel Lawson (Compass Co-director)
Contribution: Special session presentation
Overview: ‘The Twin Transition in the UK through LLM labelled Web Crawl Data’. Many existing tools for tracking green and digital industries are expensive, limited in scale, or updated too infrequently, making it difficult to design effective policies for technological progress. To address this, we use freely available web data to identify UK firms involved in green and digital technologies by fine-tuning a language model to classify company websites. This allows us to track how these industries have evolved across regions and over time from 2014 to 2024, at a yearly time scale.

Global Optimization Workshop 2025
Date/location: September 2025 – KTH Royal Institute of Technology, Stockholm, Sweden
Student: Ettore Fincato (co-author)
Co-authors: Christophe Andrieu; Nicolas Chopin; Mathieu Gerber
Contribution: Paper presentation
Overview: ‘Gradient-free optimization via integration’. We develop and analyse a novel approach to optimize functions not assumed to be convex, differentiable or even continuous. The idea is to fit recursively the objective function to a parametric family of distributions, using a Bayesian update followed by a reprojection back onto the chosen family. Practically, the approach enables the optimization of a broad class of objectives via Monte Carlo sampling; theoretically, it establishes a link with gradient-based methods with smoothing.

Exact entropy curves from Ollie Baker’s paper, presented at the Random Networks Workshop.

 

Student Perspectives: Verification Bias in Diagnostic Test Accuracy Studies with Conditional Reference Standards

A post by Vera Hudak, PhD student on the Compass programme.

Introduction

To evaluate the accuracy of a new diagnostic test (the `index test’), the ideal approach is to compare it against an error-free reference standard, known as the gold standard.
However, gold standards may be unavailable, invasive, or costly. In such situations, a possible approach is to condition gold standard testing on the outcome of some initial imperfect reference standard test(s).

We focus on a conditional testing design which we refer to as `check the negatives’. Here, all participants receive the index test (Test A) and an imperfect reference standard (Test B), then those testing negative on Test B are followed up with the gold standard (GS). The diagnostic accuracy of Test A is assessed against observed disease status, derived from the test sequence combining Test B and the GS. Figure 1 illustrates this design.

Figure 1: Test sequence for `check the negatives’.

Now, if Test B was 100% specific, the `check the negatives’ design would lead to unbiased estimates of the sensitivity and specificity of Test A, given by:

\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \quad \text{and} \quad \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}.

However, as Test B is an imperfect test, this is unlikely to be the case, and bias can be anticipated. We quantified this bias and proposed a Bayesian adjustment method.

Probabilities in `Check the Negatives’ Studies

Let A_{se} and A_{sp} denote the true sensitivity and specificity of Test A, B_{sp} the true specificity of Test B, and \pi the prevalence of the condition in the study population. We also introduce c_0 as the covariance of errors between Test A and Test B in the disease-free population.

In a `check the negatives’ design, we do not observe the complete set of outcomes from the index test, the imperfect reference standard, and the gold standard. Instead, we only observe a reduced set of results: each participant’s outcome on Test A and their observed disease status, as determined by the conditional diagnostic test sequence combining Test B and the GS. Table 1 shows the probabilities associated with these observed outcomes under conditional dependence between Test A and Test B [1]. The corresponding probabilities under conditional independence can be obtained by setting c_0 = 0.

Table 1: Probabilities observed in a `check the negatives’ study.

Quantifying Bias

We used the probabilities from Table 1 to find closed-form expressions for the naive estimates of the sensitivity (\widehat{A_{se}}) and specificity (\widehat{A_{sp}}) of Test A, and hence, the bias. The bias in the naive estimate of specificity is as follows:

\text{Bias}(\widehat{A_{sp}}) = \frac{c_0}{B_{sp}}.

Under conditional independence (c_0 = 0), the naive specificity estimate is unbiased. If, however, Tests A and B are conditionally dependent among the disease-free population, then there is a bias which depends on B_{sp} and c_0.

Similarly, the bias in the naive estimate of the sensitivity of Test A can be expressed as:

\text{Bias}(\widehat{A_{se}}) = \frac{(1-\pi)((1-A_{sp}) (1-B_{sp}) + c_0 - A_{se}(1-B_{sp}))}{(1-\pi)(1-B_{sp}) + \pi}.

Assuming independence in the disease-free population (c_0 = 0), Figure 2 shows \text{Bias}(\widehat{A_{se}}) as a function of B_{sp}, for selected values of \pi, A_{se} and A_{sp}.

Figure 2: Bias in the naive estimate of the sensitivity of Test A against the specificity of Test B for different values of Test A sensitivity, specificity and disease prevalence.

As expected from the study design, bias tends to 0 as B_{sp} tends to 1. Bias increases as the accuracy of Test A improves, i.e. when A_{se} or A_{sp} is larger, or with lower prevalence. We can see that when \pi = 0.9, the bias is almost negligible. However, bias can be substantial in some scenarios, specifically under low prevalence, even when Test B has high specificity (e.g. over 95%).
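
To illustrate how this bias arises in practice, the short simulation below (written for this post; all parameter values are illustrative and conditional independence, c_0 = 0, is assumed) generates data under the `check the negatives' design, computes the naive estimates of Test A's accuracy against observed disease status, and compares the empirical bias in sensitivity with the closed-form expression above. A sensitivity for Test B is included only to simulate its results; it does not enter the bias formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000                                            # large sample so the empirical bias is stable
pi, A_se, A_sp, B_se, B_sp = 0.1, 0.9, 0.9, 0.8, 0.95    # illustrative parameter values

# True disease status and the two tests, conditionally independent given status.
D = rng.random(n) < pi
A = np.where(D, rng.random(n) < A_se, rng.random(n) >= A_sp)   # Test A result
B = np.where(D, rng.random(n) < B_se, rng.random(n) >= B_sp)   # Test B result

# 'Check the negatives': B-positives are classed as diseased; B-negatives receive the gold standard.
observed_D = np.where(B, True, D)

# Naive accuracy of Test A against observed disease status.
naive_se = np.mean(A[observed_D])
naive_sp = np.mean(~A[~observed_D])
print("naive sensitivity:", round(float(naive_se), 4), "(true value", A_se, ")")
print("naive specificity:", round(float(naive_sp), 4), "(true value", A_sp, ")")

# Closed-form bias in the naive sensitivity under conditional independence (c0 = 0).
bias = (1 - pi) * ((1 - A_sp) * (1 - B_sp) - A_se * (1 - B_sp)) / ((1 - pi) * (1 - B_sp) + pi)
print("empirical bias:  ", round(float(naive_se - A_se), 4))
print("closed-form bias:", round(bias, 4))
```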

Bias Adjustment

We proposed that a Bayesian model with an informative prior on Test B specificity can be used to adjust for the bias in the naive estimate of the sensitivity of Test A, under conditional independence (c_0 = 0).

We let \boldsymbol{x} = (TP, FN, FP, TN) be the data reported by a `check the negatives’ study evaluating Test A. Then \boldsymbol{x} \sim \text{Multinomial}(\boldsymbol{p}, n), where n is the number of participants in the study, and the probabilities \boldsymbol{p} = (p_1, p_2, p_3, p_4) are as specified in Table 1, with c_0 = 0.

Suppose we have prior information about the specificity of Test B represented with a Beta prior distribution. Then a bias-adjusted estimate of A_{se} could be obtained by fitting this multinomial distribution to the data with vague \text{Beta}(1,1) priors for the remaining three parameters, A_{se}, A_{sp}, and \pi.

Simulation Study

We assessed this adjustment method through a simulation study under two scenarios: where the informative prior is correctly centred on the true specificity, and where it underestimates the truth by 5%, to examine the impact of moderate prior misspecification. Prior precision (sd) is also varied to assess the impact of increasing uncertainty. In Figure 3, we present some results from this simulation study for the correctly centred prior case. These results show that a correctly centred prior consistently eliminates bias under high precision and reduces it under lower precision.

Figure 3: Crude and adjusted bias in the sensitivity of Test A using correctly centred priors, for some parameter combinations.

Although not shown here, we found that overly pessimistic priors can over-correct, increasing absolute bias, especially when initial bias is small. This risk is mitigated when the informative prior is less precise. We are currently writing up this simulation study as a paper to be submitted for publication soon.

Future Work

Further work could be done to explore adjustment under conditional dependence between tests, or situations in which the third test in the sequence, here the GS, is imperfect.

References

[1] Pamela M. Vacek. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics, 41:959, 1985.

Student Perspectives: The Distribution of High-Dimensional Random Geometric Graphs



A post by Ollie Baker, PhD student on the Compass programme.

Introduction

A random geometric graph is a model of a network with a geometric embedding, or a latent space embedding. They were originally introduced by Gilbert in 1961, who called them ‘random plane networks’ [3]. If we are studying a high dimensional dataset, then we may be interested in modelling it using some form of network structure which contains information about similarity of data points in the latent space. This can be represented as points in a high-dimensional space connected based on their closeness according to some distance metric. What can we say about the distribution of these random graph models when the dimension of the space gets large? The answer to this question has applications in clustering, and graph compression once we consider the information theory of these network ensembles.

Figure 1: An example of a 500 node Random Geometric Graph on the unit square with connection range 0.1.

Let $(\Omega, \rho)$ be a metric space, where $\Omega \subset \mathbb{R}^d$ is compact. In this post, we will only consider $\Omega = [0,1]^d$ with $\rho = \|\cdot\|$ the Euclidean metric, and $\Omega = [0,1]^d$ with $\rho = \rho_t$, where $\rho_t(x,y) = \left(\sum_{i=1}^d\min(|x_i-y_i|, 1-|x_i-y_i|)^2 \right)^{\frac{1}{2}}$. This second metric space is known as the unit $d$-torus, and is denoted $\mathbb{T}^d$. To form a Random Geometric Graph (RGG) $G$, we distribute $n$ points at random according to a probability density $\nu$ in $\Omega$ to form the vertex set, and connect nodes $i$, $j$ if their mutual distance is at most $r_0$. That is, if $X_1,…,X_n$ are our points in $\Omega$, $i\sim j$ in $G$ if and only if $\rho(X_i,X_j) \leq r_0$. Figure 1 shows an example realisation of such a graph. We are interested in densities of the form:

\begin{equation}
\label{general_distribution}
\nu(\{x_i\}_{i=1}^d) = \prod_{i=1}^d \pi(x_i)
\end{equation}
That is, the coordinates of a point $x$ are i.i.d.
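
For readers who want to experiment, here is a minimal numpy sketch (illustrative only, and not the code used to produce Figure 1) that samples an RGG with uniform nodes in $[0,1]^d$ under either the Euclidean or the toroidal metric defined above.

```python
import numpy as np

def rgg_adjacency(X, r0, torus=False):
    """Adjacency matrix of a random geometric graph on the points X (an n x d array)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])     # pairwise coordinate differences
    if torus:
        diff = np.minimum(diff, 1.0 - diff)          # wrap-around distance on the torus
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # Euclidean / toroidal metric
    A = (dist <= r0).astype(int)
    np.fill_diagonal(A, 0)                           # no self-loops
    return A

rng = np.random.default_rng(0)
n, d = 500, 2
X = rng.random((n, d))                               # uniform nodes in [0,1]^d
print("edges (cube): ", rgg_adjacency(X, r0=0.1).sum() // 2)
print("edges (torus):", rgg_adjacency(X, r0=0.1, torus=True).sum() // 2)
```
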
We are interested in computing the probability of a graph $G$ with adjacency matrix $A=\{a_{ij}\}_{i,j=1}^n$:
\begin{equation}
\mathbb{P}(G) = \int_{[0,D]^{\binom{n}{2}}} f_{\vec{R}}(\vec{r})\prod_{i<j} \left(a_{ij}\mathbb{I}(r_{ij}<r_0)+(1-a_{ij})\mathbb{I}(r_{ij}>r_0)\right)d\vec{r}
\end{equation}
where $D$ is the diameter of (largest possible distance between two points in) $\Omega$, $f_{\vec{R}}$ is the joint density of pair distances in $\Omega$ (which is well defined if $d>n+2$), and $d\vec{r}=\prod_{i<j}dr_{ij}$. Clearly, this is intractable for most choices of $\Omega$ and $n$. However, we can simplify things if we take the limit $d\rightarrow\infty$. An \textit{ensemble} $\mathcal{G}$ of RGGs is the set of all possible RGGs that can be constructed with a fixed $r_0, n, \Omega, \nu$ and $\rho$. A sequence of ensembles $\{\mathcal{G}_d\}$ (with all parameters except $n$ dependent on $d$) equipped with probability measures $\mathbb{P}_d$ \textit{converges in distribution} to another ensemble $\mathcal{G}$ as $d\rightarrow\infty$ if
\begin{equation}
\mathbb{P}_d(g) \rightarrow \mathbb{P}(g)
\end{equation}
for all graphs $g$ with $n$ nodes.

Central Limit Theorem for High-Dimensional Distance

We can use a central limit theorem (CLT) to prove that the vector of pair distances in $\Omega$ converges in distribution to a multivariate Gaussian as $d\rightarrow\infty$, which is a generalisation of what is done in [2].

Theorem 1:

Let $(\Omega, \rho)$ be a metric space, $\nu$ be a node density as described above, and $X_1,…,X_n$ be random vectors distributed according to $\nu$. Define $R_{ij}^{(k)} = \rho(X_i^{(k)}, X_j^{(k)})^2$, the squared distance between $X_i$ and $X_j$ in coordinate $k$, and $\mu := \mathbb{E}[R_{12}^{(1)}]$. Let
\begin{equation}
q_{ij} := \frac{1}{\sqrt{d}}\sum_{k=1}^d (R_{ij}^{(k)}-\mu)
\end{equation}
Then, as $d\rightarrow\infty$ the vector $\vec{q} = \{q_{ij}\}_{1\leq i<j\leq n} \in \mathbb{R}^{\binom{n}{2}}$ satisfies
\begin{equation}
\vec{q} \rightarrow Z \sim N(0_{\binom{n}{2}},\Sigma)
\end{equation}
where $\Sigma$ is the covariance matrix indexed by multi-indexes $(i,j)$ given by $\Sigma_{(i,j),(i,j)} = \alpha := \mathbb{E}[(\rho(X_i^{(1)},X_j^{(1)})^2-\mu)^2]$, $\Sigma_{(i,j),(j,k)} = \beta := \mathbb{E}[(\rho(X_i^{(1)},X_j^{(1)})^2-\mu)(\rho(X_j^{(1)},X_k^{(1)})^2-\mu)]$, and $\Sigma_{(i,j),(k,l)} = 0$ whenever the pairs $(i,j)$ and $(k,l)$ share no node.

Essentially, this means that when $d\rightarrow\infty$, we can replace the intractable joint density $f_{\vec{R}}$ with a multivariate Gaussian density, which will make our calculations much easier.
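
Theorem 1 is easy to check empirically. The sketch below (illustrative choices of $d$ and the number of repetitions) simulates uniform points in $[0,1]^d$, computes $q_{ij}$ for a few pairs, and estimates their variances and covariances; these should be close to $\alpha$, $\beta$ and $0$ respectively for pairs that are identical, share one node, or are disjoint.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 500, 10_000

# Four uniform points in [0,1]^d per repetition; coordinates are i.i.d.
X = rng.random((reps, 4, d))
mu = 1.0 / 6.0                                   # E[(U - V)^2] for independent U, V ~ Uniform[0,1]

def q(i, j):
    """The normalised quantity q_ij from Theorem 1, one value per repetition."""
    Rsq = ((X[:, i, :] - X[:, j, :]) ** 2).sum(axis=-1)
    return Rsq / np.sqrt(d) - mu * np.sqrt(d)

q12, q13, q34 = q(0, 1), q(0, 2), q(2, 3)
print("var(q_12):       ", round(float(np.var(q12)), 4))             # estimates alpha
print("cov(q_12, q_13): ", round(float(np.cov(q12, q13)[0, 1]), 4))  # pairs share node 1: estimates beta
print("cov(q_12, q_34): ", round(float(np.cov(q12, q34)[0, 1]), 4))  # disjoint pairs: approximately 0
```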

Erdős-Rényi Random Graphs

The Erdős-Rényi (ER) random graph, also denoted $G(n,p)$, is arguably the simplest model of a random graph. We take $n$ nodes, and connect each pair with fixed probability $p$. The probability of an ER graph $G$ is given by the binomial probability
\begin{equation}
\mathbb{P}(G) = p^{k}(1-p)^{\binom{n}{2}-k}
\end{equation}
where $k = \sum_{i<j} a_{ij}$ is the number of edges in $G$. Clearly, this model is a non-spatial network model, which is not good for modelling networks with a latent space structure! Therefore, we would like to guarantee that our model does not converge to $G(n,p)$ as $d\rightarrow\infty$, otherwise we would lose information about the spatial or latent space correlation of our data.

When Does the RGG Converge to $G(n,p)$?

For the main results, we will provide a condition on the distribution of nodes in $\Omega$ for when we see convergence in distribution to $G(n,p)$. For an RGG with connection range $r_0$ in the metric space $(\Omega, \rho)$ and $\mu = \mathbb{E}[\rho(X_i^{(1)},X_j^{(1)})^2]$, define the \textit{normalised connection range} $t = \frac{r_0^2}{\sqrt{d}} - \mu \sqrt{d}$. Recall that the probability of a graph is given by
\begin{equation}
\mathbb{P}(G) = \int_{[0,D]^{\binom{n}{2}}} f_{\vec{R}}(\vec{r})\prod_{i<j} \left(a_{ij}\mathbb{I}(r_{ij}<r_0)+(1-a_{ij})\mathbb{I}(r_{ij}>r_0)\right)d\vec{r}
\end{equation}
If the random distances converge to a Gaussian, then we have (after some algebra)
\begin{equation}
\mathbb{P}(G) \rightarrow \int_{\mathcal{A}} N(0,\Sigma)(\vec{q})d\vec{q}
\end{equation}
where $\mathcal{A} = \bigotimes_{i<j} A_{ij}$, with $\bigotimes$ denoting the Cartesian product of sets, and $A_{ij}$ is the set:
\begin{equation}
A_{ij} := \begin{cases}
(-\infty, t] & a_{ij} =1 \\
(t,\infty) & a_{ij} = 0
\end{cases}
\end{equation}
If $\Sigma$ is diagonal, then the integral above splits up into its marginals, and
\begin{equation}
\mathbb{P}(G) = \prod_{i<j} \bar{p}(t)^{a_{ij}}(1-\bar{p}(t))^{1-a_{ij}} = \bar{p}(t)^k(1-\bar{p}(t))^{\binom{n}{2}-k}
\end{equation}
where $\bar{p}(t) := \int_{-\infty}^t N(0, \alpha)(q)dq$, with $\alpha$ being the common diagonal element of $\Sigma$. Note this is the exact probability of a graph in $G(n, \bar{p}(t))$. If there are non-zero off-diagonal elements, then there are correlations between edges, and we do not have convergence to $G(n,p)$.

The RGG in $[0,1]^d$

Suppose now that $(\Omega, \rho) = ([0,1]^d, \|\cdot\|)$. We will need that our node distribution $\nu$ is of the form we described earlier, where the marginals $\pi$ have a kurtosis greater than 1. It can be shown that the only distributions with unit kurtosis are the Bernoulli distribution with parameter $1/2$, and constant distributions.

Theorem 2 [1]

Suppose $\mathcal{G}$ is an ensemble of RGGs in $[0,1]^d$ with nodes distributed according to $\nu$, then, provided the kurtosis of the marginals $\pi$ is greater than 1, $\mathcal{G}$ does not converge to the ER ensemble as $d\rightarrow\infty$.

Sketch Proof:

The proof is direct. We set $\beta = 0$, which for the Euclidean distance metric means, after some rearranging,
\begin{equation}
\mathbb{E}[(X_i-\mu)^4] - \mathbb{E}[(X_i-\mu)^2]^2 = 0
\end{equation}
or equivalently, the kurtosis of $X_i$ is 1 (here $\mu$ denotes the mean of the coordinate $X_i$).

This means that, for any 'sufficiently nice' distribution of nodes, we do not converge to $G(n,p)$, and we maintain spatial properties which can be exploited in data analysis.
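To see which marginals sit on the boundary, one can estimate the kurtosis $\mathbb{E}[(X-\mu)^4]/\mathbb{E}[(X-\mu)^2]^2$ empirically. A small sketch, with the distributions below chosen purely for illustration:

import numpy as np

rng = np.random.default_rng(2)

def kurtosis(x):
    """Non-excess kurtosis E[(X - mu)^4] / E[(X - mu)^2]^2."""
    c = x - x.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2

samples = {
    "Bernoulli(1/2)": rng.integers(0, 2, 1_000_000).astype(float),
    "Uniform[0, 1]": rng.uniform(size=1_000_000),
    "Gaussian": rng.normal(size=1_000_000),
}
for name, x in samples.items():
    print(name, round(kurtosis(x), 3))
# Bernoulli(1/2) is the boundary case with kurtosis 1; Uniform gives 1.8 and Gaussian gives 3.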

The RGG in $\mathbb{T}^d$

In the torus, the following theorem shows that if the distribution of nodes is uniform, then we will in fact see convergence to $G(n,p)$. However, for any other distribution, we maintain the spatial correlation.

Theorem 3 [1]

Let $\mathcal{G}$ be an ensemble of RGGs on $\mathbb{T}^d$ with nodes distributed according to $\nu$. Then as $d\rightarrow\infty$, $\mathcal{G}$ converges in distribution to $G(n,p)$ if and only if $\nu$ is the uniform distribution.

Sketch Proof

The tactic for the proof is the same as for the cube. We will find a condition for which $\beta = 0$. In the torus, we have
\begin{equation}
\beta = 0 \iff \mathbb{E}_X[(\mathbb{E}_Y[\rho_t(X,Y)^2])^2] = \mathbb{E}_X[\mathbb{E}_Y[\rho_t(X,Y)^2]]^2
\end{equation}

The right-hand condition says that the variance of $\mathbb{E}_Y[\rho_t(X,Y)^2]$ is zero, so $\mathbb{E}_Y[\rho_t(x,Y)^2]$ is constant $\pi$-almost-everywhere. This implies
\begin{equation}
\int_0^1 \rho_t(x,y)^2\pi(y)dy = \mu
\end{equation}
for $\pi$-almost every $x$. We can rewrite the left-hand side as the convolution of two periodic functions, so taking a Fourier transform of both sides simplifies the problem. By equating Fourier modes, we find that the Fourier transform $\hat{\pi}$ of $\pi$ must vanish everywhere except at $0$. This means that $\pi$ must be constant on $[0,1]$.
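A heuristic sketch of this Fourier step (glossing over the measure-theoretic details, and assuming the per-coordinate toroidal distance $\rho_t(x,y) = \min(|x-y|, 1-|x-y|)$): writing $g(u) := \min(|u|, 1-|u|)^2$, which is 1-periodic, the condition reads
\begin{equation}
(g * \pi)(x) = \int_0^1 g(x-y)\pi(y)dy = \mu,
\end{equation}
and taking Fourier coefficients of both sides gives $\hat{g}(m)\hat{\pi}(m) = \mu\,\mathbb{I}(m=0)$. A direct computation (with the convention $\hat{g}(m) = \int_0^1 g(u)e^{-2\pi i m u}du$) gives $\hat{g}(m) = (-1)^m/(2\pi^2 m^2) \neq 0$ for $m \neq 0$, which forces $\hat{\pi}(m) = 0$ for all $m \neq 0$, i.e. $\pi$ is constant.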

So, if we use a toroidal distance metric and our data are uniformly distributed, we lose the latent space structure as $d\rightarrow\infty$.

Example

Here we plot the distribution of edge counts in the high-dimensional limit for RGGs with uniformly distributed (Figure 2) and Gaussian-distributed (Figure 3) nodes, to illustrate the difference that changing the node distribution can make.

Figure 2: Comparison of the theoretical distribution as $d\rightarrow\infty$ of edge counts for RGGs in $[0,1]^d$ and $\mathbb{T}^d$ with uniform nodes. Top: $\mathbb{P}(\text{# of edges } = k)/\binom{n}{k}$ for RGGs in the cube for $n=3$ and $n=7$. The torus would have a uniform distribution and is therefore omitted. Bottom: $\mathbb{P}(\text{# of edges }=k)$ for RGGs with $n=7$ in $[0,1]^d$ and $\mathbb{T}^d$.
Figure 3: Comparison of the theoretical distribution as $d\rightarrow\infty$ of edge counts for RGGs in $[0,1]^d$ and $\mathbb{T}^d$ with Gaussian-distributed nodes. Top: $\mathbb{P}(\text{# of edges } = k)/\binom{n}{k}$ for RGGs in $[0,1]^d$ and $\mathbb{T}^d$ for $n=3$ and $n=7$. Bottom: $\mathbb{P}(\text{# of edges }=k)$ for RGGs with $n=7$ in $[0,1]^d$ and $\mathbb{T}^d$.
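A simulation-based companion to these figures (these are not the theoretical limiting distributions plotted above, and the choices of $n$, $d$, connection range and the Beta(2, 2) marginal are made purely for illustration): sample RGGs on the torus at large but finite $d$ and compare the variance of the edge count with the Binomial variance of the matching $G(n,p)$ model. Uniform nodes should look ER-like; non-uniform nodes should not. A minimal Python sketch:

import numpy as np
from math import comb

rng = np.random.default_rng(3)
n, d, n_graphs = 7, 500, 2_000
r0_sq = d / 12   # E[rho_t(U, V)^2] = d/12 for uniform nodes, so roughly half of all pairs connect

def torus_sq_dists(X):
    """Pairwise squared toroidal distances between the rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    diff = np.minimum(diff, 1 - diff)   # wrap-around distance in each coordinate
    return (diff**2).sum(axis=-1)

def edge_counts(sampler):
    counts = []
    for _ in range(n_graphs):
        D2 = torus_sq_dists(sampler(size=(n, d)))
        counts.append(int(np.triu(D2 < r0_sq, k=1).sum()))
    return np.array(counts)

for name, sampler in [("uniform", rng.uniform),
                      ("Beta(2, 2)", lambda size: rng.beta(2, 2, size))]:
    counts = edge_counts(sampler)
    p_hat = counts.mean() / comb(n, 2)
    print(name,
          "empirical variance:", round(counts.var(), 2),
          "Binomial variance:", round(comb(n, 2) * p_hat * (1 - p_hat), 2))
# Uniform nodes: the two variances roughly agree (ER-like behaviour).
# Beta(2, 2) nodes: edge counts should be over-dispersed relative to the Binomial.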

Conclusion

In this blog post, we have defined the random geometric graph (RGG) with general node distributions, and shown that, in most cases, the spatial correlations between edges are preserved as the dimension of the underlying geometry tends to $\infty$. In the $d$-cube, geometry is preserved as long as the distribution of our nodes is neither constant nor Bernoulli with parameter $1/2$; in the $d$-torus, geometry is preserved provided our distribution is not uniform. The result for the torus is especially interesting, since it challenges ideas in the literature about how we should model high-dimensional RGGs. In real-world data, the coordinates are unlikely to be uniformly distributed, yet the majority of theoretical studies of high-dimensional random geometric graphs use uniform distributions on the torus, which is precisely the only case in which the torus behaves like a $G(n,p)$ graph. For a more in-depth explanation of this work, and extensions to the concept of graph entropy, see our recent preprint [1].

 

References

[1] O. Baker and C. P. Dettmann. “Entropy of Random Geometric Graphs in High and Low Dimensions”. arXiv preprint arXiv:2503.11418 (2025).

[2] V. Erba et al. “Random geometric graphs in high dimension”. Phys. Rev. E 102.1 (2020), 012306.

[3] E. N. Gilbert. “Random Plane Networks”. Journal of the Society for Industrial and Applied Mathematics 9.4 (1961), 533–543.

Student perspectives: AI UK 2025 Conference

A post by Sam Bowyer, PhD student on the Compass programme.

Compass at AI UK

The Alan Turing Institute’s AI UK 2025 Conference was held last month in the QEII Centre, Westminster, and three Compass students – Emma Ceccherini, Sherman Khoo, and myself – were present for both days of the event. We attended a variety of sessions and spent time exploring the exhibition stalls, which showcased a wide range of AI projects from within academia, government and industry.

Compass students and staff pictured at the AI UK 2025 Conference
Compass CDT students and staff at the AI UK 2025 Conference. From left to right:
Dr Dan Lawson, Emma Ceccherini, Sam Bowyer, Sherman Khoo and Helen Mawdsley

It was an eye-opening experience to learn about the work that The Alan Turing Institute does, and especially insightful to see the myriad downstream applications of the machine learning theory that we spend so much time thinking about.

Conference highlights

One particular favourite exhibition was that of the Ministry of Justice (MoJ). Emma and I talked to a data scientist at the MoJ who was working on a tool that uses LLMs to explain laws in plain English, in order to help regular people better understand their rights.

Another project involved aggregating various disconnected datasets from across government, at both national and local levels, in order to research the social factors that might lead to successful post-prison rehabilitation, or equally to recidivism.

It was encouraging to see a variety of projects and organisations at the conference aiming to use AI for social and public good, with a significant amount focussed on the climate and green-tech.

Whilst Compass wasn’t presenting at AIUK, colleagues from the Informed AI Hub, the Interactive AI CDT, the AI For Collective Intelligence (AI4CI) Hub, and Jean Golding Institute were.

It was great to not only see the other projects that are going on in the University, but also to be able to network with colleagues who only work down the road from the Fry Building (e.g. sharing Bristol restaurant recommendations)!

On the first day of the conference, Professor Charlotte Deane, Executive Chair of the Engineering and Physical Sciences Research Council (EPSRC), gave an informative keynote talk on the state of scientific research in UK academia. It was surprising to learn about the overall size of EPSRC and the range of activities they engage in, particularly their keenness for investing in spin-outs. I found Professor Deane’s talk to be very encouraging and optimistic.

The second day of the conference focused on governmental uses of AI, particularly in medicine and in defence. Professor the Lord Darzi, who recently led the Independent Investigation of the NHS in England, gave an incredibly thoughtful talk on the opportunities for AI within the NHS.

He likened the current AI boom to the development of keyhole surgery in the second half of the 20th century, urging fast, nationwide deployment in order to improve health outcomes and equality throughout the UK.

Three talks on defence and national security similarly stressed the importance of fast uptake of AI tools and made clear the desire for public-private partnerships (including with academia) in order to make this happen. (The importance of cross-sector collaboration was consistently a strong theme at AIUK, although the absence of frontier AI labs did, in my opinion, betray a slight limit to this stated commitment).

Presentation karaoke

It wasn’t all so serious, however! The conference finished its first day with “Presentation Karaoke”, in which eight contestants competed to present unseen 5-minute long, 10-slide PowerPoints, each more bizarre than the last.

This fun, often slightly cringe-inducing, activity is now rumoured to be deployed at a future Compass student event. (Get practising your stand-up now…)

In summary, AIUK was a great opportunity to see how AI/ML research leads to real-world impact in the UK, and I would recommend attending to any CDT student in the future.

Guest Lecture: Professor Chris Breward

An Introduction to Knowledge Exchange

The Compass CDT was delighted to recently host a Guest Lecture by Professor Chris Breward, from the Mathematical Institute, University of Oxford.

Chris led an interactive session for our PhD students, which focused on getting started with knowledge exchange (KE), and explored the skills needed to engage with industrial and other external partners.

As Scientific Director of the Knowledge Exchange Hub for Mathematical Sciences and Co-Director of the EPSRC CDT in Industrially Focused Mathematical Modelling, Chris had a wide range of valuable advice to share.

Drawing on his experience building partnerships with companies and setting up projects with industry co-funding, he ran through the different ways researchers at all stages of their careers can get involved in KE.

Attendees explored why companies might engage with mathematical scientists, discussed things to consider before meeting potential collaborators, and looked at what can sometimes go wrong with academic-business relationships.

Student reflections

“During his Guest Lecture, Chris chatted with all of us about ways to communicate with non-academics during shared projects and how to do positive work as mathematical consultants.

“The session covered the pragmatics and hard skills of private-sector contract work, as well as the soft skills of open body language, effective listening and people management.

“He described the barriers that can arise between researchers (mathematicians in particular) and industrial partners. We then chatted interactively through where these pitfalls come from and how best to avoid them.

“He also gave us an entry-level look into the broader differences between universities and industry.”

KE initiatives

Chris closed by encouraging attendees to get involved with some of the opportunities the KE Hub provides for PhD students and researchers, such as the online Triage Workshops. These events can provide a safe space for individuals to gain experience with knowledge exchange, by observing senior colleagues from across the country.

He expressed his hope that Compass students would benefit from the upcoming five-day European Study Group with Industry (ESGI), which will take place here at the University of Bristol from Monday 14 July to Friday 18 July 2025.

The Compass CDT was grateful to Chris for giving up his time to visit us in the School of Mathematics’ Fry Building, and we look forward to seeing him in Bristol again in the future.

As well as being an applied mathematician, lecturer and researcher at University of Oxford, Chris is co-founding Chief Moderator of the Mathematics-In-Industry Reports online KE repository, and a member of the Newton Gateway’s Scientific Advisory Board.
