Compass students attending the Workshop on Functional Inference and Machine Intelligence (FIMI) at ISM Tokyo

A post by Compass CDT students Edward Milsom, Jake Spiteri, Jack Simons, and Sam Stockman.

We (Edward Milsom, Jake Spiteri, Jack Simons, Sam Stockman) attended the 2023 Workshop on Functional Inference and Machine Intelligence (FIMI), held on 14, 15 and 16 March at the Institute of Statistical Mathematics (ISM) in Tokyo, Japan. We attended the workshop to strengthen collaborative ties between the two institutions. The in-person participants included many distinguished academics from around Japan, as well as our very own Dr Song Liu. The workshop's modest size made for an intimate atmosphere that nurtured many productive research discussions. Whilst staying in Tokyo, we inevitably sampled some Japanese culture, from izakayas to cherry blossoms and sumo wrestling!

We thought we’d share some of our thoughts and experiences. We’ll first go through some of our most memorable talks, and then talk about some of our activities outside the workshop.

Talks

Sho Sonoda – Ridgelet Transforms for Neural Networks on Manifolds and Hilbert Spaces

We particularly enjoyed the talk by Sho Sonoda, a Research Scientist in the Deep Learning Theory group at RIKEN AIP, on “Ridgelet Transforms for Neural Networks on Manifolds and Hilbert Spaces.” Sonoda’s research aims to demystify the black-box nature of neural networks, shedding light on how they work and on their universal approximation capabilities. His talk offered valuable insight into integral representations of neural networks and how ridgelet transforms can be used to construct them. Sonoda presented a reconstruction formula showing that if a class of networks admits such a ridgelet-transform representation, then it is a universal approximator. He went on to demonstrate that various types of network, such as those on finite fields, group convolutional neural networks (GCNNs), and networks on manifolds and Hilbert spaces, can be represented in this way and are therefore universal approximators. This improves upon existing universality theorems by providing a more unified and direct approach, as opposed to previous case-by-case methods that relied on manually adjusting network parameters or on indirect conversions of (G)CNNs into other universal approximators, such as invariant polynomials and fully-connected networks. Sonoda’s work is an important step towards a more transparent and comprehensive understanding of neural networks.
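To give a flavour of the objects involved, here is the Euclidean case written in one common convention from the ridgelet literature (the notation is ours, and we omit the technical admissibility conditions on the activation σ and the auxiliary function ρ). A network is written as an integral S[γ] over hidden-unit parameters (a, b) weighted by γ; the ridgelet transform R maps a target function f to such a weighting; and the reconstruction formula says that S undoes R up to a scalar constant:

```latex
\begin{aligned}
S[\gamma](x) &= \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a, b)\, \sigma(a \cdot x - b)\, \mathrm{d}a\, \mathrm{d}b, \\
R[f; \rho](a, b) &= \int_{\mathbb{R}^m} f(x)\, \overline{\rho(a \cdot x - b)}\, \mathrm{d}x, \\
S\big[R[f; \rho]\big] &= (\!(\sigma, \rho)\!)\, f .
\end{aligned}
```

Whenever the constant ((σ, ρ)) is finite and non-zero, every f in the relevant function space is exactly represented by the weighting γ = R[f; ρ] / ((σ, ρ)), which is the sense in which a reconstruction formula implies universality; the talk extended this style of argument beyond the Euclidean setting.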

Greg Yang – The unreasonable effectiveness of mathematics in large scale deep learning

Greg Yang is a researcher at Microsoft Research who is working on a framework for understanding neural networks called “tensor programs”. Like Neural Tangent Kernels and Neural Network Gaussian Processes, the tensor program framework considers neural networks in the infinite-width limit, where it becomes possible to make precise statements about the properties of very wide networks. Tensor programs, however, aim to unify existing work on infinite-width neural networks by allowing one to take the infinite-width limit of a much wider range of architectures within a single framework.

In his talk, Yang discussed his most recent work in this area, concerning the “maximal update parametrisation” (μP). In short, he and his co-authors show that under this parametrisation, the optimal hyperparameters of very wide neural networks are (approximately) the same as those of much smaller networks. This means that hyperparameter search can be done using small, cheap models, and the results transferred to very large models like GPT-3, where direct hyperparameter search would be too expensive. The result is summarised in a figure from their paper “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, which also shows that this transfer fails under the standard parametrisation. This work was only possible by building upon the tensor program framework, which demonstrates the value of having a solid theoretical understanding of neural networks.

Statistical Seismology Seminar Series

In addition to the workshop, Sam attended the 88th Statistical Seismology Seminar in the Risk Analysis Research Centre at ISM (https://www.ism.ac.jp/~ogata/Ssg/ssg_statsei_seminarsE.html). The Statistical Seismology Research Group at ISM was founded by Emeritus Professor Yosihiko Ogata and is one of the world's leading research groups in statistical seismology. Its most significant output has been the Epidemic-Type Aftershock Sequence (ETAS) model, a point-process-based earthquake forecasting model that has been the dominant forecasting model since Ogata introduced it in 1988.
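The ETAS model describes earthquake occurrence through a conditional intensity, the instantaneous rate of events given the history so far. As an illustration, here is a minimal sketch of the commonly used purely temporal form, λ(t) = μ + Σ_{t_i < t} K e^{α(m_i − m0)} / (t − t_i + c)^p, where μ is a background rate and each past event of magnitude m_i triggers aftershocks whose rate decays according to the modified Omori law. The function and parameter names are our own, the example parameter values are hypothetical, and the spatial component used in practical forecasting is omitted.

```python
import math


def etas_intensity(t, times, mags, mu, K, alpha, c, p, m0):
    """Temporal ETAS conditional intensity lambda(t | history) (illustrative sketch).

    times, mags: occurrence times and magnitudes of past events.
    mu: background rate; K, alpha: productivity; c, p: Omori-law decay;
    m0: reference (cutoff) magnitude.
    """
    rate = mu  # background (spontaneous) seismicity
    for t_i, m_i in zip(times, mags):
        if t_i < t:  # only events strictly before t contribute
            # Larger events trigger exponentially more aftershocks...
            productivity = K * math.exp(alpha * (m_i - m0))
            # ...whose rate decays as a power law in elapsed time (modified Omori law).
            rate += productivity / (t - t_i + c) ** p
    return rate


# Hypothetical example: intensity half a day after a magnitude-5.0 event at t = 0.
print(etas_intensity(0.5, [0.0], [5.0], mu=0.1, K=0.02, alpha=1.8, c=0.01, p=1.1, m0=3.0))
```

The self-exciting structure is what makes the model "epidemic-type": every event, including aftershocks, can in turn raise the rate of further events.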

As part of the seminar series, Sam gave a talk to the research group and other visiting academics on his most recent work (‘Forecasting the 2016-2017 Central Apennines Earthquake Sequence with a Neural Point Process’, https://arxiv.org/abs/2301.09948).

Japan’s interest in earthquake science stems from how many earthquakes it records. The whole country lies in a very active seismic region and has the densest seismic network in the world, so even if it does not actually have the most earthquakes (that is most likely Indonesia), it certainly documents the most. The evening before flying back to the UK, Sam and Jack felt a magnitude 5.2 earthquake centred 300 km north of Tokyo in Miyagi Prefecture. At that distance, all that was felt was a small shudder…

Japan

It’s safe to say that the abundance of delicious food was the most memorable aspect of our trip. In fact, we never had a bad meal! Our taste buds were taken on a culinary journey as we tried a variety of Japanese dishes. From hearty, broth-based bowls of ramen and tsukemen, to fun conveyor-belt sushi restaurants, and satisfying tonkatsu (breaded deep-fried pork cutlet) with sticky rice or spicy udon noodles, we were never at a loss for delicious options. We even had the opportunity to cook our own food at an indoor barbecue!

Aside from the food, we thoroughly enjoyed our time in Tokyo – exploring the array of second-hand clothes shops, relaxing in bath-houses, and trying random things from the abundance of vending machines.


Compass at NeurIPS 2022

A post by Anthony Stephenson, Jack Simons, and Dan Ward, PhD students on the Compass programme.

Introduction

Ant Stephenson, Jack Simons, and I (Dan Ward) had the pleasure of attending the 2022 Conference on Neural Information Processing Systems (NeurIPS), one of the largest machine learning conferences in the world. The conference was held in New Orleans, which gave us an opportunity to explore a lively city full of culture with delicious local cuisine. We thought we’d collaborate on a blog post together covering some of the highlights.

Memorable Talks

The conference had a broad range of talks, including technical presentations of research, applied projects, and discussions of the philosophical and ethical questions that arise in AI. To give a taste, we have picked out some of our favourites below.

Noam Brown: Human Modelling and Strategic Reasoning in the Game of Diplomacy.

The Game of Diplomacy is a strategic board game invented in 1954. Its unique feature, and one of crucial importance to the game, is that players interact via natural language to form alliances. Whilst AI has been successful in beating humans at many purely adversarial games (e.g. chess, Go), this collaborative element poses unique challenges. Firstly, it isn’t obvious how to evaluate or devise strategies for collaboration and betrayal, especially within the self-play reinforcement learning paradigm. Secondly, as communication happens via natural language, the AI must be able to translate its strategic plan into text. This unusual combination of problems led to interesting and innovative solutions. Paper link here.

Geoffrey Hinton: The Forward-Forward Algorithm for Training Deep Neural Networks.

Among the great line-up of speakers was Professor Geoffrey Hinton, known for popularising backpropagation for deep neural networks. Motivated by the search for a more biologically plausible learning algorithm, he has proposed the ‘Forward-Forward’ algorithm, which he suggests could also help explain the phenomenon of sleep! Hinton then went on to express his belief that biologically inspired hardware, so-called neuromorphic computing, may play a key role in advancing AI. The talk was certainly unconventional, but nevertheless entertaining. Paper link here.

David Chalmers: Could a Large Language Model be Conscious?

Amongst all the machine-learning experts was David Chalmers, a philosopher! The possibility that language models might be conscious raises important questions. Chalmers aimed to show the machine-learning audience how we can think more clearly about these problems and rephrase the questions we are asking. We concluded that these questions are, unsurprisingly, best left to philosophers!

Poster Sessions


Jack and Dan:

I (Dan) presented a poster of my work on Robust Neural Posterior Estimation at the conference (paper link here). I was definitely surprised by the scale of the poster sessions and the broad scope of the work on display. Below are some of the posters that Jack and I found interesting:

Contrastive Neural Ratio Estimation
Benjamin K. Miller · Christoph Weniger · Patrick Forré
The authors propose NRE-C, which aims to unify NRE-A (Hermans et al., 2019) and NRE-B (Durkan et al., 2020) in a single method: NRE-C recovers both by taking its two newly introduced hyperparameters to particular limits. Paper link here.

Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation
Arnaud Delaunoy · Joeri Hermans · François Rozet · Antoine Wehenkel · Gilles Louppe
These authors also contribute to neural ratio estimation in the simulation-based inference setting. They propose the notion of a “balanced” classifier: one for which the average output over the positive (joint) data class plus the average output over the negative (marginal) data class equals 1. The authors argue that a balanced classifier leads to more conservative posterior estimates, which is what practitioners seek. To build this into an algorithm, they add a penalty term to the standard logistic loss that penalises classifiers as they become less balanced. Paper link here.
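To make the balance condition concrete, here is a minimal NumPy sketch of a penalised loss of the kind described: the usual binary cross-entropy for a classifier d(θ, x), plus λ times the squared deviation of E_joint[d] + E_marginal[d] from 1. It assumes the classifier's outputs on joint and marginal pairs have already been computed; the function name `bnre_loss` and the default weight `lam` are our own illustrative choices, not the authors' code.

```python
import numpy as np


def bnre_loss(d_joint, d_marginal, lam=100.0, eps=1e-12):
    """Binary cross-entropy for a ratio-estimation classifier plus a balance penalty.

    d_joint: classifier outputs d(theta, x) on pairs drawn from p(theta, x) (label 1).
    d_marginal: outputs on pairs drawn from p(theta) p(x) (label 0).
    lam: penalty weight (illustrative default; would need tuning in practice).
    """
    d_joint = np.asarray(d_joint, dtype=float)
    d_marginal = np.asarray(d_marginal, dtype=float)
    # Standard logistic (binary cross-entropy) loss; eps guards against log(0).
    bce = -np.mean(np.log(d_joint + eps)) - np.mean(np.log(1.0 - d_marginal + eps))
    # Balance condition: E_joint[d] + E_marginal[d] should equal 1.
    imbalance = np.mean(d_joint) + np.mean(d_marginal) - 1.0
    return bce + lam * imbalance ** 2
```

For example, outputs of 0.7 on joint pairs and 0.3 on marginal pairs are balanced and incur no penalty, whereas a classifier that outputs 0.9 on both classes is pushed back towards balance by the λ term.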

Training and Inference on Any-Order Autoregressive Models the Right Way
Andy Shih · Dorsa Sadigh · Stefano Ermon
A joint distribution can be decomposed into univariate conditionals by the chain rule, but doing so implicitly fixes an ordering of the variables in the model, which prevents arbitrary conditional inference. Any-order autoregressive models generally circumvent this by being trained on all possible univariate conditionals, but this leads to learning redundant information. The paper proposes a new way to train these models using a subset of univariate conditionals that still supports arbitrary conditional inference. This research was also presented as a talk, but sadly we missed it! Paper link here.
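For concreteness, the chain-rule decomposition referred to here can be written for an arbitrary ordering σ of the D variables (the notation is ours):

```latex
p(x_1, \dots, x_D) = \prod_{i=1}^{D} p\big(x_{\sigma(i)} \mid x_{\sigma(1)}, \dots, x_{\sigma(i-1)}\big)
```

A standard autoregressive model learns these factors for one fixed σ only. An any-order model must handle every σ, so a conditional such as p(x_A | x_B), for disjoint index sets A and B, can be obtained by choosing an ordering that lists B first and then A, and multiplying the factors for the variables in A.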

Anthony:

The poster sessions formed the bulk of the conference timetable, with two 2-hour sessions per day on Tuesday, Wednesday and Thursday. These were very busy, with many posters on a wide range of topics and large crowds of attendees. As a result, it was sometimes difficult to track down the posters of particular interest, and even when this feat was achieved, it was occasionally still hard work to have a detailed conversation with the author(s). Nonetheless, it was interesting to see how varied the subjects of the posters were, and to get a feeling for the “themes” of the conference: recurring, clearly in-vogue topics. Amongst the sea of posters, I did manage to find a number relating to Gaussian processes (GPs); of these, the ones I found most interesting were:

Posterior and Computational Uncertainty in Gaussian Processes
Jonathan Wenger · Geoff Pleiss · Marvin Pförtner · Philipp Hennig · John Cunningham
Here the authors propose a way to naturally account for the additional uncertainty introduced by (iterative) GP approximation methods. Paper link here.

Sparse Gaussian Process Hyperparameters: Optimize or Integrate?
Vidhi Lalchand · Wessel Bruinsma · David Burt · Carl Edward Rasmussen
The authors propose a fully Bayesian inference procedure for the hyperparameters of sparse GPs, as an alternative to the commonly adopted approach of optimising the kernel hyperparameters by maximum likelihood estimation. Paper link here.

Log-Linear-Time Gaussian Processes Using Binary Tree Kernels
Michael K. Cohen · Samuel Daulton · Michael A Osborne
The idea here feels a bit unorthodox: they use a “binary-tree” kernel that discretises the input space, with the quantisation error determined by the number of leaves. This would seem to sacrifice interpretability of the properties of the function prior (e.g. smoothness), but it does appear to give empirical benefits in their experiments. Paper link here.

Workshops

In addition to the main conference, on the Friday and Saturday at the end of the week there was a selection of workshops on a variety of sub-fields within machine learning. If you are fortunate enough to find a workshop dedicated to your research area, it provides a space that gathers people whose research is directly relevant to your own, and facilitates helpful discussions and networking.

Anthony:

For me, the “Gaussian processes, spatiotemporal modeling and decision-making systems” workshop was the most useful part of the conference. It gave me the chance to speak to people working on interesting problems related to my own, and to discover the directions they are heading in and the lines of work they are contemplating. I also presented a poster at this workshop, which allowed me to discuss my work with an audience well-versed in the topic and its possible significance.

The Big Easy

In addition to the conference itself, attending NeurIPS gave us the opportunity to explore the city of New Orleans, aka The Big Easy. Upon arrival, we were immediately greeted at the airport by the sound of Louis Armstrong, a strong theme in the city, which even has a park named after him. New ‘Awlins’ is well known for its jazz, but knowing this does not necessarily prepare you for the sheer quantity that awaits you, especially in the streets of the French Quarter. The real epicentre of jazz in the city is Frenchmen Street, home to a swathe of bars hosting live music almost every night. We spent several evenings there, including one of particular note, when French president Emmanuel Macron suddenly appeared, trailed by an extensive retinue of blue-suited aides and bodyguards. Another New Orleans street infamous for its nightlife is Bourbon Street. Where Frenchmen Street is focused on jazz, Bourbon Street contains all manner of rowdy madness, assaulting your senses with noise, smells and sights as soon as you arrive. Both are necessary experiences when visiting The Big Easy.

Conclusion

All in all, the conference was a great opportunity to get a taste of the massive array of research that occurs in machine learning. We were all surprised by the scope of the research topics and talks, and enjoyed the opportunity to explore a new culture and city.

Harvard Prof delivers guest lectures to Compass students

We are delighted to welcome Pierre Jacob, Associate Professor of Statistics at Harvard University, to the University of Bristol in March.

Pierre will be delivering lectures for the COMPASS CDT students, but all staff in the School of Maths are welcome to attend.

Title: Couplings and Monte Carlo

In his lectures, Pierre will cover couplings, total variation and optimal transport. He will describe the use of couplings in Monte Carlo methods, such as coupling from the past, diagnostics of convergence, and bias removal.
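As a taste of the material, here is a sketch of one classic construction, the maximal coupling of two distributions p and q. This is a standard textbook algorithm, not taken from the lectures themselves: it returns a pair (X, Y) with X ~ p and Y ~ q that are equal with the largest possible probability, 1 − TV(p, q), which is where total variation enters. Couplings of this kind are used in coupling-based convergence diagnostics and bias removal for MCMC. The function names are our own, and the Gaussian example is purely illustrative.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def maximal_coupling(sample_p, pdf_p, sample_q, pdf_q, rng=random):
    """Draw (X, Y) with X ~ p and Y ~ q such that P(X == Y) = 1 - TV(p, q).

    sample_p / sample_q draw from p and q; pdf_p / pdf_q evaluate their densities.
    """
    x = sample_p()
    # Keep X as a common draw with probability min(1, q(x) / p(x)).
    if rng.random() * pdf_p(x) <= pdf_q(x):
        return x, x
    # Otherwise draw Y by rejection from the part of q not overlapping with p.
    while True:
        y = sample_q()
        if rng.random() * pdf_q(y) > pdf_p(y):
            return x, y


if __name__ == "__main__":
    rng = random.Random(1)
    n = 20_000
    meet = 0
    for _ in range(n):
        x, y = maximal_coupling(
            lambda: rng.gauss(0.0, 1.0), lambda z: normal_pdf(z, 0.0, 1.0),
            lambda: rng.gauss(1.0, 1.0), lambda z: normal_pdf(z, 1.0, 1.0),
            rng=rng,
        )
        meet += x == y
    print(f"estimated P(X == Y): {meet / n:.3f}")
```

For N(0, 1) and N(1, 1), the total variation distance is 2Φ(0.5) − 1 ≈ 0.383, so the estimated meeting probability should be close to 0.62.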

This event is sponsored by COMPASS – EPSRC Centre for Doctoral Training in Computational Statistics and Data Science.
