We (Edward Milsom, Jake Spiteri, Jack Simons, Sam Stockman) attended the 2023 Workshop on Functional Inference and Machine Intelligence (FIMI) taking place on the 14, 15 and 16th of March at the Institute of Statistical Mathematics in Tokyo, Japan. Our attendance to the workshop was to further collaborative ties between the two institutions. The in-person participants included many distinguished academics from around Japan as well as our very own Dr Song Liu. Due to the workshops modest size, there was an intimate atmosphere which nurtured many productive research discussions. Whilst staying in Tokyo, we inevitably sampled some Japanese culture, from Izakayas to cherry blossoms and sumo wrestling!
We thought we’d share some of our thoughts and experiences. We’ll first go through some of our most memorable talks, and then talk about some of our activities outside the workshop.
Sho Sonoda – Ridgelet Transforms for Neural Networks on Manifolds and Hilbert Spaces
We particularly enjoyed the talk given by Sho Sonoda, a Research Scientist from the Deep Learning Theory group at Riken AIP on “Ridgelet Transforms for Neural Networks on Manifolds and Hilbert Spaces.” Sonoda’s research aims to demystify the black box nature of neural networks, shedding light on how they work and their universal approximation capabilities. His talk provided valuable insights into the integral representations of neural networks, and how they can be represented using ridgelet transforms. Sonoda presented a reconstruction formula from which we see that if a neural network can be represented using ridgelet transforms, then it is a universal approximator. He went on to demonstrate that various types of networks, such as those on finite fields, group convolutional neural networks (GCNNs), and networks on manifolds and Hilbert spaces, can be represented in this manner and are thus universal approximators. Sonoda’s work improves upon existing universality theorems by providing a more unified and direct approach, as opposed to the previous case-by-case methods that relied on manual adjustments of network parameters or indirect conversions of (G)CNNs into other universal approximators, such as invariant polynomials and fully-connected networks. Sonoda’s work is an important step toward a more transparent and comprehensive understanding of neural networks.
Greg Yang – The unreasonable effectiveness of mathematics in large scale deep learning
Greg Yang is a researcher at Microsoft Research who is working on a framework for understanding neural networks called “tensor programs”. Similar to Neural Tangent Kernels and Neural Network Gaussian Processes, the tensor program framework allows us to consider neural networks in the infinite-width limit, where it becomes possible to make statements about the properties of very wide networks. However, tensor programs aim to unify existing work on infinite-width neural networks by allowing one to take the infinite limit of a much wider range of neural network architectures using one single framework.
In his talk, Yang discussed his most recent work in this area, concerning the “maximal update parametrisation”. In short, they show that in this parametrisation, the optimal hyperparameters of very wide neural networks are the same as those for much smaller neural networks. This means that hyperparameter search can be done using small, cheap models, and then applied to very large models like GPT-3, where hyperparameter search would be too expensive. The result is summarised in this figure from their paper “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, which shows how this is not possible in the standard parametrisation. This work was only possible by building upon the tensor program framework, thereby demonstrating the value of having a solid theoretical understanding of neural networks.
Statistical Seismology Seminar Series
In addition to the workshop, Sam attended the 88th Statistical Seismology seminar in the Risk Analysis Research Centre at ISM https://www.ism.ac.jp/~ogata/Ssg/ssg_statsei_seminarsE.html. The Statistical Seismology Research Group at ISM was created by Emeritus Professor Yosihiko Ogata and is one of the leading global research institutes for statistical seismology. Its most significant output has been the Epidemic-Type Aftershock Sequence (ETAS) model, a point process based earthquake forecasting model that has been the most dominant model for forecasting since its creation by Ogata in 1988.
As part of the Seminar series, Sam gave a talk on his most recent work (Forecasting the 2016-2017 Central Apennines Earthquake Sequence with a Neural Point Process’, https://arxiv.org/abs/2301.09948) to the research group and other visiting academics.
Japan’s interest is earthquake science is due to the fact that they record the most earthquakes in the world. The whole country is in a very active seismic area, and they have the densest seismic network. So even though they might not actually have the most earthquakes in the world (which is most likely Indonesia) they certainly document the most. The evening before flying back to the UK, Sam and Jack felt a magnitude 5.2 earthquake 300km north of Tokyo in the Miyagi prefecture. At that distance all that was felt was a small shudder…
It’s safe to say that the abundance of delicious food was the most memorable aspect of our trip. In fact, we never had a bad meal! Our taste buds were taken on a culinary journey as we tried a variety of Japanese dishes. From hearty, broth-based bowls of ramen and tsukemen, to fun conveyor-belt sushi restaurants, and satisfying tonkatsu (breaded deep-fried pork cutlet) with sticky rice or spicy udon noodles, we were never at a loss for delicious options. We even had the opportunity to cook our own food at an indoor barbecue!
Aside from the food, we thoroughly enjoyed our time in Tokyo – exploring the array of second-hand clothes shops, relaxing in bath-houses, and trying random things from the abundance of vending machines.