Congratulations to Compass students Edward Milsom and Ben Anson who, along with their supervisor, had their paper accepted for a poster at ICLR 2024.
Ed and Ben: In this paper we explore the importance of representation learning in convolutional neural networks, specifically in the context of an infinite-width limit called the Neural Network Gaussian Process (NNGP) that is often used by theorists. Representation learning refers to the ability of models to learn a transformation of the data that is tailored to the task at hand. This is in contrast to algorithms that use a fixed transformation of the data, e.g. a support vector machine with a fixed kernel function like the RBF kernel. Representation learning is thought to be critical to the success of convolutional neural networks in vision tasks, but networks in the NNGP limit do not perform representation learning, instead transforming the data with a fixed kernel function. A recent modification to the NNGP limit, called the Deep Kernel Machine (DKM), allows one to gradually “add representation learning back in” to the NNGP, using a single hyperparameter that controls the amount of flexibility in the kernel. We extend this algorithm to convolutional architectures, which required us to develop a new sparse inducing point approximation scheme. This allowed us to test on the full CIFAR-10 image classification dataset, where we achieved state-of-the-art test accuracy for kernel methods, with 92.7%.
In the plot below, we see how changing the hyperparameter (x-axis) to reduce flexibility too much harms the performance on unseen data.
A post by Edward Milsom, PhD student on the Compass programme.
This blog post provides a simple introduction to Deep Kernel Machines (DKMs), a novel supervised learning method that combines the advantages of both deep learning and kernel methods. This work provides the foundation of my current research on convolutional DKMs, which is supervised by Dr Laurence Aitchison.
Why aren’t kernels cool anymore?
Kernel methods were once top-dog in machine learning due to their ability to implicitly map data to complicated feature spaces, where the problem usually becomes simpler, without ever explicitly computing the transformation. However, in the past decade deep learning has become the new king for complicated tasks like computer vision and natural language processing.
Neural networks are flexible when learning representations
The reason is twofold: First, neural networks have millions of tunable parameters that allow them to learn their feature mappings automatically from the data, which is crucial for domains like images which are too complex for us to specify good, useful features by hand. Second, their layer-wise structure means these mappings can be built up to increasingly more abstract representations, while each layer itself is relatively simple. For example, trying to learn a single function that takes in pixels from pictures of animals and outputs their species is difficult; it is easier to map pixels to corners and edges, then shapes, then body parts, and so on.
Kernel methods are rigid when learning representations
It is therefore notable that classical kernel methods lack these characteristics: most kernels have a very small number of tunable hyperparameters, meaning their mappings cannot flexibly adapt to the task at hand, leaving us stuck with a feature space that, while complex, might be ill-suited to our problem. (more…)