## Student Perspectives: The Importance of Stability in Dynamic Network Analysis

A post by Ed Davis, PhD student on the Compass programme.

## Introduction

Today is a great day to be a data scientist. More than ever, our ability to both collect and analyse data allows us to solve larger, more interesting, and more diverse problems. My research focuses on analysing networks, which cover a mind-boggling range of applications, from modelling vast computer networks [1] to detecting schizophrenia in brain networks [2]. In this blog post, I want to share some of the cool research I have been a part of since joining the Compass CDT, which concerns the analysis of dynamic networks.

## Network Basics

A network can be defined as an ordered pair, $(V, E)$, where $V$ is a node (or vertex) set and $E$ is an edge set. From this definition, we can represent any $n$-node network in terms of an adjacency matrix, $\mathbf{A} \in \mathbb{R}^{n \times n}$, where for nodes $i, j \in V$,

$A_{ij} = \left\{ \begin{array}{ll} 1, & (i,j) \in E, \\ 0, & (i,j) \notin E. \end{array} \right.$

When we model networks, we can assume that there are some unobservable weightings which mean that certain nodes have a higher connection probability than others. We then observe these in the adjacency matrix with some added noise (like an image that has been blurred). Under this assumption, there must exist some unobservable noise-free version of the adjacency matrix (i.e. the unblurred image) that we call the probability matrix, $\mathbf{P} \in \mathbb{R}^{n \times n}$. Mathematically, we represent this by saying

$A_{ij} \overset{\text{ind}}{\sim} \text{Bernoulli} \left(P_{ij} \right)$,

where we have chosen a Bernoulli distribution as it returns either a 1 or a 0. As the connection probabilities are not uniform across the network (inhomogeneous) and the adjacency matrix is sampled from some probability matrix (random), we say that $\mathbf{A}$ is an inhomogeneous random graph.
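To make the sampling step concrete, here is a minimal sketch in Python. The probability matrix `P`, the helper name `sample_adjacency`, and the choice to make the graph undirected with no self-loops are all assumptions made for illustration; the text above does not specify them.

```python
import numpy as np

def sample_adjacency(P, rng):
    """Sample a symmetric adjacency matrix with A_ij ~ Bernoulli(P_ij).

    Assumption: the graph is undirected with no self-loops, so only the
    upper triangle is sampled and then mirrored.
    """
    n = P.shape[0]
    # Independent Bernoulli draws, kept only strictly above the diagonal.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    # Mirror the upper triangle; the diagonal stays zero.
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)

# Hypothetical probability matrix: two groups of three nodes, with
# higher connection probability within a group than between groups.
n = 6
P = np.full((n, n), 0.1)
P[:3, :3] = 0.8
P[3:, 3:] = 0.8

A = sample_adjacency(P, rng)
print(A)
```

Each run with a different seed gives a different noisy observation of the same underlying `P`, which is the "blurred image" analogy from above.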

Going a step further, we can model each node as having a latent position, which can be used to generate its connection probabilities and hence define its behaviour. Using this, we can define node communities: groups of nodes that have the same underlying connection probabilities, meaning they have the same latent positions. We call this kind of model a latent position model. For example, in a network of social interactions at a school, we expect that pupils are more likely to interact with other pupils in their class. In this case, pupils in the same class are said to have similar latent positions and are part of a node community. Mathematically, we say there is a latent position $\mathbf{Z}_i \in \mathbb{R}^{k}$ assigned to each node, and then our probability matrix is the Gram matrix of some kernel, $f: \mathbb{R}^k \times \mathbb{R}^k \rightarrow [0,1]$, that is, $P_{ij} = f(\mathbf{Z}_i, \mathbf{Z}_j)$. From this, we generate our adjacency matrix as

$A_{ij} \overset{\text{ind}}{\sim} \text{Bernoulli}\left( f \left( \mathbf{Z}_i, \mathbf{Z}_j \right) \right)$.

Under this model, our goal is then to estimate the latent positions by analysing $\mathbf{A}$.
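As a rough illustration of this estimation problem, the sketch below generates a network from a latent position model and then estimates the latent positions with a spectral embedding of $\mathbf{A}$. Several things here are assumptions rather than anything stated above: the kernel is taken to be the inner product $f(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top \mathbf{y}$, the latent positions are two hypothetical communities in $\mathbb{R}^2$, the graph is undirected with no self-loops, and spectral embedding is just one common estimator. Under an inner-product kernel the positions can only be recovered up to an orthogonal transformation, so the estimate is compared with the truth through the distance between community centroids, which such a transformation preserves.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent positions in R^2: two communities of 100 nodes each.
n_per, k = 100, 2
Z = np.vstack([
    np.tile([0.8, 0.2], (n_per, 1)),
    np.tile([0.2, 0.7], (n_per, 1)),
])
n = Z.shape[0]

# Assumed kernel: the inner product, so P is the Gram matrix Z Z^T.
# For these positions every entry of P lies in [0, 1].
P = Z @ Z.T

# Sample an undirected graph without self-loops, A_ij ~ Bernoulli(P_ij).
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

# Spectral embedding: keep the k eigenvalues of largest magnitude and
# scale the matching eigenvectors by the square root of their magnitude.
eigvals, eigvecs = np.linalg.eigh(A)
top = np.argsort(np.abs(eigvals))[::-1][:k]
Z_hat = eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

# Compare community centroids; distances survive orthogonal transformations.
centroid_a = Z_hat[:n_per].mean(axis=0)
centroid_b = Z_hat[n_per:].mean(axis=0)
est_dist = np.linalg.norm(centroid_a - centroid_b)
true_dist = np.linalg.norm(Z[0] - Z[-1])
print(f"true centroid distance: {true_dist:.3f}, estimated: {est_dist:.3f}")
```

Plotting the rows of `Z_hat` would show two noisy clouds of points, one per community.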

## Student perspectives: Discovering the true positions of objects from pairwise similarities

A post by Annie Gray, PhD student on the Compass programme.

## Introduction

Initially, my Compass mini-project aimed to explore what we can discover about objects given a matrix of similarities between them. More specifically, it asked how to appropriately measure the distance between objects if we represent each as a point in $\mathbb{R}^p$ (the embedding), and what this can tell us about the objects themselves. This led to the discovery that the geodesic distances in the embedding relate to the Euclidean distances between the original positions of the objects, meaning we can recover the original positions of the objects. This work has applications in fields that work with relational data, for example genetics, natural language processing and cyber-security.

This work resulted in a paper [3] written with my supervisors (Nick Whiteley and Patrick Rubin-Delanchy), which has been accepted at NeurIPS this year. The following gives an overview of the paper and how the ideas can be used in practice.

## Student Perspectives: Contemporary Ideas in Statistical Philosophy

A post by Alessio Zakaria, PhD student on the Compass programme.

## Introduction

Probability theory is a branch of mathematics centred around the abstract manipulation and quantification of uncertainty and variability. It forms a basic unit of the theory and practice of statistics, enabling us to tame the complex nature of observable phenomena into meaningful information. It is through this reliance that the debate over the true (or more correct) underlying nature of probability theory has profound effects on how statisticians do their work. The current opposing sides of the debate in question are the Frequentists and the Bayesians. Frequentists believe that probability is intrinsically linked to the numeric regularity with which events occur, i.e. their frequency. Bayesians, however, believe that probability is an expression of someone's degree of belief or confidence in a certain claim. In everyday parlance we use both of these concepts interchangeably: I estimate one in five people have Covid; I was 50% confident that the football was coming home. It should be noted that the latter of the two is not a repeatable event per se. We cannot roll back time to check what the repeatable sequence would result in.

## Student perspectives: How can we do data science without all of our data?

A post by Daniel Williams, Compass PhD student.

Imagine that you are employed by Chicago’s city council, and are tasked with estimating where the mean locations of reported crimes are in the city. The data that you are given only goes up to the city’s borders, even though crime does not suddenly stop beyond this artificial boundary. As a data scientist, how would you estimate these centres within the city? Your measurements are obscured past a very complex border, so regular methods such as maximum likelihood would not be appropriate.

This is an example of a more general problem in statistics named truncated probability density estimation. How do we estimate the parameters of a statistical model when data are not fully observed, and are cut off by some artificial boundary?
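To see why truncation matters, consider a deliberately simplified one-dimensional analogue, sketched below. Everything in it is an assumption made for illustration: the data are drawn from a normal distribution with unit variance, the "border" is a single threshold, and the truncated likelihood is maximised by a simple grid search. With a boundary this simple the normalising constant is available in closed form; for a complex border like a city's it is not, which is what makes the general problem hard. This is not the method developed in the post.

```python
import math
import random

random.seed(0)

TRUE_MEAN, BOUNDARY, N = 1.0, 1.5, 20000

# Draw from N(TRUE_MEAN, 1) but only observe points inside the boundary.
samples = []
while len(samples) < N:
    x = random.gauss(TRUE_MEAN, 1.0)
    if x < BOUNDARY:
        samples.append(x)

# Naive estimate: ignores truncation, so it is pulled away from the boundary.
sum_x = sum(samples)
sum_x2 = sum(x * x for x in samples)
naive_mean = sum_x / N

def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncated_loglik(mu):
    """Log-likelihood of N(mu, 1) restricted to x < BOUNDARY (up to a constant)."""
    sum_sq_dev = sum_x2 - 2.0 * mu * sum_x + N * mu * mu
    # The extra term renormalises the density over the observable region.
    return -0.5 * sum_sq_dev - N * math.log(normal_cdf(BOUNDARY - mu))

# Grid search over candidate means in [-1, 3] with step 0.001.
grid = [i / 1000.0 for i in range(-1000, 3001)]
truncated_mle = max(grid, key=truncated_loglik)

print(f"true mean:            {TRUE_MEAN}")
print(f"naive sample mean:    {naive_mean:.3f}")
print(f"truncation-aware MLE: {truncated_mle:.3f}")
```

The naive mean is biased because points beyond the boundary are missing, while the truncation-aware estimate accounts for the missing probability mass.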

## Student Perspectives: Embedding probability distributions in RKHSs

A post by Jake Spiteri, Compass PhD student.

Recent advancements in kernel methods have introduced a framework for nonparametric statistics by embedding and manipulating probability distributions in Hilbert spaces. In this blog post we will look at how to embed marginal and conditional distributions, and how to perform probability operations such as the sum rule, product rule, and Bayes’ rule with embeddings.

## Embedding marginal distributions

Throughout this blog post we will make use of reproducing kernel Hilbert spaces (RKHS). A reproducing kernel Hilbert space is simply a Hilbert space with some additional structure, and a Hilbert space is just a vector space equipped with an inner product that is complete with respect to the norm induced by that inner product.

We will frequently refer to a random variable $X$ which has domain $\mathcal{X}$ endowed with the $\sigma$-algebra $\mathcal{B}_\mathcal{X}$.

Definition. A reproducing kernel Hilbert space $\mathcal{H}$ on $\mathcal{X}$ with associated positive semi-definite kernel function $k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ is a Hilbert space of functions $f: \mathcal{X} \rightarrow \mathbb{R}$. The inner product $\langle\cdot,\cdot\rangle_\mathcal{H}$ satisfies the reproducing property: $\langle f, k(x, \cdot) \rangle_\mathcal{H} = f(x), \forall f \in \mathcal{H}, x \in \mathcal{X}$. We also have $k(x, \cdot) \in \mathcal{H}, \forall x \in \mathcal{X}$.
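The reproducing property can be checked numerically for functions that are finite combinations of kernel functions, $f = \sum_i \alpha_i k(x_i, \cdot)$. The sketch below assumes a Gaussian kernel on $\mathbb{R}^2$ and uses hypothetical centres and coefficients; none of these choices come from the definition above. It relies on the fact that applying the reproducing property to $k(y, \cdot)$ gives $\langle k(x, \cdot), k(y, \cdot) \rangle_\mathcal{H} = k(x, y)$, so by bilinearity the inner product of two such combinations is a weighted sum of kernel evaluations.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)) for all pairs of rows."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
centres = rng.normal(size=(5, 2))  # hypothetical points x_1, ..., x_5
alpha = rng.normal(size=5)         # coefficients of f = sum_i alpha_i k(x_i, .)

def f(x):
    """Evaluate f pointwise at each row of x."""
    return gaussian_kernel(x, centres) @ alpha

def inner_product(a, X, b, Y):
    """<sum_i a_i k(X_i, .), sum_j b_j k(Y_j, .)>_H = a^T K(X, Y) b."""
    return a @ gaussian_kernel(X, Y) @ b

x = np.array([[0.3, -1.2]])

# Left-hand side: <f, k(x, .)>_H. Right-hand side: f(x).
lhs = inner_product(alpha, centres, np.array([1.0]), x)
rhs = f(x)[0]
print(np.isclose(lhs, rhs))

# The squared RKHS norm <f, f>_H is non-negative for a positive
# semi-definite kernel.
squared_norm = inner_product(alpha, centres, alpha, centres)
```

Evaluating a function at a point is therefore the same as taking an inner product with the kernel function centred at that point.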