## Student Perspectives: The Importance of Stability in Dynamic Network Analysis

A post by Ed Davis, PhD student on the Compass programme.

## Introduction

Today is a great day to be a data scientist. More than ever, our ability to both collect and analyse data allows us to solve larger, more interesting, and more diverse problems. My research focuses on analysing networks, which cover a mind-boggling range of applications, from modelling vast computer networks [1] to detecting schizophrenia in brain networks [2]. In this blog, I want to share some of the cool research I have been a part of since joining the Compass CDT, which has to do with the analysis of dynamic networks.

## Network Basics

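A network can be defined as an ordered pair, $(V, E)$, where $V$ is a node (or vertex) set and $E$ is an edge set. From this definition, we can represent any $n$-node network in terms of an adjacency matrix, $\mathbf{A} \in \{0, 1\}^{n \times n}$, where for nodes $i, j \in V$,

$$\mathbf{A}_{ij} = \begin{cases} 1, & (i, j) \in E, \\ 0, & \text{otherwise}. \end{cases}$$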
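As a minimal sketch of the adjacency matrix definition, the snippet below builds the matrix for a small, made-up network: entry (i, j) is 1 when nodes i and j share an edge and 0 otherwise. The helper name `adjacency_matrix`, the example edge list, and the assumption that the network is undirected are all illustrative choices rather than anything fixed by the definition.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix (list of lists) for an edge list."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: the network is undirected, so A is symmetric
    return A


# Hypothetical 4-node network: a simple path 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)

for row in A:
    print(row)
```

For a directed network, you would simply drop the line that sets the mirrored entry.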

When we model networks, we can assume that there are some unobservable weightings which mean that certain nodes have a higher connection probability than others. We then observe these in the adjacency matrix $\mathbf{A}$ with some added noise (like an image that has been blurred). Under this assumption, there must exist some unobservable noise-free version of the adjacency matrix (i.e. the image) that we call the probability matrix, $\mathbf{P}$. Mathematically, we represent this by saying

$$\mathbf{A}_{ij} \sim \text{Bernoulli}(\mathbf{P}_{ij}),$$

where we have chosen a Bernoulli distribution as it will return either a 1 or a 0. As the connection probabilities are not uniform across the network (inhomogeneous) and the adjacency is sampled from some probability matrix (random), we say that $\mathbf{A}$ is an inhomogeneous random graph.
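To make the sampling step concrete, here is a small sketch that draws an adjacency matrix from a given probability matrix, one Bernoulli draw per pair of nodes. The function name `sample_adjacency`, the example probabilities, and the choices to treat the network as undirected and to ignore self-loops are assumptions made for illustration.

```python
import random


def sample_adjacency(P, seed=None):
    """Sample a symmetric 0/1 adjacency matrix with entries ~ Bernoulli(P[i][j])."""
    rng = random.Random(seed)
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: no self-loops
            # rng.random() is uniform on [0, 1), so this is 1 with probability P[i][j]
            edge = int(rng.random() < P[i][j])
            A[i][j] = edge
            A[j][i] = edge  # assumption: undirected network
    return A


# Hypothetical probability matrix: nodes 0 and 1 are very likely to connect,
# while node 2 rarely connects to either of them.
P = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
A = sample_adjacency(P, seed=0)

for row in A:
    print(row)
```

Running this with different seeds gives different noisy observations of the same underlying probability matrix, which is the "blurred image" idea in code.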

Going a step further, we can model each node as having a *latent position*, which can be used to generate its connection probabilities and, hence, define its behaviour. Using this, we can define node *communities*: groups of nodes that have the same underlying connection probabilities, meaning they have the same latent positions. We call this kind of model a *latent position model*. For example, in a network of social interactions at a school, we expect that pupils are more likely to interact with other pupils in their class. In this case, pupils in the same class are said to have similar latent positions and are part of a node community. Mathematically, we say there is a latent position $Z_i \in \mathbb{R}^k$ assigned to each node $i$, and then our probability matrix $\mathbf{P}$ will be the Gram matrix of some kernel, $f: \mathbb{R}^k \times \mathbb{R}^k \to [0, 1]$. From this, we generate our adjacency matrix as

$$\mathbf{A}_{ij} \sim \text{Bernoulli}(f(Z_i, Z_j)).$$

Under this model, our goal is then to estimate the latent positions $Z_1, \dots, Z_n$ by analysing $\mathbf{A}$.
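The latent position model can be sketched in a few lines. The example below is only an illustration under stated assumptions: the kernel is taken to be the inner product (one common choice, not necessarily the one used in this research), there are two hypothetical communities with made-up two-dimensional latent positions, and the network is treated as undirected with no self-loops.

```python
import random


def inner_product_kernel(x, y):
    """Assumed kernel: the inner product of two latent positions."""
    return sum(a * b for a, b in zip(x, y))


def probability_matrix(Z, kernel):
    """Gram matrix of the kernel evaluated on every pair of latent positions."""
    return [[kernel(zi, zj) for zj in Z] for zi in Z]


def sample_adjacency(P, rng):
    """Draw each off-diagonal entry as Bernoulli(P[i][j]), mirrored for symmetry."""
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = int(rng.random() < P[i][j])
    return A


# Hypothetical latent positions, one per community (think: two school classes).
community_positions = {0: (0.8, 0.2), 1: (0.2, 0.8)}
labels = [0] * 5 + [1] * 5  # five nodes in each community
Z = [community_positions[c] for c in labels]

P = probability_matrix(Z, inner_product_kernel)
A = sample_adjacency(P, random.Random(1))

print("within-community probability:", round(P[0][1], 2))
print("between-community probability:", round(P[0][5], 2))
```

Because nodes in the same community share a latent position, they share a row of the probability matrix, which is exactly what makes them a community in this model. The estimation problem runs the other way: given only the sampled adjacency matrix, recover the latent positions.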

## Network Embedding

## New Opportunity: Funded PhD Project on Developing methods for model selection in causal health analyses

## How can statistical modelling tell us what causes disease?

Electronic Health Records (EHR) have transformed medical research, with diverse examples including examining risks of emergency admissions on weekends vs weekdays, and health and psychological outcomes after COVID-19. The strengths of analyses based on EHR data include the very large number of individuals available for analysis and the extremely detailed data available for each individual.

## Virtual Applicant Open Day: 13 December 2021

## ILRI sponsors Compass PhD project

We are excited to announce a new partnership between Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science – and the International Livestock Research Institute (ILRI).

The first step in this new partnership is a co-funded and co-created PhD research project entitled *A spatially explicit assessment of agro-pastoral sustainability in Kenya and Ethiopia*. The aim of the PhD project is to develop a framework for the assessment of sustainability dynamics in ecologically important areas used by agro-pastoral and pastoral households. Mountainous areas are important water towers and reserves of biodiversity in East Africa, and conservation of such areas is important to stop degradation of the surrounding arid lowlands. However, population pressure and food demands continue to rise, so a sustainable balance between land use and land stewardship must be struck. The PhD project will build upon methods of agricultural sustainability assessment, and make use of spatial statistics to bring together data from household surveys, soil and water measurements, and remote sensing. The resulting analysis will contribute to the understanding of current human-environment interactions in the two study locations, and form the basis for developing scenarios considering the pros and cons of potential future changes. The PhD contributes to the ESSA project, and will operate in Yabelo, South-East Ethiopia, and the Taita Hills, South-East Kenya.

“Coming from a geography background, the Compass-ILRI partnership is a fantastic opportunity for me to elevate my skill-set and apply cutting edge statistical techniques to the challenge of sustainable food security. ILRI are a world leader in agricultural research and I am really looking forward to learning from them and contributing to their important goal.” Dan Milner, Compass PhD student.

The International Livestock Research Institute (ILRI) works for better lives through livestock in developing countries. ILRI is co-hosted by Kenya and Ethiopia, has 14 offices across Asia and Africa, and employs some 700 staff.

## DataScience@work: British Geological Survey

We’re excited to welcome Dr Kathryn Leeming for our first DataScience@work seminar of the academic year and our first in-person talk for this seminar!

## Student perspectives: Discovering the true positions of objects from pairwise similarities

A post by Annie Gray, PhD student on the Compass programme.

## Introduction

Initially, my Compass mini-project aimed to explore what we can discover about objects given a matrix of similarities between them. More specifically, it asked how to appropriately measure the distance between objects if we represent each as a point in $\mathbb{R}^p$ (the embedding), and what this can tell us about the objects themselves. This led to discovering that the geodesic distances in the embedding relate to the Euclidean distances between the original positions of the objects, meaning we can recover the original positions of the objects. This work has applications in fields that work with relational data, for example genetics, natural language processing and cyber-security.

This work resulted in a paper [3] written with my supervisors (Nick Whiteley and Patrick Rubin-Delanchy), which has been accepted at NeurIPS this year. The following gives an overview of the paper and how the ideas can be used in practice.

## Welcome Cohort 3

A huge welcome to Compass for our third cohort of CDT students. Look out for their updates during this year about their research and experiences.

Edward Milsom, Ben Griffiths, Emerald Dilworth, Hannah Sansford, Daniel Milner, Harry Tata, Dominic Broadbent, Tennessee Hickling, Josh Givens, Ettore Fincato

## Access to Data Science: start your PhD journey with Compass

# Want to find out what a modern PhD in Statistics and Data Science is like?

**Access to Data Science** provides an immersive experience for prospective PhD students. This fully-funded, two-day event will be hosted by Compass academics and PhD students in the Fry Building, home to the School of Mathematics at the University of Bristol.

**Application deadline:** Monday 18 October 2021

**Event dates:** Monday 8 November – Tuesday 9 November

**Find out more about the event here**

The purpose of this event is to increase all aspects of diversity amongst data science researchers. **We particularly encourage applications from women and members of the LGBTQ+ and BAME communities to join us.**

## What to expect from the Access to Data Science event:

- attend seminars and guest lectures
- take part in a hands-on workshop
- have exclusive access to an application writing workshop
- work with the current Compass PhD students
- have the option to attend the Women and non-binary people in mathematics event

## Who can attend

We welcome participants from a range of numerate academic backgrounds, with undergraduate degrees in subjects such as computer science, economics, epidemiology, mathematics, statistics and physics.

**We welcome applications from across the UK.** Access to Data Science participants will be offered hotel accommodation, reimbursement of travel costs and meals for each day of the event.

**Apply to Access to Data Science here**

## Student Perspectives: Contemporary Ideas in Statistical Philosophy

A post by Alessio Zakaria, PhD student on the Compass programme.

## Introduction

Probability theory is a branch of mathematics centred around the abstract manipulation and quantification of uncertainty and variability. It forms a basic unit of the theory and practice of statistics, enabling us to tame the complex nature of observable phenomena into meaningful information. Because of this reliance, the debate over the *true* (or more correct) underlying nature of probability theory has profound effects on how statisticians do their work. The current opposing sides of the debate in question are the *Frequentists* and the *Bayesians*. Frequentists believe that probability is intrinsically linked to the numeric regularity with which events occur, i.e. their frequency. Bayesians, however, believe that probability is an expression of someone's degree of belief or confidence in a certain claim. In everyday parlance we use both of these concepts interchangeably: I estimate one in five people have Covid; I was 50% confident that the football was coming home. It should be noted that the latter of the two is not a repeatable event per se. We cannot roll back time to check what the repeatable sequence would result in.