## Launch of industry-focused seminar series DataScience@Work

Compass is excited to announce the launch of the DataScience@Work seminar series. This new seminar series invites speakers from external organisations to talk about their experiences as data scientists in industry, government and the third sector. Reflecting the dual meaning of its name, DataScience@Work focuses both on the technical side of the speakers’ roles and on working as part of a wider organisation and building a career in data science.

Highlighting the importance of the new seminar series, Prof Nick Whiteley (Compass Director) says…

Compass aims to develop scientific and professional agility in its students. Our goal is to connect technical expertise in data science with experience of thinking, communicating and collaborating across disciplines and across sectors. In our new DataScience@Work seminar series, Compass partners from industry will share insights into the key role of Data Science within their organisations, their objectives and future outlook. This is a great opportunity for our students to learn about career trajectories beyond academia, helping shape their aspirations and personal goals for life beyond the PhD. I’m especially grateful to Adarga, CheckRisk, IBM Research, Improbable, and Shell for leading this first season of DataScience@Work and for their ongoing support for Compass.

Invited speakers to the 2020/21 session include:

## Compass Special Lecture: Jonty Rougier

Compass is excited to announce that Jonty Rougier (2021 recipient of the Barnett Award) will be delivering a Compass Special Lecture.

Jonty’s expertise lies in computer experiments, computational statistics and machine learning, uncertainty and risk assessment, and decision support. In 2021 he was awarded the Barnett Award by the Royal Statistical Society (RSS), which is made to individuals internationally recognised for their contributions to environmental statistics, risk and uncertainty quantification. He has also advised several UK Government departments and agencies, including through a secondment to the Cabinet Office in 2016/17 to contribute to the UK National Risk Assessment.

13th April 2021
09:30 - 14:00

## Student perspectives: Wessex Water Industry Focus Lab

A post by Michael Whitehouse, PhD student on the Compass programme.

### Introduction

September saw the first of an exciting new series of Compass industry focus labs. With it came the chance to make use of the extensive skill sets acquired throughout the course, and an opportunity to provide solutions to pressing issues in modern industry. The partner for the first focus lab, Wessex Water, posed the following question: given time series data on water flow levels in pipes, can we detect whether new leaks have occurred? Given the inherent value of having clean water available at the point of use, and the harm caused by leaking this vital resource, ensuring an efficient delivery system is of great importance. Hence, answering this question has the potential to provide huge economic, political, and environmental benefits for a large base of service users.

### Data and Modelling

The dataset provided by Wessex Water consisted of water flow data covering around 760 pipes. After cleaning and processing this data, we extracted several useful series, such as the minimum nightly flow (MNF) and the average daily flow (ADF). Preliminary analysis carried out by our collaborators at Wessex Water concluded that certain types of change in the structure of water flow data are good indications that a leak has occurred. From this, one can postulate that detecting a leak amounts to detecting these structural changes in the data. Using this principle, we built a framework for our solutions: detect the change, detect a new leak.

Change point detection is a well-researched discipline that provides efficient methods for detecting statistically significant changes in the distribution of a time series, and hence a toolbox with which to tackle the problem. Indeed, we at Compass have our very own active member of the change point detection research community in the shape of Dom Owens.

The preliminary analysis identified three types of structural change in water flow series that indicate a leak: a change in the mean of the MNF, a change in the trend of the MNF, and a change in the variance of the difference between the MNF and ADF. To detect these changes with an algorithm, we needed to transform the given series so that each original change in distribution corresponded to a change in the mean of the transformed series. These transforms included calculating generalised additive model (GAM) residuals and analysing their distribution. An example of such a GAM is given by:

$\mathbb{E}[\text{flow}_t] = \beta_0 + \sum_{i=1}^m f_i(x_i).$

Here, the $x_i$ are features we want to use to predict the flow, such as the time of day or the current season, and the $f_i$ are smooth functions estimated from the data. The principle behind this analysis is that a change in the residual distribution indicates a violation of the assumption that the residuals are independent and identically distributed, and hence, in turn, a deviation from the original structure we fitted our GAM to.
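The post does not say which change point method the team used, so the following is only a minimal sketch of the "turn every leak signature into a change in mean" idea, using the classical CUSUM statistic for a single change in mean as a stand-in. All function names (`cusum_mean_change`, `detect_mean_change`, `trend_to_mean`, `variance_to_mean`) are hypothetical. The noise is assumed to be roughly Gaussian and independent, the default threshold is a rough heuristic rather than a calibrated test, and the two transforms (first differences for a trend change, squared deviations for a variance change) are simple substitutes for the GAM-residual transforms described above.

```python
import numpy as np


def cusum_mean_change(x):
    """Locate the most likely single change in mean of x.

    Returns (tau, stat): tau is the number of observations before the change
    (x[tau] is the first post-change value) and stat is the maximised CUSUM
    statistic sqrt(tau * (n - tau) / n) * |mean(x[:tau]) - mean(x[tau:])|.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    taus = np.arange(1, n)
    partial = np.cumsum(x)[:-1]  # S_tau for tau = 1, ..., n - 1
    total = x.sum()              # S_n
    stats = np.abs(partial - taus * total / n) * np.sqrt(n / (taus * (n - taus)))
    k = int(np.argmax(stats))
    return int(taus[k]), float(stats[k])


def detect_mean_change(x, threshold=None):
    """Return the estimated change location, or None if no change is flagged.

    The CUSUM statistic is divided by a robust noise estimate (MAD of first
    differences, assuming roughly Gaussian independent noise) and compared
    with a threshold; the default sqrt(2 log n) is a heuristic assumption.
    """
    x = np.asarray(x, dtype=float)
    tau, stat = cusum_mean_change(x)
    sigma = np.median(np.abs(np.diff(x))) / (0.6745 * np.sqrt(2))
    if threshold is None:
        threshold = np.sqrt(2 * np.log(len(x)))
    if sigma == 0:
        return tau if stat > 0 else None
    return tau if stat / sigma > threshold else None


def trend_to_mean(mnf):
    """Hypothetical transform: a change in the slope of the MNF becomes a
    change in the mean of its first differences."""
    return np.diff(np.asarray(mnf, dtype=float))


def variance_to_mean(mnf, adf):
    """Hypothetical transform: a change in the variance of MNF - ADF becomes a
    change in the mean of squared deviations from the median (roughly the
    variance, when the difference has a stable, symmetric centre)."""
    diff = np.asarray(mnf, dtype=float) - np.asarray(adf, dtype=float)
    return (diff - np.median(diff)) ** 2


# Usage on synthetic data: the mean shifts by 3 units after 100 nights.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(10, 1, 100), rng.normal(13, 1, 100)])
print(detect_mean_change(series))  # estimated change location
```

On this synthetic series the estimated location should fall at or very close to index 100. Reducing each leak signature to a change in mean means a single detector can be reused; a real deployment would also need to handle multiple changes and seasonal structure, which is where model-based transforms such as GAM residuals come in.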

## Student perspectives: Three Days in the life of a Silicon Gorge Start-Up

A post by Mauro Camara Escudero, PhD student on the Compass programme.

Last December the first Compass cohort took part in a three-day entrepreneurship training course with SpinUp Science. Keep reading and you might just find out whether the Silicon Gorge life is for you!

### The Ambitious Project of SpinUp Science

SpinUp Science’s goal is to help PhD students like us develop an entrepreneurial skill set that will come in handy if we decide to commercialize a product, launch a start-up or pursue a consulting career.

I can already hear some of you murmur: “Sure, this might be helpful for someone doing a much more applied PhD, but my work is theoretical. How is that ever going to help me?” I get that; I used to believe the same. However, partly thanks to this training, I changed my mind and realized just how valuable these skills are, regardless of whether you decide to stay in academia or find a job at an established company.

Anyway, I am getting ahead of myself. Let me first guide you through what the training looked like, and then we will come back to this!

### Day 1 – Meeting the Client

The day started with a presentation that, on paper, promised to be yet another one of those endless and boring talks that make you reach for the Stop Video button and take a nap. The vague title “Understanding the Opportunity” surely did not help either. Instead, we were thrown right into action!

## Student perspectives: The Elo Rating System – From Chess to Education

A post by Andrea Becsek, PhD student on the Compass programme.

If you have also recently binge-watched The Queen’s Gambit, chances are you have heard of the Elo Rating System. There are actually many games out there that require some way to rank players or even teams. However, the applications of the Elo Rating System reach further than you might think.

### History and Applications

The Elo Rating System [1] was first suggested as a way to rank chess players; however, it can be used in any competitive two-player game that requires a ranking of its players. The system was first adopted by the World Chess Federation in 1970, and it has been adjusted in various ways since, resulting in different implementations by each organisation.

For any soccer-lovers out there, the FIFA world rankings are also based on the Elo System, but if you happen to be into a similar sport, worry not, Elo has you covered. And the list of applications goes on and on: Backgammon, Scrabble, Go, Pokemon, and apparently even Tinder used it at some point.

Fun fact: the formulas used by the Elo Rating System make a famous appearance in The Social Network, a movie about the creation of Facebook. Whether this was the actual algorithm used for FaceMash, the site that preceded Facebook, is however unclear.

All this sounds pretty cool, but how does it actually work?

### How it works

We want a way to rank players and update their ranking after each game. Let’s start by assuming that we have the ranking for the two players about to play: $\theta_i$ for player $i$ and $\theta_j$ for player $j$. Then we can compute the probability of player $i$ winning against player $j$ using the logistic function:

$P(Y_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-\theta_j)\}}.$

Given what we know about the logistic function, it’s easy to see that the smaller the difference between the players’ ratings, the less certain the outcome, as the probability of winning will be close to $0.5$.

Once the outcome of the game is known, we can update both players’ abilities

$\theta_{i}:=\theta_{i}+K(Y_{ij}-P(Y_{ij}=1))$

$\theta_{j}:=\theta_{j}+K(P(Y_{ij}=1)-Y_{ij}).$

The $K$ factor controls how strongly the outcome of a single game changes a player’s rating. For players with high ratings, a smaller $K$ is used because we expect their abilities to be fairly stable, so their rating shouldn’t be too heavily influenced by every game. On the other hand, players with low ability can learn and improve quite quickly, so their rating should be able to fluctuate more, and they are given a larger $K$.

The term in the brackets represents how different the actual outcome is from the expected outcome of the game. If a player is expected to win but doesn’t, their ranking will decrease, and vice versa. The larger the difference, the more their rating will change. For example, if a weaker player is highly unlikely to win, but they do, their ranking will be boosted quite a bit because it was a hard battle for them. On the other hand, if a strong player is really likely to win because they are playing against a weak player, their increase in score will be small as it was an easy win for them.
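The prediction and update rules above fit in a few lines of Python. This is a minimal sketch on the same logistic scale as the formulas in this post; the function names are hypothetical, and `k_factor` is an illustrative K schedule (a smaller K above an arbitrary rating cutoff), not one used by any real federation.

```python
import math


def win_probability(theta_i, theta_j):
    """P(Y_ij = 1): probability that player i beats player j."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


def elo_update(theta_i, theta_j, y_ij, k_i, k_j):
    """Return the updated (theta_i, theta_j) after one game.

    y_ij is 1 if player i won and 0 if player j won. With k_i == k_j this is
    the pair of update rules above, and the two rating changes cancel out.
    """
    p = win_probability(theta_i, theta_j)
    surprise = y_ij - p  # actual minus expected outcome for player i
    return theta_i + k_i * surprise, theta_j - k_j * surprise


def k_factor(theta, cutoff=1.0, k_high_rated=0.2, k_low_rated=0.6):
    """Hypothetical K schedule: a smaller K for highly rated, stable players."""
    return k_high_rated if theta >= cutoff else k_low_rated


strong, weak = 2.0, 0.0
print(win_probability(strong, weak))
print(elo_update(strong, weak, 1, 0.4, 0.4))  # expected win: small changes
print(elo_update(strong, weak, 0, 0.4, 0.4))  # upset: large changes
# Per-player K: the weaker player's rating reacts more than the stronger one's.
print(elo_update(strong, weak, 0, k_factor(strong), k_factor(weak)))
```

With ratings $2$ and $0$, the stronger player wins with probability of about $0.88$, so with $K = 0.4$ an expected win moves both ratings by only about $0.05$, whereas an upset moves them by about $0.35$.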

### Elo Rating and Education

As previously mentioned, the Elo Rating System has been used in a wide range of fields and, as it turns out, these include education, more specifically adaptive educational systems [2]. Adaptive educational systems automatically select appropriate material for a student depending on their previous performance.

Note that a system can be adaptive at different levels of granularity. Some systems might adapt the homework from week to week, generating it based on the student’s current ability and updating that ability once the homework has been completed. Other systems update the student’s ability after every single question. As you can imagine, the second kind of system requires a fast, online algorithm, and this is where the Elo Rating comes in.

For an adaptive system to work, we need two key components: student abilities and question difficulties. To apply the Elo Rating to this context, we treat a student’s interaction with a question as a game where the student’s ranking represents their ability and the question’s ranking represents its difficulty. We can then predict whether a student of ability $\theta_i$ will answer a question of difficulty $d_j$ correctly using

$P(\text{correct}_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-d_j)\}},$

and the ability and difficulty can be updated using

$\theta_{i}:=\theta_{i}+K(\text{correct}_{ij}-P(\text{correct}_{ij}=1))$

$d_{j}:=d_{j}+K(P(\text{correct}_{ij}=1)-\text{correct}_{ij}).$

So even if you only have $10$ minutes to create an adaptive educational system you can easily implement this algorithm. Set all abilities and question difficulties to $0$, let students answer your questions, and wait for the magic to happen. If you do have some prior knowledge about the difficulty of the items you could of course incorporate that into the initial values.
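Here is a minimal sketch of such a ten-minute system, assuming answers arrive one at a time as (student, question, correct) triples. The class name `EloTutor`, the choice $K = 0.2$ and the simulated abilities and difficulties are all hypothetical, chosen only to show that the online updates recover the ordering of question difficulties from answers alone.

```python
import numpy as np


def p_correct(theta, d):
    """Predicted probability that a student of ability theta answers a
    question of difficulty d correctly."""
    return 1.0 / (1.0 + np.exp(-(theta - d)))


class EloTutor:
    """Hypothetical minimal Elo model: one ability per student and one
    difficulty per question, all initialised to 0."""

    def __init__(self, n_students, n_questions, k=0.2):
        self.theta = np.zeros(n_students)
        self.d = np.zeros(n_questions)
        self.k = k

    def record(self, student, question, correct):
        """Online update after one answer (correct is 1 or 0).
        Returns the prediction made before the update."""
        p = p_correct(self.theta[student], self.d[question])
        surprise = correct - p
        self.theta[student] += self.k * surprise
        self.d[question] -= self.k * surprise
        return p


# Simulated data (hypothetical): 500 students answer 10 questions in random order.
rng = np.random.default_rng(42)
true_theta = rng.normal(0.0, 1.0, size=500)
true_d = np.linspace(-2.0, 2.0, 10)
tutor = EloTutor(len(true_theta), len(true_d))
for s in range(len(true_theta)):
    for q in rng.permutation(len(true_d)):
        correct = float(rng.random() < p_correct(true_theta[s], true_d[q]))
        tutor.record(s, q, correct)

# Correlation between true and estimated difficulties.
print(np.corrcoef(true_d, tutor.d)[0, 1])
```

Because only differences between abilities and difficulties enter the prediction, the estimates are identified only up to a common shift, which is why the sketch compares them with the true values through a correlation rather than directly.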

One important caveat: ranking students by their abilities could raise various ethical issues, so it should be done with care. The main purpose of estimating their abilities is to track their progress and match them with questions at the right level for them: easy enough to stay motivated, but hard enough to feel challenged.
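One simple way to act on "the right level" is to offer the question whose predicted probability of a correct answer is closest to a target value. This is a sketch under assumptions: the function `choose_question` and the default target of $0.75$ are hypothetical choices for illustration, not part of the Elo Rating System itself.

```python
import numpy as np


def choose_question(theta, difficulties, answered, target=0.75):
    """Return the index of the unanswered question whose predicted success
    probability is closest to `target`, or None if all have been answered."""
    difficulties = np.asarray(difficulties, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))  # same logistic model as above
    gap = np.abs(p - target)
    for q in answered:
        gap[q] = np.inf  # never offer a question twice
    if np.all(np.isinf(gap)):
        return None
    return int(np.argmin(gap))


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(choose_question(0.0, difficulties, answered=set()))
print(choose_question(0.0, difficulties, answered={1}))
```

For a student with estimated ability $0$, the predicted success probabilities for these five questions are about $0.88$, $0.73$, $0.50$, $0.27$ and $0.12$, so the first call picks question 1; once that has been answered, the second call picks question 0. Raising the target serves easier questions and lowering it serves harder ones.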

### Conclusion

So is Elo the best option for an adaptive system? It depends. It is fast, enables on-the-fly updates, is easy to implement and, in some contexts, even performs similarly to more complex models. However, there are usually many other factors relevant to predicting student performance, such as the time spent on a question or the number of hints used. This additional data can be incorporated into more complex models, which will probably give better predictions and offer much more insight. At the end of the day there is always a trade-off, so depending on the context, it’s up to you to decide whether the Elo Rating System is the way to go.

Find out more about Andrea Becsek and her work on her profile page.

1. Elo, A.E., 1978. The rating of chessplayers, past and present. Arco Pub.
2. Pelánek, R., 2016. Applications of the Elo rating system in adaptive educational systems. Computers & Education, 98, pp.169-179.

The University of Bristol is excited to announce IBM Research Europe as a new partner of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. IBM scientists are collaborating with Prof. Robert Allison and Compass PhD student Anthony Stephenson on a research project entitled Fast Bayesian Inference at Extreme Scale. The project’s aim is to extend Bayesian inference algorithms to the ‘extreme scales’ that many deep learning workloads occupy, focusing on AI methodologies that furnish both an accurate prediction and, critically, a high-quality representation of the uncertainty in that prediction.

For more than seven decades, IBM Research has defined the future of information technology with more than 3,000 researchers in 19 locations across six continents. Scientists from IBM Research have produced six Nobel Laureates, 10 U.S. National Medals of Technology, five U.S. National Medals of Science, six Turing Awards, 19 inductees in the National Academy of Sciences and 20 inductees into the U.S. National Inventors Hall of Fame.

IBM has European research locations in Switzerland (Zurich), England (Hursley and Daresbury), and Ireland (Dublin), with a large development lab in Germany focused on AI, quantum computing, security and hybrid cloud.

IBM’s global labs are involved in hundreds of joint projects with universities, particularly throughout Europe, in research programs established by the European Union and local governments, and in cooperation agreements with research institutes of industrial partners.

Compass is a 4-year PhD training programme focusing on Computational Statistics and Data Science. This new venture is part of the Compass mission to promote academic and professional agility in its students, equipping them with the skills and experience to work across disciplines in academia and beyond.

Anthony Stephenson, the PhD student recruited to this project, says, “After several years working in industry, I am pleased to be starting the Compass programme and shifting my focus to research. Having the combined forces of the University of Bristol and IBM behind me inspires confidence and I look forward to working with members of each of them. My project, scalable inference in non-linear Bayesian models, is also a highly relevant and exciting area to work on, with many applications in modern machine learning.”

Dr Ed Pyzer-Knapp, World-Wide IBM Research Lead in AI Enriched Modelling and Simulation, says, “I am very excited to work with Anthony and Robert – scaling Bayesian inference is a really important area of machine learning research; bringing to bear our mantra of fusing bits and neurons to further develop the future of computing. This project is a great opportunity to further strengthen our relationship with the University of Bristol.”

Prof Robert Allison, Anthony’s academic supervisor at the University of Bristol, says, “I’m really looking forward to working with Anthony and Ed on a highly important and widely applicable area of machine learning which encompasses mathematical research, data-analysis, algorithm development and efficient large-scale computation. In addition, I see this project as an ideal opportunity to seed wider ranging data-science and machine learning collaborations between IBM Research, their academic partners and the University of Bristol.”

As Director of Compass, Prof Nick Whiteley says, “I’m absolutely delighted to welcome IBM Research to Compass. This project is a fantastic opportunity for Anthony to tackle a very challenging and increasingly important AI research problem under Prof. Allison and Dr. Pyzer-Knapp’s supervision. As this collaboration develops, I look forward to all Compass students learning about IBM’s vision for the future of AI and its connection to the expertise in statistical methodology and computing they will acquire through the Compass training programme.”

## Improbable sponsors Compass PhD student in new partnership

Improbable, a global technology company which provides innovative products and services to makers of virtual worlds and simulations, is sponsoring a PhD research project entitled Agent-based model calibration using likelihood-free inference.

The University of Bristol is announcing a new industrial sponsor of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. The project’s aim is to devise a general framework for calibrating agent-based models from training data by statistically inferring the model parameters.

## Sparx joins as Compass’ newest industrial partner

The University of Bristol is today announcing a new supporter of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. South West-based learning technology company Sparx has agreed to sponsor a PhD student’s research project, which will investigate new approaches to longitudinal statistical modelling within school-based mathematics education.

Sparx, which is located in Exeter, develops maths learning tools to support teaching and learning in secondary education. As an evidence-led company, Sparx has invested heavily in researching how technology can support the teaching and learning of maths, and has worked closely with local schools. This new investment underlines its ongoing commitment to research.

## Compass welcomes first cohort

Twelve industry partners joined our new students at our Centre for Doctoral Training launch event.

The EPSRC CDT Compass welcomed its new students and external partners from industry and government agencies for the inaugural Partner Day on 1 October 2019.

The Engineering and Physical Sciences Research Council awarded the School of Mathematics over £6 million to launch the new Centre for Doctoral Training (CDT). The first intake of nine students registered in September to embark on this innovative 4-year PhD programme, which combines training and research to address challenges across science, industry, and society.

Representatives from the external partner organisations visited the newly remodelled home of the School of Mathematics in the Fry Building. The Compass team learned about the partners’ work in data science, ranging from smart apps for mathematical education to forecasting tools for energy systems. We welcomed representatives from:

Adarga | AWE | CheckRisk | EDF | GSK | LV | Office for National Statistics | Malvern Panalytical | UK Space Agency | SCIEX | SPARX | Trainline

Dr. Nick Whiteley, Compass Director, commented: “I am thrilled to see our new students embark on the Compass programme and take their first steps in research. It was a pleasure to welcome our partners to the wonderful Fry Building, where they shared thought-provoking insights into the work they do and the statistical challenges they face. Our partners play a key role in enriching the training Compass students receive and the research they conduct.”

Jasmine Grimsley from Office for National Statistics said:

“The Data Science Campus at the Office for National Statistics plans to work with Compass students on projects that use data science for public good. Students have the opportunity to come and work with us on an industry placement, or to work on real-world problems for their thesis topics.”