## Skills for Interdisciplinary Research

To acknowledge the variety of sectors where data science research is relevant, in March 2021, the Compass students are undertaking a series of workshops led by the Bristol Doctoral College to explore Skills for Interdisciplinary Research.  Using the Vitae framework for researcher development, our colleague at BDC will introduce Compass students to the following topics:

Workshop 1: What is a doctorate? A brief history of doctorates in the UK, how they have changed in the past two decades, why CDTs exist, and what skills a doctorate now requires.

Workshop 2: Interdisciplinarity – the foundations. A practical case study on interdisciplinary postgraduate research at Bristol.

Workshop 3: Ways of knowing, part 1 – Positivism and ‘ologies! Deconstructing some of the terminology around knowledge and how we know what we know. Underpinning assumption – to know your own discipline, you need to step outside of it and see it as others do.

Workshop 4: Ways of knowing, part 2 – Social constructionism and qualitative approaches to research. In part 1 of ways of knowing, the ideal ‘science’ approach is objective and the researcher is detached from the subject of study; part 2 looks at other approaches where the role of the researcher is integral to the research.

Workshop 5: Becoming a good researcher – research integrity and doctoral students. A look at how dilemmas in research can show us how research integrity is not just a case of right or wrong.

Workshop 6: Getting started with academic publishing. An introduction to the pressure to publish in contemporary research, exploring what that means in an interdisciplinary context.

## Student Research Topics for 2020/21

This month, the Cohort 2 Compass students have started work on their mini projects and are establishing the direction of their own research within the CDT.

## Supervised by the Institute for Statistical Science:

Anthony Stevenson will be working with Robert Allison on a project entitled Fast Bayesian Inference at Extreme Scale.  This project is in partnership with IBM Research.

Conor Crilly will be working with Oliver Johnson on a project entitled Statistical models for forecasting reliability. This project is in partnership with AWE.

Euan Enticott will be working with Matteo Fasiolo and Nick Whiteley on a project entitled Scalable Additive Models for Forecasting Electricity Demand and Renewable Production.  This project is in partnership with EDF.

Annie Gray will be working with Patrick Rubin-Delanchy and Nick Whiteley on a project entitled Exploratory data analysis of graph embeddings: exploiting manifold structure.

Ed Davis will be working with Dan Lawson and Patrick Rubin-Delanchy on a project entitled Graph embedding: time and space.  This project is in partnership with LV Insurance.

Conor Newton will be working with Henry Reeve and Ayalvadi Ganesh on a project entitled Decentralised sequential decision making and learning.

## The following projects are supervised in collaboration with the Institute for Statistical Science (IfSS) and our other internal partners at the University of Bristol:

Dan Ward will be working with Matteo Fasiolo (IfSS) and Mark Beaumont from the School of Biological Sciences on a project entitled Agent-based model calibration via regression-based synthetic likelihood. This project is in partnership with Improbable.

Jack Simons will be working with Song Liu (IfSS) and Mark Beaumont (Biological Sciences) on a project entitled Approximate Bayesian Inference by Density Ratio Estimation.

Georgie Mansell will be working with Haeran Cho (IfSS) and Andrew Dowsey from the School of Population Health Sciences and Bristol Veterinary School on a project entitled Statistical learning of quantitative data at scale to redefine biomarker discovery.  This project is in partnership with Sciex.

Shannon Williams will be working with Anthony Lee (IfSS) and Jeremy Phillips from the School of Earth Sciences on a project entitled Use and Comparison of Stochastic Simulations and Weather Patterns in probabilistic volcanic ash hazard assessments.

Sam Stockman will be working with Dan Lawson (IfSS) and Maximillian Werner from the School of Geographical Sciences on a project entitled Machine Learning and Point Processes for Insights into Earthquakes and Volcanoes.

## Responsible Innovation in Data Science Research

This February our 2nd year Compass students will attend workshops in responsible innovation.

Run in partnership with the School of Management, the structured module constitutes Responsible Innovation training specifically for research in Data Science.

Taking the EPSRC AREA (Anticipate, Reflect, Engage, Act) framework for Responsible Innovation as its starting point, the module will take students through a guided process to develop the skills, knowledge and facilitated experience to incorporate the tenets of the AREA framework into their PhD practice. Topics covered will include:
· Ethical and societal implications of data science and computational statistics
· Skills for anticipation
· Reflexivity for researchers
· Public perception of data science and engagement of publics
· Regulatory frameworks affecting data science

## New opportunity: a jointly funded studentship with FAI Farms

Compass is very excited to advertise this PhD studentship in collaboration with FAI Farms on a vision-based system for automated poultry welfare assessment through deep learning and Bayesian modelling.

## About the Project

This is an exciting opportunity to join Compass’ 4-year programme with integrated training in the statistical and computational techniques of Data Science. You will be part of a dynamic cohort of PhD researchers hosted in the historic Fry Building, which has recently undergone a £35 million refurbishment as the new home for Bristol’s School of Mathematics.

FAI Farms is a multi-disciplinary team working in partnership with farmers and food companies to provide practical solutions for climate and food security. FAI’s state-of-the-art strategic advice, data insight and education services are powered by science, technology and best practice. Our strategic and evidence-based approach is focused on driving meaningful improvements across supply chains, mitigating risks and realising long term business benefits for our partners.

The aim of this PhD project is to create a vision-based system for the automated assessment of chicken welfare for use in poultry farms. The welfare of broiler chickens is a key ethical and economic challenge for the sustainability of chicken meat production. The presentation of natural, positive behaviour is important to ensure a “good life” for livestock species as well as being an expectation for many consumers. At present there are no ways to measure this, with good welfare habitually defined as the absence of negative experience. In addition, automated tracking of individual birds is very challenging due to occlusion and complexity. In this project the student will instead harness and develop novel deep learning approaches that consider individual animals and their behaviours probabilistically within the context of local and general activity within the barn and wider flock. The inferred behaviour rates amongst the flock will then be integrated with on-farm production, health and environmental data through Bayesian time series modelling to identify risk factors for positive welfare, predict farms at risk of poor welfare, and suggest interventions that avoid this scenario.

## Student perspectives: Wessex Water Industry Focus Lab

A post by Michael Whitehouse, PhD student on the Compass programme.

Introduction

September saw the first of an exciting new series of Compass industry focus labs; with this came the chance to make use of the extensive skill sets acquired throughout the course and an opportunity to provide solutions to pressing issues of modern industry. The partner for the first focus lab, Wessex Water, posed the following question: given time series data on water flow levels in pipes, can we detect if new leaks have occurred? Given the inherent value of clean water available at the point of use and the detriments of leaking this vital resource, the challenge of ensuring an efficient system of delivery is of great importance. Hence, finding an answer to this question has the potential to provide huge economic, political, and environmental benefits for a large base of service users.

Data and Modelling:

The dataset provided by Wessex Water consisted of water flow data for around 760 pipes. After the data was cleaned and processed, some useful series were extracted, such as minimum nightly flow (MNF) and average daily flow (ADF). Preliminary analysis carried out by our collaborators at Wessex Water concluded that certain types of change in the structure of water flow data are good indications that a leak has occurred. From this one can postulate that detecting a leak amounts to detecting these structural changes in the data. Using this principle, we began to build a framework for solutions: detect the change; detect a new leak.

Change point detection is a well-researched discipline that provides efficient methods for detecting statistically significant changes in the distribution of a time series, and hence a toolbox with which to tackle the problem. Indeed, we at Compass have our very own active member of the change point detection research community in the shape of Dom Owens.

The preliminary analysis identified three types of structural change in water flow series that indicate a leak: a change in the mean of the MNF, a change in the trend of the MNF, and a change in the variance of the difference between the MNF and ADF. In order to detect these changes with an algorithm we would need to transform the given series so that the original change in distribution corresponds to a change in the mean of the transformed series. These transforms included calculating generalised additive model (GAM) residuals and analysing their distribution. An example of such a GAM is given by:

$\mathbb{E}[\text{flow}_t] = \beta_0 + \sum_{i=1}^m f_i(x_i).$

Here the $x_i$ are features we want to use to predict the flow, such as the time of day or current season. The principle behind this analysis is that any change in the residual distribution corresponds to a violation of the assumption that the residuals are independent and identically distributed and hence, in turn, to a deviation from the original structure we fit our GAM to.
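To make the residual idea concrete, here is a minimal Python sketch. It is not the model fitted during the focus lab. It assumes two illustrative features (hour of day and day of week), replaces the penalised smooths of a real GAM with a few sine and cosine terms fitted by least squares, and uses synthetic data with an artificial step of 1 unit standing in for a leak. All function names are hypothetical.

```python
import numpy as np

def harmonic_basis(x, period, n_harmonics=2):
    """Sine/cosine columns giving a smooth periodic stand-in for one f_i."""
    angles = 2.0 * np.pi * np.outer(x, np.arange(1, n_harmonics + 1)) / period
    return np.hstack([np.sin(angles), np.cos(angles)])

def design_matrix(hour, weekday):
    """Intercept (beta_0) plus one block of basis columns per feature."""
    return np.hstack([
        np.ones((len(hour), 1)),
        harmonic_basis(hour, 24.0),
        harmonic_basis(weekday, 7.0),
    ])

def fit_additive_model(hour, weekday, flow):
    """Least-squares fit of flow = beta_0 + f_1(hour) + f_2(weekday)."""
    coef, *_ = np.linalg.lstsq(design_matrix(hour, weekday), flow, rcond=None)
    return coef

def residuals(coef, hour, weekday, flow):
    """Observed flow minus the fitted additive structure."""
    return flow - design_matrix(hour, weekday) @ coef

# Synthetic hourly flow for 12 weeks (assumed data, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 12)
hour = t % 24
weekday = (t // 24) % 7
flow = (10.0 + 2.0 * np.sin(2 * np.pi * hour / 24)
        + 0.5 * np.cos(2 * np.pi * weekday / 7)
        + rng.normal(0.0, 0.3, t.size))

leak_start = 24 * 7 * 8
flow[leak_start:] += 1.0  # artificial "leak": a step in the mean flow

# Fit on the first 6 weeks, assumed leak-free, then score the whole series.
train = t < 24 * 7 * 6
coef = fit_additive_model(hour[train], weekday[train], flow[train])
resid = residuals(coef, hour, weekday, flow)

print("mean residual before leak:", resid[:leak_start].mean())
print("mean residual after leak: ", resid[leak_start:].mean())
```

Because the model is fitted only on the leak-free weeks, the step shows up as a shift in the mean of the residuals, which is the kind of change a mean change detector can then look for.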

Figure 1: GAM residual plot. Red lines correspond to detected changes in distribution, green lines indicate a repair took place.

A Change Point Detection Algorithm:

In order to detect changes in real time we would need an online change point detection algorithm. After evaluating the existing literature, we elected to follow the mean change detection procedure described in [Wang and Samworth, 2016]. The user-end procedure is as follows:

1. Calculate mean estimate $\hat{\mu}$ on some data we assume is stationary.
2. Feed a new observation into the algorithm. Calculate test statistics based on new data.
3. Repeat (2) until any test statistics exceed a threshold at which point we conclude a mean change has been detected. Return to (1).
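
The three steps above can be sketched in Python as follows. This is a simplified stand-in that uses a two-sided CUSUM recursion rather than the procedure from the cited paper; the class name, the `drift` and `threshold` values, and the synthetic series are assumptions chosen for illustration.

```python
import numpy as np

class OnlineMeanChangeDetector:
    """Two-sided CUSUM detector for a change in mean (illustrative stand-in)."""

    def __init__(self, baseline, drift=0.5, threshold=12.0):
        baseline = np.asarray(baseline, dtype=float)
        # Step 1: estimate the mean (and scale) on data assumed stationary.
        self.mu_hat = baseline.mean()
        self.sigma_hat = baseline.std(ddof=1)
        self.drift = drift          # slack, in standard deviations
        self.threshold = threshold  # alarm level, in standard deviations
        self.upper = 0.0            # statistic tracking upward shifts
        self.lower = 0.0            # statistic tracking downward shifts

    def update(self, x):
        """Step 2: feed one observation, update both statistics, test them."""
        z = (x - self.mu_hat) / self.sigma_hat
        self.upper = max(0.0, self.upper + z - self.drift)
        self.lower = max(0.0, self.lower - z - self.drift)
        return max(self.upper, self.lower) > self.threshold


def detect_changes(series, n_baseline=100, **kwargs):
    """Run the detector over a series, restarting after every alarm."""
    series = np.asarray(series, dtype=float)
    detections, start = [], 0
    while start + n_baseline < len(series):
        detector = OnlineMeanChangeDetector(series[start:start + n_baseline], **kwargs)
        alarm = None
        for t in range(start + n_baseline, len(series)):
            if detector.update(series[t]):
                alarm = t
                break
        if alarm is None:
            break  # reached the end of the data without an alarm
        detections.append(alarm)
        start = alarm  # Step 3: return to step 1 using data after the alarm
    return detections


# Synthetic series with a single upward mean shift at index 300 (assumed data).
rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(5.0, 0.5, 300), rng.normal(7.0, 0.5, 300)])
detections = detect_changes(series)
print(detections)
```

Raising `threshold` makes false alarms rarer at the cost of a longer delay before a genuine change is flagged, so in practice it would need tuning against whatever ground truth is available.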

Due to our two-week time constraint we chose to restrict ourselves to finding change points corresponding to a mean change, just one of the three changes we know are indicative of a leak. As per the fundamental principles of decision theory, we would like to tune and evaluate our algorithm by minimising some loss function which depends on some ‘training’ data. That is, we would like to look at some past period of time and predict when leaks happened given the flow data across the same period, then evaluate how accurate these predictions were and adjust or assess the model accordingly. However, to do this we would need to know when and where leaks actually occurred across the time period of the data, something we did not have access to.

Without ‘labels’ indicating that a leak has occurred, any predictions from the model were essentially meaningless, so we sought a proxy. The one potentially useful dataset we did have access to was that of leak repairs. It is clear that a leak must have occurred if a repair has occurred, but for various reasons this proxy does not provide an exhaustive account of all leaks. Furthermore, we do not know which repairs correspond to leaks identified by the particular distributional change in flow data we considered. This, in turn, means that all measures of model performance must come with the caveat that they are contingent on incomplete data.

When research results are limited, it is our duty as statisticians to report that this is the case: it is not our job to sugar-coat or manipulate our findings, but to report them with the limitations and uncertainty that inextricably come alongside. Results without uncertainty are as meaningless as no results at all. This being said, all indications pointed towards the method being effective in detecting mean change points in water flow data which correspond to leak repairs, giving a positive result to feed back to our friends at Wessex Water.
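
One hypothetical way to score detections against the repair proxy is to count how many repairs were preceded by a detection within some maximum lag, and how many detections were followed by a repair. The sketch below is an assumption about how such a comparison could be set up, not the evaluation used in the focus lab; `max_lag` and the example times are made up.

```python
import numpy as np

def match_detections_to_repairs(detections, repairs, max_lag):
    """Link each detection to repairs that follow it within max_lag time units."""
    detections = np.asarray(detections, dtype=float)
    repairs = np.asarray(repairs, dtype=float)
    # lag[i, j] = time of repair j minus time of detection i
    lag = repairs[None, :] - detections[:, None]
    linked = (lag >= 0) & (lag <= max_lag)
    nan = float("nan")
    return {
        # Share of known repairs that a detection anticipated.
        "repairs_preceded_by_detection":
            float(linked.any(axis=0).mean()) if repairs.size else nan,
        # Share of detections later confirmed by a repair. Unmatched detections
        # are not necessarily false alarms, because the repair log is incomplete.
        "detections_followed_by_repair":
            float(linked.any(axis=1).mean()) if detections.size else nan,
    }

# Made-up times, in days since the start of the record.
scores = match_detections_to_repairs(
    detections=[10, 40, 95], repairs=[14, 60, 100], max_lag=7
)
print(scores)
```

Both figures inherit the caveat above: since repairs are only a partial record of leaks, they describe agreement with the proxy rather than true detection accuracy.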

Final Conclusions:

Communicating statistical concepts and results to an audience of varied areas and levels of expertise is important now more than ever. The continually strengthening relationships between Compass and its industrial partners are providing students with the opportunity to gain experience in doing exactly this. The focus lab concluded with a presentation of our findings to the Wessex Water operations team, during which we reported the procedures and results. The technical results were well supported by the demonstration of an R Shiny dashboard app, which provided an intuitive interface to view the output of the developed algorithm.

Of course, there is more work to be done. Expanding the algorithm to account for all three types of distributional change is the obvious next step. Furthermore, fitting a GAM to data for 760 pipes is not very efficient. Investigating ways to ‘cluster’ groups of pipes together according to some notion of similarity is a natural avenue for future work, in order to reduce the number of GAMs we need to fit.

This experience enabled students to apply skills in statistical modelling, algorithm development, and software development to a salient problem faced by an industry partner and marked a successful start to the Compass industry focus lab series.

## Student perspectives: Three Days in the life of a Silicon Gorge Start-Up

A post by Mauro Camara Escudero, PhD student on the Compass programme.

Last December the first Compass cohort took part in a 3-day entrepreneurship training course with SpinUp Science. Keep reading and you might just find out if the Silicon Gorge life is for you!

### The Ambitious Project of SpinUp Science

SpinUp Science’s goal is to help PhD students like us develop an entrepreneurial skill-set that will come in handy if we decide to either commercialize a product, launch a start-up, or choose a consulting career.

I can already hear some of you murmur “Sure, this might be helpful for someone doing a much more applied PhD but my work is theoretical. How is that ever going to help me?”. I get that, I used to believe the same. However, partly thanks to this training, I changed my mind and realized just how valuable these skills are independently of whether you decide to stay in Academia or find a job at an established company.

Anyways, I am getting ahead of myself. Let me first guide you through what the training looked like and then we will come back to this!

### Day 1 – Meeting the Client

The day started with a presentation that, on paper, promised to be yet another one of those endless and boring talks that make you reach for the Stop Video button and take a nap. The vague title “Understanding the Opportunity” surely did not help either. Instead, we were thrown right into action!

Ric and Crescent, two consultants at SpinUp Science, introduced us to their online platform where we would be spending most of our time in the next few days. Our main task for the first half-hour was to read about the start-up’s goals and then write down a series of questions to ask the founders in order to get a full picture.

Before we knew it, it was time to get ready for the client meeting and split tasks. I volunteered as Client Handler, meaning I was going to coordinate our interaction with the founders. The rest of Compass split into groups focusing on different areas: some were going to ask questions about competitors, others about the start-up product, and so on.

As we waited in the Zoom call, I kept wondering why on earth I volunteered for the role and my initial excitement was quickly turning into despair. We had never met the founders before, let alone had any experience consulting or working for a start-up.

Once the founders joined us, and after a wobbly start, it became clear that the hard part would not be avoiding awkward silences or struggling to get information. The real challenge was being able to fit all of our questions into this one-hour meeting. One thing was clear: clients love to talk about their issues and to digress.

After the meeting, we had a brief chat and wrote down our findings and thoughts on the online platform. I wish I could say we knew what we were doing, but in reality it was a mix of extreme winging and following the advice of Ric and Crescent.

Last on the agenda was a short presentation where we learned how to go about studying the market-fit for a product, judge its competitors and potential clients, and overall how to evaluate the success of a start-up idea. That was it for the day, but the following morning we would put into practice everything we had learned up to that point.

### Day 2 – Putting our Stalking Skills to good use

The start-up that we were consulting for provides data analysis software for power plants and was keen to expand in a new geographical area. Our goal for the day was therefore to:

• understand the need for such a product in the energy market

• research what options are available for the components of their product

• find potential competitors and assess their offering

• find potential clients and assess whether they already had a similar solution implemented

• study the market in the new geographical area

This was done with a mix of good old Google searches and cold-calling. It was a very interesting process: in the morning we were still struggling to understand what the start-up was offering, while by late afternoon we had a fairly in-depth knowledge of all the points above and had gathered enough information to formulate more sensible questions and to assess the feasibility of the start-up’s product. One of the things I found most striking about this supervised hands-on training is that as time went on I could clearly see how I was able to filter out useless information and go to the core of what I was researching.

To aid us in our analyses, we received training on various techniques to assess competitors, clients and the financial prospect of a start-up. In addition, we also learned about why the UK is such a good place to launch a start-up, what kind of funding is available and how to look for investors and angels.

Exhausted by a day of intense researching, we knew the most demanding moments were yet to come.

### Day 3 – Reporting to the Client

The final day was all geared towards preparing for our final client meeting. Ric and Crescent taught us how to use their online platform to perform PESTEL and SWOT analyses efficiently based on the insights that we gathered the day before. It was very powerful seeing a detailed report coming to life using inputs from all of our research.

With the report in hand, several hours of training under our belt, and a clearer picture in our head, we joined the call and each one of us presented a different section of the report, while Andrea was orchestrating the interaction. Overall, the founders seemed quite impressed and admitted that they had not heard of many of the competitors we had found. They were pleased by our in-depth research and, I am sure, found it very insightful.

### Lessons Learned

So, was it useful?

I believe that this training gave us a glimpse of how to go about picking up a totally new area of knowledge and quickly becoming an expert on it. The time constraint allowed us to refine the way in which we filter out useless information, to get to the core of what we are trying to learn about. We also worked together as a team towards a single goal and we formulated our opinion on the start-up. Finally, we had two invaluable opportunities to present in a real-world setting and to handle diplomatically the relationship with the client.

In the end, isn’t research all about being able to pick up new knowledge quickly, filter out useless papers, work together with other researchers to develop a method, and present the results to an audience?

Find out more about Mauro Camara Escudero and his work on his profile page.