## Skills for Interdisciplinary Research

To acknowledge the variety of sectors where data science research is relevant, in March 2021, the Compass students are undertaking a series of workshops led by the Bristol Doctoral College to explore Skills for Interdisciplinary Research.  Using the Vitae framework for researcher development, our colleague at BDC will introduce Compass students to the following topics:

Workshop 1: What is a doctorate? A brief history of doctorates in the UK, how they have changed in the past two decades, why CDTs?, what skills are needed now for a doctorate?

Workshop 2: Interdisciplinarity – the foundations. A practical case study on interdisciplinary postgraduate research at Bristol.

Workshop 3: Ways of knowing, part 1 – Positivism and ‘ologies! Deconstructing some of the terminology around knowledge and how we know what we know. Underpinning assumption – to know your own discipline, you need to step outside of it and see it as others do.

Workshop 4: Ways of knowing, part 2 – Social constructionism and qualitative approaches to research. In part 1 of ways of knowing, the ideal ‘science’ approach is objective and the researcher is detached from the subject of study; part 2 looks at other approaches where the role of the researcher is integral to the research.

Workshop 5: Becoming a good researcher – research integrity and doctoral students. A look at how dilemmas in research can show us how research integrity is not just a case of right or wrong.

Workshop 6: Getting started with academic publishing. An introduction to the pressures of scholarly publishing in contemporary research, exploring what they mean in an interdisciplinary context.

## Student Research Topics for 2020/21

This month, the Cohort 2 Compass students have started work on their mini projects and are establishing the direction of their own research within the CDT.

## Supervised by the Institute for Statistical Science:

Anthony Stevenson will be working with Robert Allison on a project entitled Fast Bayesian Inference at Extreme Scale.  This project is in partnership with IBM Research.

Conor Crilly will be working with Oliver Johnson on a project entitled Statistical models for forecasting reliability. This project is in partnership with AWE.

Euan Enticott will be working with Matteo Fasiolo and Nick Whiteley on a project entitled Scalable Additive Models for Forecasting Electricity Demand and Renewable Production.  This project is in partnership with EDF.

Annie Gray will be working with Patrick Rubin-Delanchy and Nick Whiteley on a project entitled Exploratory data analysis of graph embeddings: exploiting manifold structure.

Ed Davis will be working with Dan Lawson and Patrick Rubin-Delanchy on a project entitled Graph embedding: time and space.  This project is in partnership with LV Insurance.

Conor Newton will be working with Henry Reeve and Ayalvadi Ganesh on a project entitled  Decentralised sequential decision making and learning.

## The following projects are supervised in collaboration with the Institute for Statistical Science (IfSS) and our other internal partners at the University of Bristol:

Dan Ward will be working with Matteo Fasiolo (IfSS) and Mark Beaumont from the School of Biological Sciences on a project entitled Agent-based model calibration via regression-based synthetic likelihood. This project is in partnership with Improbable.

Jack Simons will be working with Song Liu (IfSS) and Mark Beaumont (Biological Sciences) on a project entitled Novel Approaches to Approximate Bayesian Inference.

Georgie Mansell will be working with Haeran Cho (IfSS) and Andrew Dowsey from the School of Population Health Sciences and Bristol Veterinary School on a project entitled Statistical learning of quantitative data at scale to redefine biomarker discovery.  This project is in partnership with Sciex.

Shannon Williams will be working with Anthony Lee (IfSS) and Jeremy Phillips from the School of Earth Sciences on a project entitled Use and Comparison of Stochastic Simulations and Weather Patterns in probabilistic volcanic ash hazard assessments.

Sam Stockman will be working with Dan Lawson (IfSS) and Maximillian Werner from the School of Geographical Sciences on a project entitled Machine Learning and Point Processes for Insights into Earthquakes and Volcanoes.

## Student perspectives: Three Days in the life of a Silicon Gorge Start-Up

A post by Mauro Camara Escudero, PhD student on the Compass programme.

Last December the first Compass cohort took part in a 3-day entrepreneurship training with SpinUp Science. Keep reading and you might just find out if the Silicon Gorge life is for you!

### The Ambitious Project of SpinUp Science

SpinUp Science’s goal is to help PhD students like us develop an entrepreneurial skill-set that will come in handy if we decide to either commercialize a product, launch a start-up, or choose a consulting career.

I can already hear some of you murmur “Sure, this might be helpful for someone doing a much more applied PhD but my work is theoretical. How is that ever going to help me?”. I get that, I used to believe the same. However, partly thanks to this training, I changed my mind and realized just how valuable these skills are independently of whether you decide to stay in Academia or find a job at an established company.

Anyways, I am getting ahead of myself. Let me first guide you through what the training looked like and then we will come back to this!

### Day 1 – Meeting the Client

The day started with a presentation that, on paper, promised to be yet another one of those endless and boring talks that make you reach for the Stop Video button and take a nap. The vague title “Understanding the Opportunity” surely did not help either. Instead, we were thrown right into action!

## What to know before studying Data Science

by Dr Daniel Lawson, Senior Lecturer in Data Science, University of Bristol and Compass CDT Co-Director

For the first time in history, data is abundant and everywhere. This has created a new era for how we understand the world. Modern Data Science is new and changing the world, but it is rooted in cleverness throughout history.

## What is Data Science used for today?

Data Science is ubiquitous today. Many choices about what to buy, what to watch, what news to read – these are either directly or indirectly influenced by recommender systems that match our history with that of others to show us something we might want. Machine Learning has revolutionised computer vision, automation has revolutionised industry and distribution, whilst self-driving cars are at least close. Knowledge is increasingly distributed, with distributed learning ranging from Wikipedia to spam detection.

## Student perspectives: The Elo Rating System – From Chess to Education

A post by Andrea Becsek, PhD student on the Compass programme.

If you have also recently binge-watched The Queen’s Gambit, chances are you have heard of the Elo Rating System. There are actually many games out there that require some way to rank players or even teams. However, the applications of the Elo Rating System reach further than you think.

### History and Applications

The Elo Rating System [1] was first suggested as a way to rank chess players; however, it can be used in any competitive two-player game that requires a ranking of its players. The system was first adopted by the World Chess Federation in 1970, and there have been various adjustments to it since, resulting in different implementations by each organisation.

For any soccer-lovers out there, the FIFA world rankings are also based on the Elo System, but if you happen to be into a similar sport, worry not, Elo has you covered. And the list of applications goes on and on: Backgammon, Scrabble, Go, Pokemon, and apparently even Tinder used it at some point.

Fun fact: The formulas used by the Elo Rating make a famous appearance in The Social Network, a movie about the creation of Facebook. Whether this was the actual algorithm used for FaceMash, the first version of Facebook, is however unclear.

All this sounds pretty cool, but how does it actually work?

### How it works

We want a way to rank players and update their ranking after each game. Let’s start by assuming that we have the ranking for the two players about to play: $\theta_i$ for player $i$ and $\theta_j$ for player $j$. Then we can compute the probability of player $i$ winning against player $j$ using the logistic function:

$P(Y_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-\theta_j)\}}.$

Given what we know about the logistic function, it’s easy to notice that the smaller the difference between the players, the less certain the outcome as the probability of winning will be close to $0.5$.
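To make this concrete, here is a minimal Python sketch of the formula above. The function name `win_probability` and the example rating gaps are my own choices for illustration and are not part of any official Elo implementation.

```python
import math


def win_probability(theta_i: float, theta_j: float) -> float:
    """Probability that player i beats player j under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


# Illustrative rating gaps (assumed values): equal ratings give a coin flip,
# and the probability moves towards 1 as the gap in favour of player i grows.
for gap in (0.0, 0.5, 2.0):
    print(f"rating gap {gap}: P(i wins) = {win_probability(gap, 0.0):.3f}")
```

Note that the two players' winning probabilities always add up to one, since swapping the roles of $i$ and $j$ flips the sign of the difference.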

Once the outcome of the game is known, we can update both players’ abilities:

$\theta_{i}:=\theta_{i}+K(Y_{ij}-P(Y_{ij}=1))$

$\theta_{j}:=\theta_{j}+K(P(Y_{ij}=1)-Y_{ij}).$

The $K$ factor controls the influence of a player’s performance on their previous ranking. For players with high rankings, a smaller $K$ is used because we expect their abilities to be somewhat stable and hence their ranking shouldn’t be too heavily influenced by every game. On the other hand, players with low ability can learn and improve quite quickly, so a larger $K$ is used to let their rating fluctuate more.

The term in the brackets represents how different the actual outcome is from the expected outcome of the game. If a player is expected to win but doesn’t, their ranking will decrease, and vice versa. The larger the difference, the more their rating will change. For example, if a weaker player is highly unlikely to win, but they do, their ranking will be boosted quite a bit because it was a hard battle for them. On the other hand, if a strong player is really likely to win because they are playing against a weak player, their increase in score will be small as it was an easy win for them.
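The update rule and the two scenarios just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `expected_score` and `elo_update`, the value $K=0.4$, and the ratings $0$ and $2$ are made up for the example, and only win/loss outcomes are handled.

```python
import math


def expected_score(theta_i: float, theta_j: float) -> float:
    """P(Y_ij = 1): probability that player i beats player j."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


def elo_update(theta_i: float, theta_j: float, outcome: int, k: float = 0.4):
    """Return the updated (theta_i, theta_j) after one game.

    outcome is 1 if player i won and 0 if player j won.
    k = 0.4 is an illustrative value, not a recommended setting.
    """
    change = k * (outcome - expected_score(theta_i, theta_j))
    # What player i gains, player j loses, so the total rating is conserved.
    return theta_i + change, theta_j - change


weak, strong = 0.0, 2.0  # assumed ratings for a weak and a strong player

# Upset: the weak player wins, so their rating jumps by a lot.
print("upset:   ", elo_update(weak, strong, outcome=1))

# Expected result: the strong player wins, so their rating barely moves.
print("expected:", elo_update(strong, weak, outcome=1))
```

Comparing the two printed pairs shows the behaviour described above: the surprise win moves the ratings far more than the easy win does.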

### Elo Rating and Education

As previously mentioned, the Elo Rating System has been used in a wide range of fields and, as it turns out, that includes education, more specifically, adaptive educational systems [2]. Adaptive educational systems are concerned with automatically selecting adequate material for a student depending on their previous performance.

Note that a system can be adaptive at different levels of granularity. Some systems might adapt the homework from week to week by generating it based on the student’s current ability and update their ability once the homework has been completed. Other systems are able to update the student’s ability after every single question. As you can imagine, using the second system requires a fairly fast, online algorithm. And this is where the Elo Rating comes in.

For an adaptive system to work, we need two key components: student abilities and question difficulties. To apply the Elo Rating to this context, we treat a student’s interaction with a question as a game where the student’s ranking represents their ability and the question’s ranking represents its difficulty. We can then predict whether a student of ability $\theta_i$ will answer a question of difficulty $d_j$ correctly using

$P(\text{correct}_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-d_j)\}},$

and the ability and difficulty can be updated using

$\theta_{i}:=\theta_{i}+K(\text{correct}_{ij}-P(\text{correct}_{ij}=1))$

$d_{j}:=d_{j}+K(P(\text{correct}_{ij}=1)-\text{correct}_{ij}).$

So even if you only have $10$ minutes to create an adaptive educational system you can easily implement this algorithm. Set all abilities and question difficulties to $0$, let students answer your questions, and wait for the magic to happen. If you do have some prior knowledge about the difficulty of the items you could of course incorporate that into the initial values.
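For the curious, the ten-minute version might look something like the Python sketch below. Everything named here is hypothetical: the class `EloTutor`, the value $K=0.4$, and the students and questions in the toy answer log are invented for illustration, not taken from a real system.

```python
import math
from collections import defaultdict


class EloTutor:
    """Tracks student abilities and question difficulties with Elo updates."""

    def __init__(self, k: float = 0.4):
        self.k = k  # illustrative value; in practice this would need tuning
        self.ability = defaultdict(float)     # student id -> theta, starts at 0
        self.difficulty = defaultdict(float)  # question id -> d, starts at 0

    def p_correct(self, student: str, question: str) -> float:
        """Predicted probability that the student answers correctly."""
        gap = self.ability[student] - self.difficulty[question]
        return 1.0 / (1.0 + math.exp(-gap))

    def record(self, student: str, question: str, correct: bool) -> None:
        """Update ability and difficulty after a single answer."""
        step = self.k * (int(correct) - self.p_correct(student, question))
        self.ability[student] += step
        self.difficulty[question] -= step


# Made-up answer log: (student, question, answered correctly?)
answers = [
    ("ana", "q1", True),
    ("ben", "q1", False),
    ("ana", "q2", True),
    ("ben", "q2", True),
]

tutor = EloTutor()
for student, question, correct in answers:
    tutor.record(student, question, correct)

print(dict(tutor.ability))
print(dict(tutor.difficulty))
```

Because each update only touches one student and one question, it can run after every single answer. Prior knowledge about item difficulty could be added by pre-filling `difficulty` before any answers are recorded.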

One important thing to note is that one should be careful with ranking students based on their abilities as this could result in various ethical issues. The main purpose of obtaining their abilities is to track their progress and match them with questions that are at the right level for them, easy enough to stay motivated, but hard enough to feel challenged.

### Conclusion

So is Elo the best option for an adaptive system? It depends. It is fast, enables on-the-fly updates, is easy to implement, and in some contexts, it even has a similar performance to more complex models. However, there are usually many other factors that can be relevant to predicting student performance, such as the time spent on a question or the number of hints they use. This additional data can be incorporated into more complex models, probably resulting in better predictions and offering much more insight. At the end of the day, there is always a trade-off, so depending on the context it’s up to you to decide whether the Elo Rating System is the way to go.

Find out more about Andrea Becsek and her work on her profile page.

1. Elo, A.E., 1978. The rating of chessplayers, past and present. Arco Pub.
2. Pelánek, R., 2016. Applications of the Elo rating system in adaptive educational systems. Computers & Education, 98, pp.169-179.

## Sparx joins as Compass’ newest industrial partner

The University of Bristol is today announcing a new supporter of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. South West-based learning technology company Sparx has agreed to sponsor a PhD student’s research project, which will investigate new approaches to longitudinal statistical modelling within school-based mathematics education.

Sparx, which is located in Exeter, develops maths learning tools to support teaching and learning in secondary education. As an evidence-led company, Sparx has invested heavily in researching how technology can support the teaching and learning of maths and worked closely with local schools. This new investment underlines their ongoing commitment to research.