## Student Perspectives: Contemporary Ideas in Statistical Philosophy

A post by Alessio Zakaria, PhD student on the Compass programme.

## Introduction

Probability theory is a branch of mathematics centred around the abstract manipulation and quantification of uncertainty and variability. It forms a basic building block of the theory and practice of statistics, enabling us to distil the complex nature of observable phenomena into meaningful information. Because of this reliance, the debate over the true (or more correct) underlying nature of probability has profound effects on how statisticians do their work. The two opposing sides of the debate are the Frequentists and the Bayesians. Frequentists believe that probability is intrinsically linked to the numeric regularity with which events occur, i.e. their frequency. Bayesians, however, believe that probability is an expression of someone's degree of belief or confidence in a certain claim. In everyday parlance we use both of these concepts interchangeably: I estimate one in five people have Covid; I was 50% confident that the football was coming home. It should be noted that the latter is not a repeatable event per se. We cannot roll back time to check what a repeated sequence of trials would yield.

## Student Perspectives: LV Datathon – Insurance Claim Prediction

A post by Doug Corbin, PhD student on the Compass programme.

In a recent bout of friendly competition, students from the Compass and Interactive AI CDTs were divided into eight teams to take part in a two-week Datathon hosted by insurance provider LV. A Datathon is a short, intensive competition posing a data-driven challenge to the teams. The challenge was to construct the best predictive model for (the size of) insurance claims, using an anonymised, artificial data set generously provided by LV. Each team's solution was judged on three important criteria:

• Accuracy – How well the solution performs at predicting insurance claims.
• Explainability – The ability to understand and explain how the solution calculates its predictions; it is important to be able to explain to a customer how their quote has been calculated.
• Creativity – The solution’s incorporation of new and unique ideas.

Students were given the opportunity to put their experience in Data Science and Artificial Intelligence to the test on something resembling real life data, forming cross-CDT relationships in the process.

### Data and Modelling

Before training a machine learning model, the data must first be processed into a numerical format. To achieve this, most teams transformed categorical features into a series of 0's and 1's (representing the value of the category), using a well-known process called one-hot encoding. Others recognised that certain features had a natural order to them, and opted to map them to integers corresponding to their ordered position.
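As an illustrative sketch of these two encodings (the feature names and category values below are invented for illustration, not taken from the LV data set):

```python
# Hypothetical categorical features from an insurance-style data set.

def one_hot(values, categories):
    """Encode each value as a list of 0's and 1's, one slot per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def ordinal(values, ordering):
    """Map each value of an ordered feature to its position in the ordering."""
    return [ordering.index(v) for v in values]

fuel = ["petrol", "diesel", "petrol"]          # unordered: one-hot encode
print(one_hot(fuel, ["diesel", "petrol"]))     # [[0, 1], [1, 0], [0, 1]]

cover = ["basic", "premium", "standard"]       # naturally ordered: map to integers
print(ordinal(cover, ["basic", "standard", "premium"]))  # [0, 2, 1]
```

One-hot encoding avoids imposing a spurious order on unordered categories, at the cost of one column per category; the ordinal mapping keeps a single column but is only appropriate when the order is genuine.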

## Skills for Interdisciplinary Research

To acknowledge the variety of sectors where data science research is relevant, in March 2021 the Compass students are undertaking a series of workshops led by the Bristol Doctoral College to explore Skills for Interdisciplinary Research. Using the Vitae framework for researcher development, our colleague at the BDC will introduce Compass students to the following topics:

Workshop 1: What is a doctorate? A brief history of doctorates in the UK, how they have changed in the past two decades, why CDTs exist, and what skills are needed now for a doctorate.

Workshop 2: Interdisciplinarity – the foundations. A practical case study on interdisciplinary postgraduate research at Bristol.

Workshop 3: Ways of knowing, part 1 – Positivism and ‘ologies! Deconstructing some of the terminology around knowledge and how we know what we know. Underpinning assumption – to know your own discipline, you need to step outside of it and see it as others do.

Workshop 4: Ways of knowing, part 2 – Social constructionism and qualitative approaches to research. In part 1, the ideal ‘science’ approach was objective, with the researcher detached from the subject of study; part 2 looks at other approaches in which the researcher is integral to the research.

Workshop 5: Becoming a good researcher – research integrity and doctoral students. A look at how dilemmas in research can show us how research integrity is not just a case of right or wrong.

Workshop 6: Getting started with academic publishing. An introduction to the pressures of scholarly publishing in contemporary research and what they mean in an interdisciplinary context.

## Responsible Innovation in Data Science Research

This February our 2nd year Compass students will attend workshops in responsible innovation.

Run in partnership with the School of Management, the structured module constitutes Responsible Innovation training specifically for research in Data Science.

Taking the EPSRC AREA (Anticipate, Reflect, Engage, Act) framework for Responsible Innovation as its starting point, the module will take students through a guided process to develop the skills, knowledge and facilitated experience to incorporate the tenets of the AREA framework into their PhD practice. Topics covered will include:
• Ethical and societal implications of data science and computational statistics
• Skills for anticipation
• Reflexivity for researchers
• Public perception of data science and engagement of publics
• Regulatory frameworks affecting data science

## Student perspectives: The Elo Rating System – From Chess to Education

A post by Andrea Becsek, PhD student on the Compass programme.

If you have recently binge-watched The Queen’s Gambit, chances are you have heard of the Elo Rating System. There are actually many games out there that require some way to rank players or even teams. However, the applications of the Elo Rating System reach further than you might think.

### History and Applications

The Elo Rating System [1] was first suggested as a way to rank chess players; however, it can be used in any competitive two-player game that requires a ranking of its players. The system was first adopted by the World Chess Federation in 1970, and there have been various adjustments to it since, resulting in different implementations by each organisation.

For any soccer-lovers out there, the FIFA world rankings are also based on the Elo System, but if you happen to be into a similar sport, worry not, Elo has you covered. And the list of applications goes on and on: Backgammon, Scrabble, Go, Pokemon, and apparently even Tinder used it at some point.

Fun fact: the formulas used by the Elo Rating make a famous appearance in The Social Network, a movie about the creation of Facebook. Whether this was the actual algorithm used for FaceMash, a precursor to Facebook, is however unclear.

All this sounds pretty cool, but how does it actually work?

### How it works

We want a way to rank players and update their ranking after each game. Let’s start by assuming that we have the ranking for the two players about to play: $\theta_i$ for player $i$ and $\theta_j$ for player $j$. Then we can compute the probability of player $i$ winning against player $j$ using the logistic function:

$P(Y_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-\theta_j)\}}.$

Given what we know about the logistic function, it’s easy to notice that the smaller the difference between the players, the less certain the outcome as the probability of winning will be close to $0.5$.

Once the outcome of the game is known, we can update both players’ abilities

$\theta_{i}:=\theta_{i}+K(Y_{ij}-P(Y_{ij}=1))$

$\theta_{j}:=\theta_{j}+K(P(Y_{ij}=1)-Y_{ij}).$

The $K$ factor controls the influence of a player’s performance on their previous ranking. For players with high rankings, a smaller $K$ is used because we expect their abilities to be somewhat stable, and hence their ranking shouldn’t be too heavily influenced by every game. On the other hand, players with low ability can learn and improve quite quickly, and therefore their rating should be able to fluctuate more, so they have a larger $K$.

The term in the brackets represents how different the actual outcome is from the expected outcome of the game. If a player is expected to win but doesn’t, their ranking will decrease, and vice versa. The larger the difference, the more their rating will change. For example, if a weaker player is highly unlikely to win, but they do, their ranking will be boosted quite a bit because it was a hard battle for them. On the other hand, if a strong player is really likely to win because they are playing against a weak player, their increase in score will be small as it was an easy win for them.
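As a concrete sketch, the win probability and update rules above fit in a few lines of Python ($K = 32$ here is a common default in practice; the post itself does not fix a value):

```python
import math

def elo_win_prob(theta_i, theta_j):
    """P(player i beats player j): a logistic function of the rating gap."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def elo_update(theta_i, theta_j, y_ij, k=32.0):
    """Return both updated ratings; y_ij is 1 if player i won, 0 otherwise."""
    p = elo_win_prob(theta_i, theta_j)
    return theta_i + k * (y_ij - p), theta_j + k * (p - y_ij)

# Two equally rated players: the winner gains K/2 points, the loser drops K/2.
new_i, new_j = elo_update(1500.0, 1500.0, y_ij=1)
print(new_i, new_j)  # 1516.0 1484.0
```

Notice that the two updates are mirror images of each other, so the total rating in the system is conserved: points simply flow from the loser to the winner.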

### Elo Rating and Education

As previously mentioned, the Elo Rating System has been used in a wide range of fields and, as it turns out, that includes education, more specifically, adaptive educational systems [2]. Adaptive educational systems are concerned with automatically selecting adequate material for a student depending on their previous performance.

Note that a system can be adaptive at different levels of granularity. Some systems might adapt the homework from week to week by generating it based on the student’s current ability and update their ability once the homework has been completed. Whereas other systems are able to update the student ability after every single question. As you can imagine, using the second system requires a fairly fast, online algorithm. And this is where the Elo Rating comes in.

For an adaptive system to work, we need two key components: student abilities and question difficulties. To apply the Elo Rating to this context, we treat a student’s interaction with a question as a game where the student’s ranking represents their ability and the question’s ranking represents its difficulty. We can then predict whether a student of ability $\theta_i$ will answer a question of difficulty $d_j$ correctly using

$P(\text{correct}_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-d_j)\}},$

and the ability and difficulty can be updated using

$\theta_{i}:=\theta_{i}+K(\text{correct}_{ij}-P(\text{correct}_{ij}=1))$

$d_{j}:=d_{j}+K(P(\text{correct}_{ij}=1)-\text{correct}_{ij}).$

So even if you only have $10$ minutes to create an adaptive educational system you can easily implement this algorithm. Set all abilities and question difficulties to $0$, let students answer your questions, and wait for the magic to happen. If you do have some prior knowledge about the difficulty of the items you could of course incorporate that into the initial values.
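That "10-minute" system can be sketched as follows (the student and question names, and the $K$ value of 0.4, are invented for illustration):

```python
import math
from collections import defaultdict

def elo_step(ability, difficulty, correct, k=0.4):
    """One update after a student answers a question (correct is 1 or 0)."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return ability + k * (correct - p), difficulty + k * (p - correct)

abilities = defaultdict(float)     # every student starts at ability 0
difficulties = defaultdict(float)  # every question starts at difficulty 0

# A stream of (student, question, correct) interactions, processed online.
for student, question, correct in [("amy", "q1", 1), ("ben", "q1", 0)]:
    abilities[student], difficulties[question] = elo_step(
        abilities[student], difficulties[question], correct
    )

print(round(abilities["amy"], 2), round(difficulties["q1"], 2))  # 0.2 0.02
```

Because each interaction only touches one ability and one difficulty, the update is constant-time per answer, which is exactly the fast, online behaviour a question-by-question adaptive system needs.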

One important thing to note is that one should be careful with ranking students based on their abilities as this could result in various ethical issues. The main purpose of obtaining their abilities is to track their progress and match them with questions that are at the right level for them, easy enough to stay motivated, but hard enough to feel challenged.

### Conclusion

So is Elo the best option for an adaptive system? It depends. It is fast, enables on-the-fly updates, is easy to implement, and in some contexts it even has similar performance to more complex models. However, there are usually many other factors that can be relevant to predicting student performance, such as the time spent on a question or the number of hints they use. This additional data can be incorporated into more complex models, probably resulting in better predictions and offering much more insight. At the end of the day, there is always a trade-off, so depending on the context it’s up to you to decide whether the Elo Rating System is the way to go.

Find out more about Andrea Becsek and her work on her profile page.

1. Elo, A.E., 1978. The rating of chessplayers, past and present. Arco Pub.
2. Pelánek, R., 2016. Applications of the Elo rating system in adaptive educational systems. Computers & Education, 98, pp.169-179.

## Improbable sponsors Compass PhD student in new partnership

Improbable, a global technology company which provides innovative products and services to makers of virtual worlds and simulations, is sponsoring a PhD research project entitled Agent-based model calibration using likelihood-free inference.

The University of Bristol is announcing a new industrial sponsor of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. The project’s aim is to devise a general framework for calibrating agent-based models from training data by inferring the model parameters in a statistical framework.

## Sparx joins as Compass’ newest industrial partner

The University of Bristol is today announcing a new supporter of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. South West-based learning technology company Sparx has agreed to sponsor a PhD student’s research project which will investigate new approaches to longitudinal statistical modelling within school-based mathematics education.

Sparx, which is located in Exeter, develops maths learning tools to support teaching and learning in secondary education. As an evidence-led company, Sparx has invested heavily in researching how technology can support the teaching and learning of maths and worked closely with local schools. This new investment underlines their ongoing commitment to research.

## Compass students take part in electricity demand forecasting hackathon

Dr Jethro Browell, Research Fellow at the University of Strathclyde, and Dr Matteo Fasiolo, Lecturer at the University of Bristol, ran a regional electricity demand forecasting hackathon for students in the COMPASS Centre for Doctoral Training yesterday.

Visiting Research Fellow Dr Browell gave students an overview of how the Great Britain electricity transmission network has changed during the last decade, with particular focus on the consequences of the increased production from small-scale renewable sources, which appear as “negative demand”.

Dr Fasiolo then introduced a dataset containing electricity demand and weather-related variables, such as wind speed and solar irradiation, from 14 regions covering the whole of Great Britain. He proposed an initial forecasting solution based on a simple Generalized Additive Model (GAM), which he used to forecast the demand in each region.

The hackathon started, with the “Jim” team being the first to propose an improved solution, based on a more sophisticated GAM model, which beat the initial GAM in terms of forecasting accuracy.

The “AGang” team then produced an even more sophisticated GAM, which took them to the top of the ranking. In the meantime, the “D&D” team was struggling to make their random forest work, and submitted a couple of poor forecasts. Toward the end of the event, “AGang” produced a couple of improved GAM solutions, which further strengthened their lead.

While Dr Fasiolo and Dr Browell were wrapping up the event and preparing to award the winners, the “D&D” team caught everyone by surprise by submitting a forecast which beat all others by a margin, in terms of forecasting accuracy. Their random forest was far better than the GAMs at predicting demand in Scotland, where wind production is an important factor and the dynamics are quite different relative to the other regions.

Congratulations to the top three teams:

1. D&D:  Doug Corbin and Dom Owens
2. AGang: Andrea Becsek, Alex Modell and Alessio Zakaria
3. Jim:  Michael Whitehouse, Daniel Williams and Jake Spiteri

Winning team “D&D” said:  “Given physical measurements, such as wind speeds and precipitation, as well as calendar data, we first performed a minor amount of feature engineering. Given the complex nature of the interactions between the variables, and the large amount of data available, we opted to fit random forest models. These performed feature selection for us and provided some robustness to outlying observations.

“However, the models took a long time to fit. Despite parallelising the model fitting across the regions, we only just got our predictions in before the deadline. Thankfully, our model consistently outperformed the other approaches.

“Everyone taking part had a great time learning about the challenges of energy modelling, and we thrived under the pressure of friendly competition.”

Dr Browell added: “Computational statistics and data science is driving innovation in the energy sector and the technologies they enable will play a huge role in decarbonisation. I was pleased to be able to expose the COMPASS cohort to this application and hope that they will be inspired to apply their expertise to energy and climate problems in the future.”

## Women and non-binary people in mathematics event

On 5 and 6 November the school hosted its annual Women and Non-Binary People in Mathematics event.

This two-day event, held in the Fry Building and aimed at encouraging women and non-binary people to consider continuing their studies to PhD level, welcomed participants from around the country as well as from outside the UK.

The event featured talks from mathematicians working both in universities and industry, giving insight into their current roles and their careers to date. As well as this it also offered ample time for discussion with other participants who are facing the same decisions, and with current PhD students who have recently faced the same questions.

The Tuesday afternoon kicked off with a talk from keynote speaker Delaram Kahrobaei from the University of York, followed by a dinner, where participants were able to enjoy discussion and networking in a relaxed environment. Wednesday morning offered an insight into the life of a PhD student, with talks and Q&A sessions with current post-grad students. The final portion of the event was open to all students and more broadly explored the nature of PhD research, the work environment, the application process, and career options. Talks were given by industry speaker Katie Russell from OVO Energy and the University of Bristol’s Dr Lynne Walling.

Current PhD student and one of the event organisers, Emma Bailey said ‘My favourite part of the Women and Non-Binary conference is changing assumptions: showing people who maths PhD students are, what we do on a weekly basis, and all of the opportunities out there post-PhD’.

The event successfully illuminated some of the doors a PhD in mathematics can open and hopefully inspired potential PhD students to continue their future in mathematics.