## Novel semi-supervised Bayesian learning to rapidly screen new oligonucleotide drugs for impurities.

This is an exciting opportunity to join Compass’ 4-year programme with integrated training in the statistical and computational techniques of Data Science. You will be part of a dynamic cohort of PhD researchers hosted in the historic Fry Building, which has recently undergone a £35 million refurbishment as the new home for Bristol’s School of Mathematics.

This fully funded 4-year studentship covers:

• tuition fees at UK rate
• tax-free stipend of up to £19,609 per year for living expenses and
• equipment and travel allowance to support research related activities.

This opportunity is open to UK, EU, and international students.

AstraZeneca is a global, science-led biopharmaceutical business whose innovative medicines are used by millions of patients worldwide. Oligonucleotide-based therapies are advanced novel interventions with the potential to provide a step-change in treatment for many. Nevertheless, because oligonucleotides are large, complex molecules, they are currently very difficult to profile for impurities: the analysis is labour-intensive and the data are highly complex.

The aim of this PhD is to develop Bayesian data science methodology that performs this profiling automatically and accurately, and delivers statistical measures of certainty. The challenge is a mathematical one, and no chemistry, biology or pharmacological background is expected of the student. More specifically, we have large batches of mass spectrometry data that will enable us to learn how to characterise the known oligonucleotide signal and deconvolute it from a number of known and unknown impurities longitudinally, in a semi-supervised learning framework. This will allow us to confirm the overall consistency of the profile and identify any change patterns, trends across batches, and correlations between impurities.

The end goal is to establish a data analytics pipeline and embed it as part of routine analysis in AstraZeneca, so impurities can be monitored more closely and more precisely. The knowledge can then be used to identify possible issues in manufacturing and improve process chemistry by pinpointing impurities associated with different steps of the drug synthesis. This project would also improve the overall understanding of oligonucleotides and therefore serve as a key step towards establishing an advanced analytical platform.

### Project Supervisor

The PhD will be supervised by statistical data scientist Prof Andrew Dowsey at Bristol in collaboration with AstraZeneca. Prof Dowsey’s group has extensive expertise and experience in Bayesian mass spectrometry analytics (e.g. Nature Comms Biology 2019; Nature Scientific Reports 2016) and leads the development of the seaMass suite of tools for quantification and statistical analyses in mass spectrometry.

The application deadline is 5.00pm on Friday 18 June 2021. Please quote ‘Compass/ AstraZeneca’ in the funding section of the application form and in your Personal Statement to ensure your application is reviewed correctly. Please follow the Compass application guidance.

Interviews are expected to be held in the week commencing 12 July.

## Student perspectives: The Elo Rating System – From Chess to Education

A post by Andrea Becsek, PhD student on the Compass programme.

If you have also recently binge-watched The Queen’s Gambit, chances are you have heard of the Elo Rating System. Many games require some way to rank players or even teams, but the applications of the Elo Rating System reach further than you might think.

### History and Applications

The Elo Rating System [1] was first proposed as a way to rank chess players; however, it can be used in any competitive two-player game that requires a ranking of its players. It was adopted by the World Chess Federation (FIDE) in 1970, and there have been various adjustments to it since, resulting in different implementations across organisations.

For any soccer lovers out there, the FIFA World Rankings are also based on the Elo System, but if you happen to be into a similar sport, worry not: Elo has you covered. And the list of applications goes on and on: Backgammon, Scrabble, Go, Pokémon, and apparently even Tinder used it at some point.

Fun fact: the formulas of the Elo Rating System make a famous appearance in The Social Network, a film about the creation of Facebook. Whether this was actually the algorithm used for FaceMash, a precursor to Facebook, is unclear.

All this sounds pretty cool, but how does it actually work?

### How it works

We want a way to rank players and update their ranking after each game. Let’s start by assuming that we have the ranking for the two players about to play: $\theta_i$ for player $i$ and $\theta_j$ for player $j$. Then we can compute the probability of player $i$ winning against player $j$ using the logistic function:

$P(Y_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-\theta_j)\}}.$

Given the shape of the logistic function, the smaller the difference between the two players’ ratings, the less certain the outcome, since the probability of winning will be close to $0.5$.
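The win probability above is a one-liner. The sketch below is a minimal illustration on the same natural-logistic scale used in this post; chess federations typically use the same curve on a different scale (base 10 with a 400-point divisor), so the rating values here are illustrative only.

```python
import math

def win_probability(theta_i: float, theta_j: float) -> float:
    """P(Y_ij = 1): probability that player i beats player j under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

# Equal ratings give a coin flip; a larger gap makes the outcome more predictable.
print(win_probability(0.0, 0.0))
print(round(win_probability(2.0, 0.0), 3))
```

Note that the model is symmetric: the probability that $i$ beats $j$ and the probability that $j$ beats $i$ always sum to one.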

Once the outcome of the game is known, with $Y_{ij}=1$ if player $i$ wins and $Y_{ij}=0$ if player $j$ wins, we can update both players’ ratings:

$\theta_{i}:=\theta_{i}+K(Y_{ij}-P(Y_{ij}=1))$

$\theta_{j}:=\theta_{j}+K(P(Y_{ij}=1)-Y_{ij}).$
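The two update rules translate directly into code. This is a minimal sketch: the function name `elo_update` and the default `k=0.4` are hypothetical choices for illustration, not values taken from any particular rating body.

```python
import math

def win_probability(theta_i, theta_j):
    """P(Y_ij = 1) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def elo_update(theta_i, theta_j, y_ij, k=0.4):
    """Return the updated (theta_i, theta_j) after one game.

    y_ij is 1 if player i won and 0 if player j won.
    k=0.4 is an assumed illustrative value.
    """
    expected = win_probability(theta_i, theta_j)
    surprise = y_ij - expected          # actual minus expected outcome
    return theta_i + k * surprise, theta_j - k * surprise

# Two new players at rating 0; player i wins.
print(elo_update(0.0, 0.0, 1))
```

Because both players share the same $K$, whatever one player gains the other loses, so the total rating in the pool is conserved.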

The $K$ factor controls how strongly the outcome of a single game changes a player’s rating. For highly rated players, a smaller $K$ is used because we expect their abilities to be fairly stable, so their rating shouldn’t be swayed too much by any one game. Players with low ratings, on the other hand, can learn and improve quite quickly, so their rating should be able to fluctuate more and a larger $K$ is used.
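One way to make $K$ depend on rating is a simple threshold rule. The sketch below is hypothetical: the function `k_factor`, the threshold of 1.5 and the two $K$ values are assumptions chosen for illustration, not any organisation’s actual rules. With a per-player $K$, each player is updated with their own value, so the update is no longer exactly zero-sum.

```python
import math

def win_probability(theta_i, theta_j):
    """P(Y_ij = 1) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def k_factor(theta, threshold=1.5, k_high=0.1, k_low=0.4):
    """Hypothetical rule: smaller K for highly rated (stable) players."""
    return k_high if theta >= threshold else k_low

def elo_update_variable_k(theta_i, theta_j, y_ij):
    """Update each player with their own rating-dependent K."""
    surprise = y_ij - win_probability(theta_i, theta_j)
    new_i = theta_i + k_factor(theta_i) * surprise
    new_j = theta_j - k_factor(theta_j) * surprise
    return new_i, new_j

# A highly rated player (2.0) loses to a lower-rated one (0.0).
strong, weak = elo_update_variable_k(2.0, 0.0, 0)
print(round(strong, 3), round(weak, 3))
```

Here the lower-rated winner gains noticeably more than the highly rated loser gives up, reflecting the different $K$ values.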

The term in brackets measures how far the actual outcome is from the expected outcome of the game. If a player is expected to win but doesn’t, their rating decreases, and vice versa; the larger the difference, the more their rating changes. For example, if a weaker player is highly unlikely to win but does, their rating receives a substantial boost because it was a hard battle for them. Conversely, if a strong player is very likely to win because they are facing a weak player, their rating increases only slightly, as it was an easy win for them.
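To put numbers on the upset-versus-easy-win contrast, here is a small worked example assuming a constant $K=0.4$ and ratings of 2.0 (strong) and 0.0 (weak) on the logistic scale; both values are illustrative assumptions.

```python
import math

def win_probability(theta_i, theta_j):
    """P(Y_ij = 1) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

K = 0.4                      # assumed constant K
strong, weak = 2.0, 0.0      # assumed illustrative ratings

# Expected result: the strong player wins, so the "surprise" is small.
gain_easy_win = K * (1 - win_probability(strong, weak))
# Upset: the weak player wins, so the "surprise" is large.
gain_upset = K * (1 - win_probability(weak, strong))

print(round(gain_easy_win, 3), round(gain_upset, 3))  # 0.048 0.352
```

The two gains add up to exactly $K$, because the two win probabilities sum to one.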

### Elo Rating and Education

As previously mentioned, the Elo Rating System has been used in a wide range of fields and, as it turns out, that includes education, more specifically adaptive educational systems [2]. Adaptive educational systems automatically select suitable material for a student based on their previous performance.

Note that a system can be adaptive at different levels of granularity. Some systems might adapt homework from week to week, generating it based on the student’s current ability and updating that ability once the homework has been completed. Other systems update the student’s ability after every single question. As you can imagine, the second kind of system requires a fast, online algorithm, and this is where the Elo Rating System comes in.

For an adaptive system to work, we need two key components: student abilities and question difficulties. To apply the Elo Rating to this context, we treat a student’s interaction with a question as a game where the student’s ranking represents their ability and the question’s ranking represents its difficulty. We can then predict whether a student of ability $\theta_i$ will answer a question of difficulty $d_j$ correctly using

$P(\text{correct}_{ij}=1)=\frac{1}{1+\exp\{-(\theta_i-d_j)\}},$

and the ability and difficulty can be updated using

$\theta_{i}:=\theta_{i}+K(\text{correct}_{ij}-P(\text{correct}_{ij}=1))$

$d_{j}:=d_{j}+K(P(\text{correct}_{ij}=1)-\text{correct}_{ij}).$
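The student/question version is the same update with the question playing the role of the opponent. A minimal sketch, where the function name and `k=0.4` are hypothetical choices for illustration:

```python
import math

def p_correct(theta, d):
    """Predicted probability that a student of ability theta answers a question of difficulty d correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - d)))

def update_student_question(theta, d, correct, k=0.4):
    """Return updated (ability, difficulty) after one answer.

    correct is 1 for a correct answer and 0 otherwise; k=0.4 is an assumed value.
    """
    surprise = correct - p_correct(theta, d)
    # A correct answer raises the ability and lowers the difficulty; a wrong one does the opposite.
    return theta + k * surprise, d - k * surprise

print(update_student_question(0.0, 0.0, 1))
print(update_student_question(0.0, 0.0, 0))
```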

So even if you only have 10 minutes to create an adaptive educational system, you can easily implement this algorithm. Set all abilities and question difficulties to $0$, let students answer your questions, and wait for the magic to happen. If you do have some prior knowledge about the difficulty of the items, you could of course incorporate that into the initial values.
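To see the "start at zero and wait" recipe in action, the sketch below runs it on synthetic data. Everything about the data is an assumption: hypothetical "true" abilities and difficulties are drawn from a standard normal distribution, answers are simulated from the logistic model, and `k=0.05`, the cohort sizes and the seed are arbitrary illustrative choices. The final line reports how well the Elo difficulty estimates track the hidden true difficulties.

```python
import math
import random

def p_correct(theta, d):
    """Predicted probability of a correct answer under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - d)))

def pearson(xs, ys):
    """Plain Pearson correlation, to avoid depending on a specific Python version."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate(n_students=200, n_questions=20, n_answers=20000, k=0.05, seed=1):
    rng = random.Random(seed)
    # Hypothetical "true" parameters, used only to generate synthetic answers.
    true_theta = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    true_d = [rng.gauss(0.0, 1.0) for _ in range(n_questions)]
    # Elo estimates all start at 0, as suggested in the text.
    theta = [0.0] * n_students
    d = [0.0] * n_questions
    for _ in range(n_answers):
        i = rng.randrange(n_students)
        j = rng.randrange(n_questions)
        correct = 1 if rng.random() < p_correct(true_theta[i], true_d[j]) else 0
        surprise = correct - p_correct(theta[i], d[j])
        theta[i] += k * surprise
        d[j] -= k * surprise
    return true_d, d

true_d, estimated_d = simulate()
print(round(pearson(true_d, estimated_d), 2))
```

A small constant $K$ trades slower initial convergence for less noise once estimates have settled; real systems often use a larger $K$ for new students or items and shrink it over time.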

One important caveat: be careful about ranking students by their abilities, as this can raise various ethical issues. The main purpose of estimating abilities is to track students’ progress and match them with questions at the right level for them: easy enough to stay motivated, but hard enough to feel challenged.
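Matching a student with a question "at the right level" can be phrased as choosing the question whose predicted success probability is closest to a target. The sketch below assumes a hypothetical target of 0.7 and a hypothetical helper `pick_question`; neither comes from the post, and the right target for a given system is a design decision.

```python
import math

def p_correct(theta, d):
    """Predicted probability of a correct answer under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - d)))

def pick_question(theta, difficulties, target=0.7):
    """Index of the question whose predicted success probability is closest to target (assumed 0.7)."""
    return min(range(len(difficulties)),
               key=lambda j: abs(p_correct(theta, difficulties[j]) - target))

# A student of ability 0 and four questions of increasing difficulty.
print(pick_question(0.0, [-2.0, -1.0, 0.0, 1.0]))
```

Here the question with difficulty $-1$ is chosen, since its predicted success probability (about 0.73) is closest to 0.7.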

### Conclusion

So is Elo the best option for an adaptive system? It depends. It is fast, supports on-the-fly updates, is easy to implement and, in some contexts, even performs similarly to more complex models. However, there are usually many other factors relevant to predicting student performance, such as the time spent on a question or the number of hints used. This additional data can be incorporated into more complex models, probably resulting in better predictions and offering much more insight. At the end of the day, there is always a trade-off, so it is up to you to decide, in your context, whether the Elo Rating System is the way to go.

Find out more about Andrea Becsek and her work on her profile page.

1. Elo, A.E., 1978. The rating of chessplayers, past and present. Arco Pub.
2. Pelánek, R., 2016. Applications of the Elo rating system in adaptive educational systems. Computers & Education, 98, pp.169-179.

The University of Bristol is excited to announce IBM Research Europe as a new partner of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. IBM scientists are collaborating with Prof. Robert Allison and Compass PhD student Anthony Stephenson on a research project entitled Fast Bayesian Inference at Extreme Scale. The project aims to extend Bayesian inference algorithms to the ‘extreme scales’ that many deep learning workloads occupy, with a focus on AI methodologies that furnish both accurate predictions and, critically, high-quality representations of the uncertainty in those predictions.

For more than seven decades, IBM Research has defined the future of information technology with more than 3,000 researchers in 19 locations across six continents. Scientists from IBM Research have produced six Nobel Laureates, 10 U.S. National Medals of Technology, five U.S. National Medals of Science, six Turing Awards, 19 inductees in the National Academy of Sciences and 20 inductees into the U.S. National Inventors Hall of Fame.

IBM has European research locations in Switzerland (Zurich), England (Hursley and Daresbury), and Ireland (Dublin), with a large development lab in Germany focused on AI, quantum computing, security and hybrid cloud.

IBM’s global labs are involved in hundreds of joint projects with universities, particularly throughout Europe, in research programmes established by the European Union and local governments, and in cooperation agreements with research institutes of industrial partners.

Compass is a 4-year PhD training programme focusing on Computational Statistics and Data Science. This new venture is part of the Compass mission to promote academic and professional agility in its students, equipping them with the skills and experience to work across disciplines in academia and beyond.

Anthony Stephenson, the PhD student recruited to this project, says, “After several years working in industry, I am pleased to be starting the Compass programme and shifting my focus to research. Having the combined forces of the University of Bristol and IBM behind me inspires confidence and I look forward to working with members of each of them. My project, scalable inference in non-linear Bayesian models, is also a highly relevant and exciting area to work on, with many applications in modern machine learning.”

Dr Ed Pyzer-Knapp, World-Wide IBM Research Lead in AI Enriched Modelling and Simulation, says, “I am very excited to work with Anthony and Robert – scaling Bayesian inference is a really important area of machine learning research; bringing to bear our mantra of fusing of bits and neurons to further develop the future of computing. This project is a great opportunity to further strengthen our relationship with the University of Bristol.”

Prof Robert Allison is Anthony’s academic supervisor at the University of Bristol and says, “I’m really looking forward to working with Anthony and Ed on a highly important and widely applicable area of machine learning which encompasses mathematical research, data-analysis, algorithm development and efficient large-scale computation. In addition, I see this project as an ideal opportunity to seed wider ranging data-science and machine learning collaborations between IBM Research, their academic partners and the University of Bristol.”

As Director of Compass, Prof Nick Whiteley says, “I’m absolutely delighted to welcome IBM Research to Compass. This project is a fantastic opportunity for Anthony to tackle a very challenging and increasingly important AI research problem under Prof. Allison and Dr. Pyzer-Knapp’s supervision. As this collaboration develops, I look forward to all Compass students learning about IBM’s vision for the future of AI and its connection to the expertise in statistical methodology and computing they will acquire through the Compass training programme.”

## Improbable sponsors Compass PhD student in new partnership

The University of Bristol is announcing a new industrial sponsor of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. Improbable, a global technology company which provides innovative products and services to makers of virtual worlds and simulations, is sponsoring a PhD research project entitled Agent-based model calibration using likelihood-free inference. The project’s aim is to devise a general framework for calibrating agent-based models from training data by inferring the model parameters in a statistical framework.

## Sparx joins as Compass’ newest industrial partner

The University of Bristol is today announcing a new supporter of Compass – the EPSRC Centre for Doctoral Training in Computational Statistics and Data Science. South West-based learning technology company Sparx has agreed to sponsor a PhD student’s research project which will investigate new approaches to longitudinal statistical modelling within school-based mathematics education.

Sparx, which is located in Exeter, develops maths learning tools to support teaching and learning in secondary education. As an evidence-led company, Sparx has invested heavily in researching how technology can support the teaching and learning of maths and has worked closely with local schools. This new investment underlines their ongoing commitment to research.