# New Opportunity: Funded PhD Project on Developing methods for model selection in causal health analyses

## How can statistical modelling tell us what causes disease?

Electronic Health Records (EHR) have transformed medical research, with diverse examples including examining risks of emergency admissions on weekends vs weekdays, and health and psychological outcomes after COVID-19. The strengths of analyses based on EHR data include the very large number of individuals available for analysis and the extremely detailed data available for each individual.

The key obstacle to drawing causal conclusions from EHR analyses is the potential for confounding by common causes of the exposure or intervention and the outcome. For example, can we observe sufficiently many predictors of whether a person attends A&E on a weekday or a weekend, or of whether a person is infected or hospitalised with COVID-19? EHR data help greatly with this, but high dimensionality (high-n, high-p), together with misclassification of and missingness in some variables, makes the best use of these data for causal inference a challenging conceptual problem.

This PhD will develop an automated Bayesian system that can make decisions about which variables to include in an analysis. This requires using ‘big data’ in the form of a large and complex data source, plus priors for key variables, and taking into account missingness, measurement error, and the underlying causal diagram.

Beyond the very large number of potential variables to include in a model, the key issues to consider are:

1) balance of completeness vs precision (e.g. is it better to include a variable that is measured precisely for a small number of people, or one that is measured with more error for a larger number?)

2) retaining some variables as instruments for causal analysis via instrumental variables. If we suspect that distance from hospital can cause, but not be caused by, the choice of operation, we must not condition on it (e.g. https://pubmed.ncbi.nlm.nih.gov/27055764/).

3) comparison with variable selection models, such as LASSO penalisation, or backwards stepwise elimination.
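
Issue 2 above can be illustrated with a small simulation. The sketch below is not part of the project itself: the data-generating model, the variable names and the function `simulate` are all hypothetical choices made for illustration. It assumes an instrument `z` (standing in for something like distance from hospital), an unmeasured confounder `u`, an exposure `x` and an outcome `y`, all linear with unit-variance noise, and compares an unadjusted regression, a regression that conditions on the instrument, and a simple instrumental-variable (Wald ratio) estimate.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=100_000, true_effect=1.0, seed=0):
    """Simulate hypothetical data and return three estimates of true_effect."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (assumed valid)
    u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
    # Exposure depends on the instrument and the confounder.
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # Outcome depends on the exposure and the confounder, but not directly on z.
    y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

    # Unadjusted least-squares slope of y on x.
    unadjusted = cov(x, y) / cov(x, x)

    # Slope of y on x when also adjusting for z, computed by first
    # removing the part of x that is linearly explained by z.
    slope_xz = cov(x, z) / cov(z, z)
    x_resid = [xi - slope_xz * zi for xi, zi in zip(x, z)]
    adjusted_for_z = cov(x_resid, y) / cov(x_resid, x_resid)

    # Wald ratio: z is used as an instrument rather than as a covariate.
    iv = cov(z, y) / cov(z, x)
    return unadjusted, adjusted_for_z, iv


if __name__ == "__main__":
    unadjusted, adjusted_for_z, iv = simulate()
    print(f"unadjusted:     {unadjusted:.3f}")
    print(f"adjusted for z: {adjusted_for_z:.3f}")
    print(f"IV estimate:    {iv:.3f}")
```

Under these assumed settings the true effect is 1, and in large samples the unadjusted slope tends to 4/3, the slope adjusted for `z` tends to 3/2, and the instrumental-variable estimate tends to 1. Conditioning on the instrument therefore makes the confounding bias worse in this toy model, whereas retaining it as an instrument recovers the effect.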

The challenge involves both mathematical statistics and data science practice. No medical or pharmacological background is required, though students with such backgrounds will be considered if they are interested in developing methodology. Access to EHR and large cohorts will allow application of methods to important clinical questions about the impact of drug treatments and other interventions on disease outcomes. Data science training is given by the EPSRC Centre for Doctoral Training (CDT) in Computational Statistics and Data Science (Compass), hosted in the Institute for Statistical Science at the University of Bristol School of Mathematics.

You will enrol in the four-year Compass PhD training programme in the statistical and computational techniques of data science. This bespoke programme enables students to develop the knowledge and capability to pursue a cross-disciplinary PhD, under the supervision of internationally leading researchers, in a sector with a recognised skills gap.

The programme offers exciting collaborations across medicine, computer science, geography, economics, life and earth sciences, as well as with our external partners who range from government organisations such as the Office for National Statistics, GCHQ and the AWE, to industrial partners such as LV, Improbable, EDF, and Sparx.

Students are co-located with the Institute for Statistical Science in the School of Mathematics, which occupies the historic Fry Building, recently refurbished at a cost of £35 million.

### Project Supervisors

The PhD will be supervised by medical statisticians Professors Jonathan Sterne and Kate Tilling (Bristol), and Dr Rhian Daniel (Cardiff). Prof Sterne has extensive expertise in causal inference methods and the analysis of EHR data, Prof Tilling in missing data, and Dr Daniel in causal inference theory and methods, particularly mediation and time-dependent confounding. The student will benefit from cross-institutional supervision, with access to seminars and events within Bristol and Cardiff. This will include the NIHR Bristol Biomedical Research Centre (Sterne), MRC Integrative Epidemiology Unit (Tilling) and Cardiff University’s Data Innovation Research Institute (DIRI), an active cross-disciplinary network of data scientists (Daniel).

### Funding and Application

This studentship provides four years of tuition fees at the UK student rate, a tax-free stipend of £19,609 per year (plus inflation) for living expenses, and a travel allowance to support research-related activities.

This project is open to UK/Home fees students only. To be considered, please apply before 5.00pm on 7 January 2022. Please mention the Project Title in your application form to assist admissions tutors.

To find out more about research areas, entry requirements and how to apply, visit the Compass website or register your interest by contacting the Admissions Team on .