A post by Doug Corbin, PhD student on the Compass programme.
In a recent bout of friendly competition, students from the Compass and Interactive AI CDTs were divided into eight teams to take part in a two-week Datathon, hosted by insurance provider LV. A Datathon is a short, intensive competition that poses a data-driven challenge to the teams. The challenge was to construct the best predictive model for (the size of) insurance claims, using an anonymised, artificial data set generously provided by LV. Each team’s solution was judged on three important criteria:
- Accuracy – How well the solution performs at predicting insurance claims.
- Explainability – The ability to understand and explain how the solution calculates its predictions; it is important to be able to explain to customers how their quote has been calculated.
- Creativity – The solution’s incorporation of new and unique ideas.
Students were given the opportunity to put their experience in Data Science and Artificial Intelligence to the test on something resembling real-life data, forming cross-CDT relationships in the process.
Data and Modelling
Before training a machine learning model, the data must first be processed into a numerical format. To achieve this, most teams transformed categorical features into a series of 0s and 1s (representing the value of the category), using a well-known process called one-hot encoding. Others recognised that certain features had a natural order to them, and opted to map them to integers corresponding to their ordered position.
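To make the two encodings concrete, here is a minimal sketch in plain Python. It is not any team's actual code: the feature names (`fuel_type`, `cover_level`), their values, and the helper functions `one_hot` and `ordinal` are all hypothetical, chosen only for illustration and not taken from the LV data set.

```python
# Hypothetical records; the feature names and values are illustrative only,
# not taken from the LV data set.
records = [
    {"fuel_type": "petrol", "cover_level": "basic"},
    {"fuel_type": "diesel", "cover_level": "premium"},
    {"fuel_type": "electric", "cover_level": "standard"},
]


def one_hot(values):
    """Map each value to a list of 0s and 1s, one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def ordinal(values, order):
    """Map each value to its integer position in a known ordering."""
    position = {level: i for i, level in enumerate(order)}
    return [position[v] for v in values]


# An unordered feature gets one 0/1 column per category.
fuel_categories, fuel_encoded = one_hot([r["fuel_type"] for r in records])

# A feature with a natural order is mapped to integers instead.
cover_encoded = ordinal(
    [r["cover_level"] for r in records],
    order=["basic", "standard", "premium"],
)

print(fuel_categories)  # ['diesel', 'electric', 'petrol']
print(fuel_encoded)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(cover_encoded)    # [0, 2, 1]
```

One-hot encoding avoids implying an order between categories, at the cost of one extra column per category. The integer mapping keeps a single column, but only makes sense when the ordering is meaningful.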