Student perspectives: Genetic Boolean Models – How to Make One

A post by Daniel Gardner, PhD student on the Compass programme.

Introduction

My research focuses on genetic interaction networks within lung cancer cells. Our (long-term) aim is to model such networks dynamically using a Boolean modelling framework, and then use this to tie changes in cancer cells’ physiology to certain, often mutated, genes of interest.

Aims and problems

This blog post will focus on the challenge we are currently working on: constructing the model itself. This is often the hardest part of the research, as it underpins all results going forward, and there is often not enough data to fully define a unique model.

In some respects this is acceptable, as Boolean modelling is more of a qualitative approach. Each node in the network is a ‘species’, be that a gene, protein, small molecule, etc. Each directed, labelled edge is either ‘activating’, if an increase in species A causes an increase in species B, or ‘inhibiting’, if the opposite is the case [1].
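As a minimal sketch of the definition above, a signed interaction network can be stored as a list of labelled, directed edges. The species names and edges here are hypothetical and are not taken from any of the cited models.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network (illustrative names, not a published model).
# Each edge is (source, target, sign): +1 = activating, -1 = inhibiting.
EDGES: List[Tuple[str, str, int]] = [
    ("A", "C", -1),  # A inhibits C
    ("B", "C", +1),  # B activates C
    ("C", "D", +1),  # C activates D
]


def regulators(edges: List[Tuple[str, str, int]], target: str) -> Dict[str, int]:
    """Return {source: sign} for every edge pointing at `target`."""
    return {src: sign for src, tgt, sign in edges if tgt == target}


print(regulators(EDGES, "C"))  # {'A': -1, 'B': 1}
```

Note that this structure only records the sign of each interaction. It says nothing yet about how several regulators of the same species combine.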

With this definition, many of the papers we have looked at define their models purely from the literature [2], [3], [4], either by manually mining links or by using pre-existing databases like the Kyoto Encyclopedia of Genes and Genomes (KEGG) [6].

What we are more interested in are methods that derive these models in a far more quantitative way, straight from transcriptomic data. Whilst some of the papers referenced above justify their hand-built models in retrospect by showing they can replicate real-world results [3], we wish to work the other way round, beginning with the real-world results and then using a reverse engineering approach.

Figure 1: The Boolean model used in [2], based on a similar model constructed in [4]. It contains 98 nodes (species) and 254 directed edges (labelled interactions).

Potential solutions

The solutions we have found can be broadly split into two categories, namely methods that go from:

Raw Data → Interaction Network

and similarly:

Interaction Network → Boolean Model

The former is a much more difficult challenge. Generally, in a published network, each edge will reference experimental work that justifies, e.g. ‘A activates B’. However, datasets that combine many cell-line perturbation experiments are hard to come by, and the experiments are expensive to perform [5]. The problem is also often underdetermined, since the space of possible networks is far larger than the available data can constrain. One option we may look into in the future, however, involves using other modelling techniques, such as ODEs or Bayesian networks.
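To give a rough sense of how large that solution space is, the sketch below counts candidate signed networks. It makes two simplifying assumptions of our own: no self-loops, and each ordered pair of species has either no edge, an activating edge, or an inhibiting edge.

```python
def count_signed_networks(n_species: int) -> int:
    """Count signed directed networks on n_species nodes.

    Assumes no self-loops and three options per ordered pair:
    no edge, activating edge, or inhibiting edge.
    """
    ordered_pairs = n_species * (n_species - 1)
    return 3 ** ordered_pairs


for n in (2, 3, 5, 10):
    digits = len(str(count_signed_networks(n)))
    print(f"{n} species: a count with {digits} digit(s)")
```

Even for ten species the count is far beyond anything a realistic number of perturbation experiments could pin down, which is why prior knowledge or strong structural constraints are usually needed.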

The challenge of reverse engineering a Boolean model from a pre-built interaction network is much more feasible. The main problem in this case is deciding how to handle combined interactions. For example, if we had ‘A inhibits C’ and ‘B activates C’, how do they work in tandem?

Figure 2: Part of the optimisation algorithm from [7] applied to a toy model. In D, we classify each species in the network. The non-compressed nodes are those for which we have data to train on. In E, we construct the hypergraph, where for any pair of combined interactions, both the ‘AND’ and ‘OR’ cases are considered.

Sticking to the Boolean framework, these two interactions can be joined either through an ‘AND’ relation or through an ‘OR’ relation. When several proteins affect the same target, the possible combinations of Boolean rules quickly become non-trivial.
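For the example of ‘A inhibits C’ and ‘B activates C’, the two candidate rules can be written out and compared directly. This is a small illustrative sketch; the function names are our own.

```python
from itertools import product


def rule_and(a: bool, b: bool) -> bool:
    """C is on only if the activator B is on AND the inhibitor A is off."""
    return b and not a


def rule_or(a: bool, b: bool) -> bool:
    """C is on if the activator B is on OR the inhibitor A is off."""
    return b or not a


def truth_table(rule):
    """Map every (A, B) input pair to the resulting value of C."""
    return {(a, b): rule(a, b) for a, b in product([False, True], repeat=2)}


and_table = truth_table(rule_and)
or_table = truth_table(rule_or)

# Inputs on which the two candidate rules give different predictions for C.
disagreements = [inputs for inputs in and_table if and_table[inputs] != or_table[inputs]]
print(disagreements)  # [(False, False), (True, True)]
```

The two rules agree when exactly one of A and B is on, and disagree when both are off or both are on. Only data from those two conditions can tell the rules apart.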

One paper we found that deals with this problem well is Saez-Rodriguez et al. [7], which trains a hypergraph representation of the interaction network against cell-line assay data. It uses several techniques for graph and state-space reduction, as well as some heuristic rules on which complex interactions to target. For example, it is biologically unlikely that a protein needs many other species to act together before its function changes, so ‘AND’ links that combine more than N interactions can be removed from the state space.
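The effect of capping the size of ‘AND’ links can be sketched as follows. This is a simplified illustration rather than the algorithm from [7]: we assume each candidate hyperedge is an ‘AND’ over a subset of a target's regulators, and that several hyperedges into the same target are joined by ‘OR’. The regulator names and the `max_and` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Hyperedge = Tuple[Tuple[str, int], ...]  # AND over (regulator, sign) pairs


def candidate_hyperedges(regs: Dict[str, int], max_and: int) -> List[Hyperedge]:
    """Enumerate AND-gates over subsets of regulators with at most `max_and` inputs."""
    names = sorted(regs)
    edges: List[Hyperedge] = []
    for size in range(1, max_and + 1):
        for subset in combinations(names, size):
            edges.append(tuple((name, regs[name]) for name in subset))
    return edges


# Hypothetical target with four signed regulators (+1 activating, -1 inhibiting).
regs = {"A": -1, "B": +1, "E": +1, "F": -1}

print(len(candidate_hyperedges(regs, max_and=2)))  # 10
print(len(candidate_hyperedges(regs, max_and=4)))  # 15
```

With four regulators the saving is small, but the number of subsets doubles with every extra regulator, so the cap matters much more for highly connected species.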

One other model component we would like, but have not yet looked into properly, is a ‘layered’ model that includes different levels of genomic interaction. For example, many papers we have read use ‘protein interaction network (PIN)’ and ‘gene regulatory network (GRN)’ interchangeably. Whilst the two are closely related, treating them as equivalent in all cases is incorrect.

Conclusion and future plans

Starting directly from data to build a network is perhaps too ambitious a challenge, especially with the limited data available. In fact, even training a Boolean network through optimisation requires quite specific cell-line perturbation data. It could be that we make do with a network partially trained on limited data, with the rest taken from prior knowledge in the literature.

One promising sign is that [7] finds that it is best to begin with ‘too many’ interactions in a literature-curated interaction network, and then ‘prune’ spurious interactions via network optimisation. This is because these large networks are built from many different sources, some using different tissues, conditions, etc. Therefore, when we want a model specific to lung adenocarcinoma data, it is natural for the training to remove many of these genetic interactions.
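The idea of starting with too many interactions and pruning them can be illustrated with a toy example. Everything here is hypothetical: the candidate hyperedges, the perturbation data, and the scoring function, which simply counts mismatches and adds a small penalty per input so that smaller models are preferred. An exhaustive search stands in for whatever optimiser would be used on a realistic network.

```python
from itertools import combinations, product

# A hyperedge is an AND over (regulator, sign) pairs; hyperedges into C are joined by OR.
CANDIDATES = [
    (("A", -1),),
    (("B", +1),),
    (("X", +1),),            # hypothetical spurious literature edge
    (("A", -1), ("B", +1)),
]


def fires(hyperedge, state):
    """True if every regulator is in the state its sign requires (+1 on, -1 off)."""
    return all(state[name] == (sign > 0) for name, sign in hyperedge)


def predict(model, state):
    """Predicted value of C: on if any hyperedge in the model fires."""
    return any(fires(h, state) for h in model)


# Hypothetical perturbation data: C is observed on only when B is on and A is off.
DATA = [
    (dict(zip("ABX", values)), values[1] and not values[0])
    for values in product([False, True], repeat=3)
]


def score(model, data, size_penalty=0.1):
    """Number of mismatches plus a penalty proportional to the number of inputs."""
    errors = sum(predict(model, state) != observed for state, observed in data)
    n_inputs = sum(len(h) for h in model)
    return errors + size_penalty * n_inputs


def best_model(candidates, data):
    """Exhaustively search all subsets of candidate hyperedges for the lowest score."""
    subsets = (
        subset
        for size in range(len(candidates) + 1)
        for subset in combinations(candidates, size)
    )
    return min(subsets, key=lambda model: score(model, data))


print(best_model(CANDIDATES, DATA))
```

On this toy data the search keeps only the combined ‘not A AND B’ hyperedge and drops the other three candidates, including the spurious X edge. Exhaustive search is only possible because the example is tiny, as the number of subsets doubles with every candidate hyperedge.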

In the future, we aim for this research topic to be just one section of the wider project. Once we decide upon the most justified Boolean model for lung cancer, we aim to use patient mRNA and mutation data to personalise the models, in order to predict patient-specific cell phenotype probabilities. Using this, along with multi-layer protein imaging data from Cancer Research UK, we aim to find a statistically significant link between certain gene mutations and the resulting shape, and therefore phenotype, of a tumour.

Thank you for reading this blog post. If you have any questions, please feel free to get in touch with me at: daniel.gardner@bristol.ac.uk

References

[1] Abou-Jaoudé, W., Traynard, P., Monteiro, P. T., Saez-Rodríguez, J., Helikar, T., Thieffry, D., and Chaouiya, C. (2016). Logical modeling and dynamical analysis of cellular networks. Frontiers in Genetics, 7.

[2] Béal, J., Montagud, A., Traynard, P., Barillot, E., and Calzone, L. (2019). Personalization of logical models with multi-omics data allows clinical stratification of patients. Frontiers in Physiology, 9.

[3] Cohen, D. P. A., Martignetti, L., Robine, S., Barillot, E., Zinovyev, A., and Calzone, L. (2015). Mathematical modelling of molecular pathways enabling tumour cell invasion and migration. PLOS Computational Biology, 11.

[4] Fumiã, H. (2013). Boolean network model for cancer pathways: Predicting carcinogenesis and targeted therapy outcomes. PLoS ONE, 8:e69008.

[5] Galindez, G., Sadegh, S., Baumbach, J., Kacprowski, T., and List, M. (2023). Network-based approaches for modeling disease regulation and progression. Computational and Structural Biotechnology Journal, 21:780–795.

[6] Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1):27–30.

[7] Saez-Rodriguez, J., Alexopoulos, L. G., Epperlein, J., Samaga, R., Lauffenburger, D. A., Klamt, S., and Sorger, P. K. (2009). Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Molecular Systems Biology, 5(1):331.
