In June 2000, with the grandeur reserved for great occasions, the President of the United States, Bill Clinton, announced, together with the British Prime Minister, Tony Blair, the completion of the sequencing of the human genome, the "genetic blueprint of human beings". A revolution was predicted in our knowledge of the genetic basis of the biological traits that define us, including diseases. What has happened since that announcement? What stage is personalized medicine at? And what does mathematics have to do with all this?
The strategy seemed clear. Until then, only a few genes were known to affect certain aspects of our biology; the availability of the entire genome would make it possible to extend that knowledge to traits determined by numerous genes. The former are the so-called simple traits, and the latter complex ones. Equivalently, diseases determined by a few genes (or a single gene) are known as Mendelian (e.g., cystic fibrosis), and those related to many genes as non-Mendelian (e.g., hypertension).
However, the sequence of a given gene can differ from person to person, and this also modifies traits (height, susceptibility to hypertension, etc.). The ideal, then, is not simply to match genes to traits, but to associate specific sequences (variants of the same gene) with the magnitude of the trait. Obtaining this relationship would achieve two objectives. First, we would better understand the biological foundation of the trait. Second, we could predict it in individuals who carry the specific sequence identified. Both aspects would contribute to the development of personalized medicine.
However, despite the falling cost of sequencing (in the year 2000 it was about 300 million dollars; today, around 1,000 dollars), sequencing the genomes of many people, which is necessary to establish the sequence-to-trait association, is still complicated. GWAS experiments (genome-wide association studies) offer an alternative: examining only the positions of the genome that exhibit the most frequent type of genetic variability. These positions are called SNPs (single nucleotide polymorphisms) and consist of a single nucleotide, the basic constituent of the genome, which can present four different states, abbreviated G, A, T and C.
Variation at a SNP need not be the cause of the presence, or modification, of the corresponding biological trait. In most cases, SNPs act as "markers" of genetic variants, physically close in the genome, that are the true causes. This is due to the statistical association that exists between physically close sequences in the human genome, known as linkage disequilibrium.
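Linkage disequilibrium can be quantified. A common measure is r², which captures how well the allele at a marker SNP predicts the allele at a nearby causal variant. The sketch below uses invented haplotype frequencies purely for illustration:

```python
# Toy illustration of linkage disequilibrium (LD) between a marker SNP
# and a nearby variant. All frequencies are invented for illustration.
p_AB = 0.45   # frequency of the haplotype carrying allele A and allele B
p_A  = 0.50   # frequency of allele A at the marker SNP
p_B  = 0.50   # frequency of allele B at the nearby variant

D = p_AB - p_A * p_B                        # deviation from independence
r2 = D**2 / (p_A * (1 - p_A) * p_B * (1 - p_B))  # standard r^2 measure of LD

# r^2 near 1 means the marker is an excellent proxy for the variant;
# r^2 near 0 means the two positions are inherited independently.
print(f"D = {D:.2f}, r^2 = {r2:.2f}")
```

With these frequencies, D = 0.20 and r² = 0.64: the marker carries substantial, though not complete, information about the nearby variant.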
Using specialized methodologies to examine these positions, which are cheaper and easier to assay, around a million SNPs can be evaluated per individual. However, the first studies were unable to identify the sets of SNPs linked to the variability found in complex traits or in susceptibility to non-Mendelian diseases. To our amazement, it seems that most aspects of human biology are determined by far more SNPs, each with a much weaker influence, than we expected. These SNPs also appear distributed throughout the genome.
Moreover, between the genome sequence and the manifestation of a biological trait there are intermediate levels of molecular activity that modulate its expression, which further complicates the understanding of this relationship. This is known as the genotype-phenotype map problem.
And this is where mathematics comes in. The development of quantitative methods allows a better understanding of the association between sequence and biological trait, incorporating information from the molecular and cellular context in the form of genetic networks. For example, these techniques make it possible to identify SNPs whose variation is significantly associated with susceptibility to a disease. These tools range from simple regression models to more complex methodologies that incorporate Bayesian estimation and, more recently, deep neural networks and causal inference.
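The simplest of these association tests can be sketched in a few lines. Below, a Pearson chi-square test compares the counts of a risk allele at one SNP between cases and controls; the counts are invented for illustration, but the statistic is the standard one used (SNP by SNP) in GWAS:

```python
# Hypothetical allele counts at one SNP (invented data, for illustration):
# each person contributes two chromosomes, so 500 cases and 500 controls
# give 1000 counted alleles per group.
cases    = {"A": 620, "G": 380}   # allele counts among cases
controls = {"A": 510, "G": 490}   # allele counts among controls

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 contingency table
    [[a, b], [c, d]] -- the standard test of allelic association."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2(cases["A"], cases["G"], controls["A"], controls["G"])
print(f"chi-square = {chi2:.2f}")
```

In a real GWAS this test is repeated for each of the roughly one million SNPs, which is why a very stringent significance threshold (conventionally p < 5×10⁻⁸) is required before declaring an association.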
Regarding the second objective we described, that of prediction, mathematics is used to develop systems that predict the value of a given trait from an individual's sequence information. To this end, the information from all available SNPs is aggregated, weighted by the intensity of their effects, into a single polygenic risk "predictor". As their prognostic ability improves, many propose using them as independent biomarkers and for grading the severity of patients. However, they also have limitations: our understanding of how these predictors work is very limited, given the tangled nature of the genotype-phenotype map described above. Furthermore, their development depends on the specific population under examination (and on environment-dependent interactions between genes) and is thus difficult to generalize.
Thus, polygenic risk predictors are one more example, within genomics, of the challenges faced by other disciplines whose objective is quantitative prediction based on so-called big data. These tools do their job, but we do not really see why. Warren Weaver, one of the pioneers of information theory, argued in his essay "Science and Complexity" (1947) that this type of challenge, which he called "organized" complexity, would dominate the science and technology of the future. Addressing it through mathematics will undoubtedly mark the advancement and success of the coveted personalized medicine, but we should always keep in mind the inescapable restrictions imposed by complexity.
Juan F. Poyatos directs the Logic of Genomic Systems Laboratory at the National Center for Biotechnology, part of the LifeHUB connection of the Spanish National Research Council (CSIC), and is a visiting researcher at the ICMAT.
Coffee and Theorems is a section dedicated to mathematics and the environment in which it is created, coordinated by the Institute of Mathematical Sciences (ICMAT), in which researchers and members of the center describe the latest advances in the discipline, share meeting points between mathematics and other social and cultural expressions, and remember those who marked its development and knew how to transform coffee into theorems. The name evokes a definition by the Hungarian mathematician Alfréd Rényi: "A mathematician is a machine that transforms coffee into theorems."
Editing and coordination: Ágata A. Timón G Longoria (ICMAT).
You can follow MATTER on Facebook, Twitter and Instagram, or sign up here to receive our weekly newsletter.