Jänes Jürgen, Müller Marc, Selvaraj Senthil, Manoel Diogo, Stephenson James, Gonçalves Catarina, Lafita Aleix, Polacco Benjamin, Obernier Kirsten, Alasoo Kaur, Lemos Manuel C, Krogan Nevan, Martin Maria, Saraiva Luis R, Burke David, Beltrao Pedro
Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
bioRxiv. 2024 May 29:2024.05.29.596373. doi: 10.1101/2024.05.29.596373.
Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants in the human population. Most of these have no annotated role, and some likely contribute to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to protein function, but they do not inform on the mechanisms of disruption. Here we combined sequence- and structure-based methods to perform proteome-wide prediction of deleterious variants, with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces, we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions by studying the relation between protein stability and abundance, and the structural properties of interfaces underlying protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease-linked variants in FGFR1.