Randall Centre for Cell and Molecular Biophysics, King's College London, London, United Kingdom.
PLoS Biol. 2021 Apr 28;19(4):e3001207. doi: 10.1371/journal.pbio.3001207. eCollection 2021 Apr.
Missense variants are present in the healthy population, but some of them cause human disease. Classifying variants as associated with "healthy" or "diseased" states is therefore not always straightforward. A deeper understanding of the nature of missense variants in health and disease, the cellular processes they may affect, and the general molecular principles which underlie these differences is essential to offer mechanistic explanations of the true impact of pathogenic variants. Here, we have formalised a statistical framework which enables robust probabilistic quantification of variant enrichment across full-length proteins, their domains, and 3D structure-defined regions. Using this framework, we validate and extend previously reported trends of variant enrichment in different protein structural regions (surface/core/interface). By examining the association of variant enrichment with available functional pathways and with transcriptomic and proteomic (protein half-life, thermal stability, abundance) data, we have mined a rich set of molecular features which distinguish between pathogenic and population variants: pathogenic variants mainly affect proteins involved in cell proliferation and nucleotide processing and are enriched in more abundant proteins. Additionally, rare population variants display features closer to those of common variants than to those of pathogenic variants. We validate the association between these molecular features and variant pathogenicity by comparing against existing in silico variant impact annotations. This study provides molecular insight into how different proteins exhibit resilience and/or sensitivity towards missense variants and provides the rationale to prioritise variant-enriched proteins and protein domains for therapeutic targeting and development. The ZoomVar database, which we created for this study, is available at fraternalilab.kcl.ac.uk/ZoomVar. It allows users to programmatically annotate missense variants with protein structural information and to calculate variant enrichment in different protein structural regions.
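To make the idea of variant enrichment in a structural region concrete, the following is a minimal sketch and not the framework or the ZoomVar API described in the abstract. It assumes a simple density ratio (variants per residue in a region divided by variants per residue overall) and a one-sided binomial test under the null hypothesis that variants fall uniformly across residues. The function names and all counts are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment(region_variants, region_residues, total_variants, total_residues):
    """Variant density in a region divided by the overall variant density.

    Values above 1 suggest enrichment, values below 1 suggest depletion.
    """
    region_density = region_variants / region_residues
    overall_density = total_variants / total_residues
    return region_density / overall_density


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts for one structural region (e.g. the protein core).
core_variants = 30      # variants observed in the region
core_residues = 200     # residues belonging to the region
total_variants = 60     # variants observed across all regions
total_residues = 1000   # residues across all regions

ratio = enrichment(core_variants, core_residues, total_variants, total_residues)

# Null hypothesis (an assumption of this sketch): each variant lands in the
# region with probability equal to the region's share of residues.
null_probability = core_residues / total_residues
p_value = binomial_upper_tail(core_variants, total_variants, null_probability)

print(f"density ratio: {ratio:.2f}")
print(f"one-sided binomial p-value: {p_value:.3g}")
```

With these illustrative counts the region holds a fifth of the residues but half of the variants, so the density ratio is well above 1 and the binomial tail probability is small. A real analysis would also need to correct for multiple testing across regions and proteins.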