Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Author information

Orenbuch Rose, Kollasch Aaron W, Spinner Hansen D, Shearer Courtney A, Hopf Thomas A, Franceschi Dinko, Dias Mafalda, Frazer Jonathan, Marks Debora S

Affiliations

Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.

Scientific Consulting, 85435 Erding, Germany.

Publication information

Res Sq. 2024 Jan 4:rs.3.rs-3740259. doi: 10.21203/rs.3.rs-3740259/v1.

Abstract

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses because their effects are less straightforward than those of truncations or nonsense mutations. While computational prediction methods are increasingly successful at predicting variant effects in disease genes, they do not generalize well to other genes because the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders, where conventional techniques that rely on repeated observations may not be applicable.
