Suppr超能文献

利用深度蛋白质语言模型进行全基因组疾病变异效应预测。

Genome-wide prediction of disease variant effects with a deep protein language model.

机构信息

Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.

Biological and Medical Informatics Graduate Program, University of California, San Francisco, San Francisco, CA, USA.

出版信息

Nat Genet. 2023 Sep;55(9):1512-1522. doi: 10.1038/s41588-023-01465-0. Epub 2023 Aug 10.

Abstract

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.

摘要

预测编码变异的影响是一项重大挑战。虽然最近的深度学习模型提高了变异影响预测的准确性,但由于依赖于密切同源物或软件限制,它们无法分析所有编码变异。在这里,我们开发了一个使用 ESM1b 的工作流程,ESM1b 是一个 6.5 亿参数的蛋白质语言模型,用于预测人类基因组中所有约 4.5 亿种可能的错义变异的影响,并在一个网络门户上提供了所有的预测结果。ESM1b 在对约 150,000 个 ClinVar/HGMD 错义变异进行致病性或良性分类以及预测 28 个深度突变扫描数据集的测量方面表现优于现有方法。我们进一步将约 200 万个变体注释为仅在特定蛋白质亚型中具有破坏性,这表明在预测变体影响时考虑所有亚型的重要性。我们的方法也可推广到更复杂的编码变异,如框内缺失和终止增益。总之,这些结果确立了蛋白质语言模型作为预测变异影响的有效、准确和通用的方法。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a23/10484790/ba8e842053cd/41588_2023_1465_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验