Cross-protein transfer learning substantially improves disease variant prediction.

Affiliations

Computer Science Division, University of California, Berkeley, 94720, CA, USA.

Department of Statistics, University of California, Berkeley, 94720, CA, USA.

Publication information

Genome Biol. 2023 Aug 7;24(1):182. doi: 10.1186/s13059-023-03024-6.

Abstract

BACKGROUND

Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity.

RESULTS

We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes.
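The specificity figures quoted above are measured at a fixed sensitivity. As a rough illustration of that metric, the sketch below picks the strictest score threshold that still calls at least 95% of pathogenic variants positive, then reports the fraction of benign variants falling below it. This is not the authors' evaluation code: the function name, the toy scores, and the tie-handling rule (a variant is called pathogenic when its score is at or above the threshold) are all assumptions made for illustration.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Specificity at the strictest threshold reaching the target sensitivity.

    scores: predictor outputs, higher meaning more likely pathogenic (assumed).
    labels: 1 for pathogenic, 0 for benign.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Number of pathogenic variants that must be called positive.
    # The small epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))

    # Call a variant pathogenic when its score is >= this threshold.
    threshold = pos[k - 1]

    # Specificity is the fraction of benign variants called benign.
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Toy data, made up for illustration only (not from ClinVar or the paper).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, target_sensitivity=0.75))
```

Fixing sensitivity first and then comparing specificity puts predictors with differently scaled scores on the same operating point, which is why the comparison between CPT-1, ESM-1v, and EVE can be stated as a single number per model.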

CONCLUSIONS

Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15ac/10408151/f4a1aa1c5694/13059_2023_3024_Fig1_HTML.jpg
