Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep.

Authors

Tzavella Konstantina, Diaz Adrian, Olsen Catharina, Vranken Wim

Affiliations

Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium.

Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Université Libre de Bruxelles (ULB), Laarbeeklaan 101, Brussels 1090, Belgium.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae664.

DOI:10.1093/bib/bbae664
PMID:39708841
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11663023/
Abstract

The mutations driving cancer are being increasingly exposed through tumor-specific genomic data. However, differentiating between cancer-causing driver mutations and random passenger mutations remains challenging. State-of-the-art homology-based predictors contain built-in biases and are often ill-suited to the intricacies of cancer biology. Protein language models have successfully addressed various biological problems but have not yet been tested on the challenging task of cancer driver mutation prediction at a large scale. Additionally, they often fail to offer result interpretation, hindering their effective use in clinical settings. The AI-based D2Deep method we introduce here addresses these challenges by combining two powerful elements: (i) a nonspecialized protein language model that captures the makeup of all protein sequences and (ii) protein-specific evolutionary information that encompasses functional requirements for a particular protein. D2Deep relies exclusively on sequence information, outperforms state-of-the-art predictors, and captures intricate epistatic changes throughout the protein caused by mutations. These epistatic changes correlate with known mutations in the clinical setting and can be used for the interpretation of results. The model is trained on a balanced, somatic training set and so effectively mitigates biases related to hotspot mutations compared to state-of-the-art techniques. The versatility of D2Deep is illustrated by its performance on non-cancer mutation prediction, where most variants still lack known consequences. D2Deep predictions and confidence scores are available via https://tumorscope.be/d2deep to help with clinical interpretation and mutation prioritization.
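The idea of scoring a mutation against protein-specific evolutionary information, and reading the result position by position, can be illustrated with a toy sketch. This is not the D2Deep implementation: the scalar per-position features stand in for protein language model embeddings, the single Gaussian per position is an assumed simplification, and every function and variable name here is hypothetical. The knock-on shift at a second position is set by hand to mimic a change that spreads beyond the mutated residue.

```python
import math
import random


def fit_position_gaussians(homolog_features):
    """Fit one (mean, std) pair per position from homolog feature rows.

    homolog_features: list of sequences, each a list of per-position floats
    (hypothetical stand-ins for language-model embeddings of homologs).
    """
    n_pos = len(homolog_features[0])
    params = []
    for i in range(n_pos):
        column = [seq[i] for seq in homolog_features]
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        # Small floor keeps the density defined if a column is constant.
        params.append((mean, math.sqrt(var) + 1e-6))
    return params


def log_density(x, mean, std):
    """Log of the normal density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def position_profile(wild_type, mutant, params):
    """Per-position change in log-density (mutant minus wild type)."""
    return [
        log_density(m, mu, sd) - log_density(w, mu, sd)
        for w, m, (mu, sd) in zip(wild_type, mutant, params)
    ]


# Toy data: 200 simulated homologs over 6 positions (assumed, not real data).
rng = random.Random(0)
n_pos = 6
homologs = [[rng.gauss(0.0, 1.0) for _ in range(n_pos)] for _ in range(200)]
params = fit_position_gaussians(homologs)

wild_type = [0.0] * n_pos
mutant = list(wild_type)
mutant[2] = 4.0  # large shift at the mutated position
mutant[3] = 1.0  # smaller, hand-set knock-on shift at another position

profile = position_profile(wild_type, mutant, params)
most_affected = max(range(n_pos), key=lambda i: abs(profile[i]))
print(most_affected, [round(p, 2) for p in profile])
```

Positions whose features are unchanged contribute exactly zero to the profile, so the non-zero entries show where the mutant departs from what the homologs tolerate. A profile of this kind is one plausible way to think about the interpretable, protein-wide signal the abstract describes.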
