CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations.

Affiliations

Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China.

School of Information Engineering, Huangshan University, Huangshan, Anhui, China.

Publication information

PeerJ. 2024 Sep 6;12:e17991. doi: 10.7717/peerj.17991. eCollection 2024.

DOI: 10.7717/peerj.17991
PMID: 39253604
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11382650/
Abstract

Most computational methods for predicting driver mutations have been trained on positive samples, while the negative samples are typically derived from statistical methods or consist of putative samples. Whether these negative samples capture the diversity of passenger mutations remains to be determined. To address this issue, we curated a balanced dataset comprising driver mutations from the COSMIC database and high-quality passenger mutations from the Cancer Passenger Mutation database. We then encoded the distinctive features of these mutations. Using feature correlation analysis and feature selection with the ensemble learning method XGBoost, we developed CDMPred, a cancer driver missense mutation predictor. With the top 10 features and XGBoost, CDMPred achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred outperformed existing state-of-the-art methods, both cancer-specific and general disease-oriented, in terms of AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data benefits CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations and will further our understanding of personalized therapy.
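The abstract outlines a pipeline of feature correlation analysis, selection of the top 10 features, and evaluation by AUC and area under the precision-recall curve, but it does not spell out the exact procedure. The following is a minimal stdlib-only Python sketch under stated assumptions, not the authors' code: a greedy filter drops any feature whose absolute Pearson correlation with an already-kept feature exceeds a hypothetical threshold of 0.9, and importance scores are passed in as a plain dict standing in for the XGBoost importances the paper relies on. The metric functions use textbook definitions: AUC as the probability that a random positive outscores a random negative, and average precision as one common estimate of the area under the precision-recall curve. All function and feature names are hypothetical.

```python
"""Hypothetical sketch: correlation filter, top-k selection, AUC and average precision.

Assumptions (not from the CDMPred paper): the 0.9 threshold, the greedy
keep-in-order strategy, and importance scores supplied as a dict in place
of XGBoost feature importances.
"""
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(features: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Walk features in insertion order, skipping any with |r| > threshold vs. a kept feature."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def top_k(importance: Dict[str, float], candidates: List[str], k: int = 10) -> List[str]:
    """Return the k candidate features with the highest importance scores."""
    return sorted(candidates, key=lambda f: importance[f], reverse=True)[:k]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; ties count 0.5. Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision at each positive's rank; assumes at least one positive.

    Ties in score keep input order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    toy = {
        "conservation": [0.9, 0.8, 0.2, 0.1],
        "conservation_copy": [0.91, 0.79, 0.21, 0.1],  # near-duplicate, gets dropped
        "hydrophobicity": [0.3, 0.7, 0.6, 0.2],
    }
    kept = drop_correlated(toy)
    print(kept)
    print(top_k({"conservation": 0.6, "hydrophobicity": 0.4}, kept, k=1))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

On the toy data, `conservation_copy` is nearly identical to `conservation` (r ≈ 0.9997), so the filter keeps only `conservation` and `hydrophobicity`, and `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` gives 0.75 because three of the four positive/negative pairs are ordered correctly. The pairwise AUC is O(P×N) and chosen for readability; for real datasets a sort-based implementation such as scikit-learn's `roc_auc_score` is the more practical choice.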

Figures 1-7 (images from the PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/8130a70810fd/peerj-12-17991-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/bcc221f6f795/peerj-12-17991-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/17213425dabe/peerj-12-17991-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/c1845c057c5d/peerj-12-17991-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/d0241690251d/peerj-12-17991-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/40863c5569b8/peerj-12-17991-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1949/11382650/90a180ab15c6/peerj-12-17991-g007.jpg