CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations.

Affiliations

Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China.

School of Information Engineering, Huangshan University, Huangshan, Anhui, China.

Publication information

PeerJ. 2024 Sep 6;12:e17991. doi: 10.7717/peerj.17991. eCollection 2024.

DOI:10.7717/peerj.17991

PMID:39253604

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11382650/

Abstract

Most computational methods for predicting driver mutations have been trained using positive samples, while negative samples are typically derived from statistical methods or putative samples. The representativeness of these negative samples in capturing the diversity of passenger mutations remains to be determined. To tackle these issues, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. Subsequently, we encoded the distinctive features of these mutations. Utilizing feature correlation analysis, we developed a cancer driver missense mutation predictor called CDMPred employing feature selection through the ensemble learning technique XGBoost. The proposed CDMPred method, utilizing the top 10 features and XGBoost, achieved an area under the receiver operating characteristic curve (AUC) value of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred demonstrated superior performance compared to existing state-of-the-art methods for cancer-specific and general diseases, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proves advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations, furthering our understanding of personalized therapy.
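
The abstract reports performance as AUC and area under the precision-recall curve. The following is a minimal sketch of how these two metrics can be computed from predicted scores, assuming binary labels (1 = driver, 0 = passenger). The function names and the toy data are illustrative assumptions and are not taken from CDMPred.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).
    Tied scores keep their input order because sorted() is stable."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 1 = driver, 0 = passenger.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 3))
print(round(average_precision(toy_labels, toy_scores), 3))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation or a library routine would be the usual choice for large test sets.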
