Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations.

Authors

Agajanian Steve, Oluyemi Odeyemi, Verkhivker Gennady M

Affiliations

Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States.

Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States.

Publication information

Front Mol Biosci. 2019 Jun 11;6:44. doi: 10.3389/fmolb.2019.00044. eCollection 2019.

DOI:10.3389/fmolb.2019.00044

PMID:31245384

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6579812/

Abstract

The development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum over the past decade. In this work, we integrate different machine learning approaches, including tree-based methods, namely random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNNs), to predict cancer driver mutations in genomic datasets. We first explored the feasibility of using CNNs on raw nucleotide sequences to classify cancer driver mutations, employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives to evaluate relative performance. We then integrated DNA-based scores generated by the CNN with various categories of conservation, evolutionary, and functional features into a generalized random forest classifier. The results demonstrate that CNNs can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance across various machine learning classifiers. Our findings also suggest that the synergy between nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores allows robust classification of cancer driver mutations with a limited number of highly informative features. Machine learning predictions are leveraged in molecular simulations, protein stability analysis, and network-based analysis of cancer mutations in protein kinase genes to obtain insights into the molecular signatures of driver mutations and to enhance the interpretability of cancer-specific classification models.
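
The abstract names label encoding and one-hot encoding as two of the ways raw nucleotide sequences were preprocessed for the CNN. As a rough illustration of what those two encodings look like, here is a minimal sketch in plain Python. The alphabet order, the treatment of unknown bases, the function names, and the example window are all assumptions made for illustration; this is not the authors' preprocessing code.

```python
# Illustrative sketch only: the alphabet order, the handling of unknown bases,
# and the function names are assumptions, not the authors' preprocessing code.

NUCLEOTIDES = "ACGT"  # assumed alphabet order
LABELS = {base: i for i, base in enumerate(NUCLEOTIDES)}


def label_encode(seq):
    """Map each nucleotide to an integer index; unknown bases (e.g. N) map to -1."""
    return [LABELS.get(base, -1) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each nucleotide to a 4-element indicator vector; unknown bases become all zeros."""
    encoded = []
    for index in label_encode(seq):
        row = [0] * len(NUCLEOTIDES)
        if index >= 0:
            row[index] = 1
        encoded.append(row)
    return encoded


window = "ACGTN"  # hypothetical sequence window around a mutation
labels = label_encode(window)
one_hot = one_hot_encode(window)
print(labels)
print(one_hot)
```

The third option mentioned in the abstract, embedding, would typically start from integer labels like these and use them as indices into a learned table of dense vectors, which requires a deep learning framework and is left out of this sketch.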

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/bc20bbef3a39/fmolb-06-00044-g0001.jpg

Similar articles

Machine Learning Classification and Structure-Functional Analysis of Cancer Mutations Reveal Unique Dynamic and Network Signatures of Driver Sites in Oncogenes and Tumor Suppressor Genes.

J Chem Inf Model. 2018 Oct 22;58(10):2131-2150. doi: 10.1021/acs.jcim.8b00414. Epub 2018 Oct 3.

Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.

Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.

Urban Tree Species Classification Using a WorldView-2/3 and LiDAR Data Fusion Approach and Deep Learning.

Sensors (Basel). 2019 Mar 14;19(6):1284. doi: 10.3390/s19061284.

deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks.

Front Genet. 2019 Jan 29;10:13. doi: 10.3389/fgene.2019.00013. eCollection 2019.

Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers.

J Voice. 2025 Jan;39(1):245-257. doi: 10.1016/j.jvoice.2022.07.007. Epub 2022 Sep 6.

MRI-Based Brain Tumor Classification Using Ensemble of Deep Features and Machine Learning Classifiers.

Sensors (Basel). 2021 Mar 22;21(6):2222. doi: 10.3390/s21062222.

A novel end-to-end classifier using domain transferred deep convolutional neural networks for biomedical images.

Comput Methods Programs Biomed. 2017 Mar;140:283-293. doi: 10.1016/j.cmpb.2016.12.019. Epub 2017 Jan 6.

Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

Comput Biol Med. 2018 Apr 1;95:217-233. doi: 10.1016/j.compbiomed.2018.02.008. Epub 2018 Feb 17.

Convolutional Neural Networks for ATC Classification.

Curr Pharm Des. 2018;24(34):4007-4012. doi: 10.2174/1381612824666181112113438.

Cited by

Identifying potential risk genes for clear cell renal cell carcinoma with deep reinforcement learning.

Nat Commun. 2025 Apr 15;16(1):3591. doi: 10.1038/s41467-025-58439-5.

Revealing SARS-CoV-2 M mutation cold and hot spots: Dynamic residue network analysis meets machine learning.

Comput Struct Biotechnol J. 2024 Oct 22;23:3800-3816. doi: 10.1016/j.csbj.2024.10.031. eCollection 2024 Dec.

Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data.

BMC Bioinformatics. 2023 Feb 9;24(1):43. doi: 10.1186/s12859-023-05141-2.

Systemic structural analysis of alterations reveals a common structural basis of driver mutations in cancer.

NAR Cancer. 2023 Jan 18;5(1):zcac040. doi: 10.1093/narcan/zcac040. eCollection 2023 Mar.

Predicting the Prognostic Value of Expression in Different Cancers via a Machine Learning Approach.

Int J Mol Sci. 2022 Aug 2;23(15):8571. doi: 10.3390/ijms23158571.

Weighted Gene Coexpression Network Analysis Identifies TBC1D10C as a New Prognostic Biomarker for Breast Cancer.

Anal Cell Pathol (Amst). 2022 Apr 5;2022:5259187. doi: 10.1155/2022/5259187. eCollection 2022.

Dissecting mutational allosteric effects in alkaline phosphatases associated with different Hypophosphatasia phenotypes: An integrative computational investigation.

PLoS Comput Biol. 2022 Mar 23;18(3):e1010009. doi: 10.1371/journal.pcbi.1010009. eCollection 2022 Mar.

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Cancers (Basel). 2021 May 14;13(10):2366. doi: 10.3390/cancers13102366.

Incorporating Machine Learning into Established Bioinformatics Frameworks.

Int J Mol Sci. 2021 Mar 12;22(6):2903. doi: 10.3390/ijms22062903.

A survey of multiscale modeling: Foundations, historical milestones, current status, and future prospects.

AIChE J. 2021 Mar;67(3):e17026. doi: 10.1002/aic.17026. Epub 2020 Sep 18.
