Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations.

Authors

Agajanian Steve, Oluyemi Odeyemi, Verkhivker Gennady M

Affiliations

Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States.

Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States.

Publication information

Front Mol Biosci. 2019 Jun 11;6:44. doi: 10.3389/fmolb.2019.00044. eCollection 2019.

DOI:10.3389/fmolb.2019.00044
PMID:31245384
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6579812/
Abstract

The development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the past decade. In this work, we integrate different machine learning approaches, including tree-based methods, random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN), for prediction of cancer driver mutations in genomic datasets. The feasibility of using CNN on raw nucleotide sequences for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate the performance on a relative scale. We then integrated DNA-based scores generated by CNN with various categories of conservational, evolutionary and functional features into a generalized random forest classifier. The results of this study demonstrate that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining a deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance for various machine learning classifiers. Our findings also suggest that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features.
Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in protein kinase genes to obtain insights into the molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
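
The abstract names label encoding and one-hot encoding as two ways of preparing raw nucleotide sequences for a CNN. The sketch below illustrates only those two encodings in plain Python. It is not the authors' pipeline: the alphabet order `ACGT`, the function names `label_encode` and `one_hot_encode`, and the treatment of unknown bases such as `N` are all assumptions made for illustration.

```python
# Illustrative sketch only. The alphabet order and the handling of
# unknown bases are assumptions, not details taken from the paper.
NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def label_encode(seq):
    """Map each base to an integer index; unknown bases (e.g. N) map to -1."""
    return [INDEX.get(base, -1) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator row; unknown bases become all zeros."""
    rows = []
    for idx in label_encode(seq):
        row = [0] * len(NUCLEOTIDES)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = "ACGTN"  # hypothetical sequence window around a mutation site
    print(label_encode(window))
    print(one_hot_encode(window))
```

An embedding-based variant would typically start from the integer labels and learn a dense vector per base. In that case unknown bases would need their own non-negative index rather than `-1`.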

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/bc20bbef3a39/fmolb-06-00044-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/c69e665eb22b/fmolb-06-00044-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/af3fc775cd07/fmolb-06-00044-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/c817a0e1f4f0/fmolb-06-00044-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/10cf90fc90be/fmolb-06-00044-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/871be1c098cf/fmolb-06-00044-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/458cb58cc2d4/fmolb-06-00044-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/5ed7038e1fe8/fmolb-06-00044-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/824128005b5d/fmolb-06-00044-g0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/f023930a7b97/fmolb-06-00044-g0010.jpg
