Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations.

Authors

Agajanian Steve, Oluyemi Odeyemi, Verkhivker Gennady M

Affiliations

Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States.

Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States.

Publication information

Front Mol Biosci. 2019 Jun 11;6:44. doi: 10.3389/fmolb.2019.00044. eCollection 2019.

Abstract

Development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the past decade. In this work, we integrate different machine learning approaches, including tree-based methods (random forest and gradient boosted tree (GBT) classifiers) along with deep convolutional neural networks (CNN), for prediction of cancer driver mutations in genomic datasets. The feasibility of using CNN on raw nucleotide sequences for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate performance on a relative scale. We then integrated DNA-based scores generated by the CNN with various categories of conservation, evolutionary, and functional features into a generalized random forest classifier. The results of this study demonstrate that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we achieve superior performance across various machine learning classifiers. Our findings also suggest that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features. Machine learning predictions are leveraged in molecular simulations, protein stability analysis, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights into the molecular signatures of driver mutations and to enhance the interpretability of cancer-specific classification models.
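The abstract names label encoding and one-hot encoding as two of the ways raw nucleotide sequences were preprocessed for the CNN. The following is only a minimal sketch of those two encodings, not the authors' implementation: the four-letter alphabet, the integer assignment, and the function names `label_encode` and `one_hot_encode` are assumptions made for illustration, and the abstract does not state how ambiguous bases or sequence windows were handled.

```python
# Assumed nucleotide alphabet and ordering; the abstract does not specify either.
ALPHABET = "ACGT"
LABELS = {base: index for index, base in enumerate(ALPHABET)}


def label_encode(seq):
    """Map each nucleotide to an integer index (here A=0, C=1, G=2, T=3)."""
    return [LABELS[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Map each nucleotide to a length-4 indicator vector."""
    rows = []
    for index in label_encode(seq):
        row = [0] * len(ALPHABET)
        row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = "acgtta"  # hypothetical sequence window around a mutation
    print(label_encode(window))
    print(one_hot_encode(window))
```

Integer labels of this kind are also the usual input to a learned embedding layer, the third preprocessing option mentioned above, whereas the one-hot matrix can be passed to a convolutional layer directly.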


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c641/6579812/bc20bbef3a39/fmolb-06-00044-g0001.jpg
