CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network.

Affiliations

School of Electrical and Computer Engineering, UNIST, Ulsan, Republic of Korea.

Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan, Republic of Korea.

Publication information

Sci Rep. 2019 Nov 15;9(1):16927. doi: 10.1038/s41598-019-53034-3.

DOI: 10.1038/s41598-019-53034-3
PMID: 31729414
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6858312/
Abstract

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
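
The abstract names the input features but not their encoding. As a rough illustration only, the sketch below builds three of them for a single sample: a binary mutation profile over a gene panel, a mutation rate in mutations per megabase, and a six-class (strand-collapsed) mutation spectrum. The gene panel, the sequenced territory size, and every helper name here (`substitution_class`, `mutation_spectrum`, `feature_vector`) are hypothetical and are not taken from CPEM; mutation signatures and copy number features are left out.

```python
from collections import Counter

# Six pyrimidine-referenced single-base substitution classes (strand-collapsed).
SPECTRUM_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution onto the pyrimidine (C or T) reference strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: report the complementary change
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(snvs):
    """Fraction of a sample's SNVs in each of the six classes (all zero if none)."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {c: counts[c] / total if total else 0.0 for c in SPECTRUM_CLASSES}


def mutation_rate(n_mutations, territory_mb):
    """Mutations per megabase of sequenced territory."""
    return n_mutations / territory_mb


def feature_vector(snvs, mutated_genes, gene_panel, territory_mb):
    """Hypothetical layout: binary mutation profile + mutation rate + spectrum."""
    profile = [1.0 if gene in mutated_genes else 0.0 for gene in gene_panel]
    spectrum = mutation_spectrum(snvs)
    rate = mutation_rate(len(snvs), territory_mb)  # SNVs only, as a simplification
    return profile + [rate] + [spectrum[c] for c in SPECTRUM_CLASSES]


if __name__ == "__main__":
    # Hypothetical sample: four SNVs, TP53 mutated, 3-gene panel, 2 Mb sequenced.
    snvs = [("C", "T"), ("G", "A"), ("T", "G"), ("A", "C")]
    print(feature_vector(snvs, {"TP53"}, ["TP53", "KRAS", "APC"], territory_mb=2.0))
    # -> [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.5]
```

G>A is counted as C>T and A>C as T>G, which is why the spectrum has only two non-zero entries. A real pipeline would also need to decide how to normalise each feature group before feeding them to the classifiers.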

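The title describes CPEM as an ensemble of a random forest and a deep neural network, but the abstract does not say how the two outputs are combined. One common rule, assumed here, is soft voting: take a weighted average of the two models' class-probability vectors and predict the class with the highest combined probability. The probability vectors below stand in for real model outputs, and the weight `w_rf` and the TCGA study codes used as labels are illustrative choices.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_rf: Sequence[float], prob_dnn: Sequence[float],
              classes: Sequence[str], w_rf: float = 0.5) -> Tuple[str, List[float]]:
    """Weighted average of two class-probability vectors, then argmax."""
    if not len(prob_rf) == len(prob_dnn) == len(classes):
        raise ValueError("probability vectors and class list must have equal length")
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must lie in [0, 1]")
    # Convex combination keeps the result a valid probability vector.
    combined = [w_rf * p + (1.0 - w_rf) * q for p, q in zip(prob_rf, prob_dnn)]
    best = max(range(len(classes)), key=lambda i: combined[i])
    return classes[best], combined


if __name__ == "__main__":
    classes = ["BRCA", "LUAD", "COAD"]  # example TCGA study codes
    rf_probs = [0.6, 0.3, 0.1]   # stand-in random-forest output
    dnn_probs = [0.2, 0.7, 0.1]  # stand-in neural-network output
    label, probs = soft_vote(rf_probs, dnn_probs, classes)
    print(label)  # LUAD
    print(soft_vote(rf_probs, dnn_probs, classes, w_rf=0.8)[0])  # BRCA
```

With equal weights the network's confident LUAD vote wins; shifting weight to the forest flips the call to BRCA, so the weight is itself a hyperparameter worth tuning inside cross-validation. In scikit-learn, `VotingClassifier(voting="soft", weights=...)` applies the same averaging rule to fitted estimators.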

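The 84% figure is reported under nested 10-fold cross-validation. In the usual form of that protocol, model choices such as hyperparameters (and possibly feature selection) are made only on inner folds carved out of each outer training split, so the held-out outer fold never influences selection. The sketch below shows that structure using only the standard library. A small k-nearest-neighbour classifier stands in for CPEM's random forest and neural network, and the grid over the number of neighbours and the 10-fold inner loop are hypothetical choices.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i in members:  # deal each class round-robin across the folds
            folds[pos % k].append(i)
            pos += 1
    return folds


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority label among the n_neighbors nearest training points (squared Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def accuracy(X, y, fit_idx, eval_idx, n_neighbors):
    """Train on fit_idx (k-NN simply stores them) and score on eval_idx."""
    train_X = [X[i] for i in fit_idx]
    train_y = [y[i] for i in fit_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i] for i in eval_idx)
    return hits / len(eval_idx)


def inner_cv_accuracy(X, y, train_idx, inner_folds, n_neighbors):
    """Mean validation accuracy over inner folds drawn from the outer training set."""
    scores = []
    for fold in inner_folds:
        if not fold:
            continue
        val_idx = [train_idx[j] for j in fold]  # inner positions -> global indices
        val_set = set(val_idx)
        fit_idx = [i for i in train_idx if i not in val_set]
        scores.append(accuracy(X, y, fit_idx, val_idx, n_neighbors))
    return sum(scores) / len(scores)


def nested_cv(X, y, grid, k_outer=10, k_inner=10, seed=0):
    """Mean outer-fold accuracy; the hyperparameter is chosen on inner folds only."""
    outer_scores = []
    for test_idx in stratified_folds(y, k_outer, seed):
        if not test_idx:
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        inner_folds = stratified_folds([y[i] for i in train_idx], k_inner, seed + 1)
        best = max(grid, key=lambda h: inner_cv_accuracy(X, y, train_idx, inner_folds, h))
        # Refit on the whole outer training split, then score the untouched test fold.
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best))
    return sum(outer_scores) / len(outer_scores)


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
    X += [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(20)]
    y = ["A"] * 20 + ["B"] * 20
    print(f"nested 10-fold CV accuracy: {nested_cv(X, y, grid=[1, 3, 5]):.2f}")
```

Only the outer-fold scores are averaged for reporting. The inner scores are used solely to pick `n_neighbors`, which is what keeps the outer estimate free of selection bias. With scikit-learn, the same pattern is usually written as a `GridSearchCV` wrapped in `cross_val_score`.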

Similar articles

1
CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network.
Sci Rep. 2019 Nov 15;9(1):16927. doi: 10.1038/s41598-019-53034-3.
2
Machine learning random forest for predicting oncosomatic variant NGS analysis.
Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y.
3
CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.
EBioMedicine. 2020 Nov;61:103030. doi: 10.1016/j.ebiom.2020.103030. Epub 2020 Oct 9.
4
A deep learning-based multi-model ensemble method for cancer prediction.
Comput Methods Programs Biomed. 2018 Jan;153:1-9. doi: 10.1016/j.cmpb.2017.09.005. Epub 2017 Sep 14.
5
Evaluating the Predictability of Cancer Types from 536 Somatic Mutations: A New Dataset.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5308-5311. doi: 10.1109/EMBC44109.2020.9176699.
6
Reviewing ensemble classification methods in breast cancer.
Comput Methods Programs Biomed. 2019 Aug;177:89-112. doi: 10.1016/j.cmpb.2019.05.019. Epub 2019 May 20.
7
NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer.
BMC Med Genomics. 2019 May 16;12(1):63. doi: 10.1186/s12920-019-0508-5.
8
A pan-cancer somatic mutation embedding using autoencoders.
BMC Bioinformatics. 2019 Dec 11;20(1):655. doi: 10.1186/s12859-019-3298-z.
9
Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations.
Front Mol Biosci. 2019 Jun 11;6:44. doi: 10.3389/fmolb.2019.00044. eCollection 2019.
10
A feasibility study of returning clinically actionable somatic genomic alterations identified in a research laboratory.
Oncotarget. 2017 Jun 27;8(26):41806-41814. doi: 10.18632/oncotarget.16018.

Cited by

1
Classification performance assessment for imbalanced multiclass data.
Sci Rep. 2024 May 10;14(1):10759. doi: 10.1038/s41598-024-61365-z.
2
Deep-Learning Model for Tumor-Type Prediction Using Targeted Clinical Genomic Sequencing Data.
Cancer Discov. 2024 Jun 3;14(6):1064-1081. doi: 10.1158/2159-8290.CD-23-0996.
3
Integrative analyses and validation of ferroptosis-related genes and mechanisms associated with cerebrovascular and cardiovascular ischemic diseases.
BMC Genomics. 2023 Dec 4;24(1):731. doi: 10.1186/s12864-023-09829-w.
4
Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations.
J Transl Med. 2023 Nov 21;21(1):836. doi: 10.1186/s12967-023-04720-4.
5
Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping.
Genome Med. 2023 Jul 7;15(1):47. doi: 10.1186/s13073-023-01204-4.
6
Classification of group A rotavirus VP7 and VP4 genotypes using random forest.
Front Genet. 2023 May 30;14:1029185. doi: 10.3389/fgene.2023.1029185. eCollection 2023.
7
The Histone Methyltransferase SETD8 Regulates the Expression of Tumor Suppressor Genes via H4K20 Methylation and the p53 Signaling Pathway in Endometrial Cancer Cells.
Cancers (Basel). 2022 Oct 31;14(21):5367. doi: 10.3390/cancers14215367.
8
GraphChrom: A Novel Graph-Based Framework for Cancer Classification Using Chromosomal Rearrangement Endpoints.
Cancers (Basel). 2022 Jun 22;14(13):3060. doi: 10.3390/cancers14133060.
9
Secure tumor classification by shallow neural network using homomorphic encryption.
BMC Genomics. 2022 Apr 9;23(1):284. doi: 10.1186/s12864-022-08469-w.
10
Framework for classification of cancer gene expression data using Bayesian hyper-parameter optimization.
Med Biol Eng Comput. 2021 Nov;59(11-12):2353-2371. doi: 10.1007/s11517-021-02442-7. Epub 2021 Oct 5.