DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

Authors

Yuan Yuchen, Shi Yi, Li Changyang, Kim Jinman, Cai Weidong, Han Zeguang, Feng David Dagan

Affiliations

School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia.

Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China.

Publication

BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):476. doi: 10.1186/s12859-016-1334-9.

DOI:10.1186/s12859-016-1334-9
PMID:28155641
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5259816/
Abstract

BACKGROUND

With advances in DNA sequencing technology, large amounts of sequencing data have become available in recent years, providing unprecedented opportunities for association studies between somatic point mutations and cancer types/subtypes. Such studies may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, existing SMCC methods are held back by high data sparsity, small sample sizes, and reliance on simple linear classifiers, all of which are major obstacles to improving classification performance.

RESULTS

To address these obstacles, we propose DeepGene, a deep neural network (DNN) based classifier that consists of three steps. First, clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes. Second, indexed sparsity reduction (ISR) converts the gene data into the indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity. Finally, the data after CGF and ISR are fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene achieves at least a 24% improvement in testing accuracy.
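The two preprocessing ideas can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives only a high-level description, so `frequency_filter` below is a simplified stand-in for CGF that keeps the most frequently mutated genes and omits the clustering step entirely. The ISR sketch assumes 1-based indexes with zero padding to a fixed width. The function names, the `keep` and `width` parameters, and the toy matrix are all hypothetical.

```python
import numpy as np

def frequency_filter(X, keep):
    """Simplified stand-in for CGF (assumption): keep the `keep` genes
    that are mutated in the most samples. No clustering is performed."""
    freq = X.sum(axis=0)                           # mutation count per gene
    top = np.argsort(-freq, kind="stable")[:keep]  # most frequent first
    cols = np.sort(top)                            # restore original gene order
    return X[:, cols], cols

def indexed_sparsity_reduction(X, width):
    """Sketch of ISR: replace each binary row with the indexes of its
    non-zero entries. 1-based indexes and zero padding are assumptions."""
    out = np.zeros((X.shape[0], width), dtype=np.int64)
    for i, row in enumerate(X):
        idx = np.flatnonzero(row)[:width] + 1      # truncate to `width`
        out[i, :idx.size] = idx
    return out

# Toy data (hypothetical): 4 samples x 6 genes, 1 = gene carries a point mutation.
X = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
])

X_f, kept = frequency_filter(X, keep=3)
codes = indexed_sparsity_reduction(X_f, width=2)
print(kept)   # columns retained by the frequency filter
print(codes)  # dense index representation of each sample
```

The dense `codes` matrix is far narrower than the original binary matrix, which is the sense in which an index representation reduces sparsity before the data reach a classifier.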

CONCLUSIONS

Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier that addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers. This is mainly attributed to its deep learning module, which can extract high-level features relating combinatorial somatic point mutations to cancer types.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/f5eaf5be9af9/12859_2016_1334_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/ce061ee8935a/12859_2016_1334_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/9f41b841f88c/12859_2016_1334_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/740c6ef32587/12859_2016_1334_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/a49770d4560b/12859_2016_1334_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/a01a11fead5b/12859_2016_1334_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/53adeeb1300a/12859_2016_1334_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8f/5259816/6df4196dd878/12859_2016_1334_Fig8_HTML.jpg


