Convolutional neural network models for cancer type prediction based on gene expression.

Author information

Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA.

Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA.

Publication information

BMC Med Genomics. 2020 Apr 3;13(Suppl 5):44. doi: 10.1186/s12920-020-0677-2.

DOI:10.1186/s12920-020-0677-2

PMID:32241303

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7119277/

Abstract

BACKGROUND

Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers.

RESULTS

In this paper, we introduced several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined 10,340 samples of 33 cancer types and 713 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved prediction accuracies of 93.9-95.0% among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and the other types was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to the prediction of breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code is available at https://github.com/chenlabgccri/CancerTypePrediction.
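
To make the idea of a 1D-CNN over an unstructured gene expression vector more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that lives in the repository linked above). The kernel width, stride, number of kernels, pooling size, random weights, and toy input of 40 genes are all assumptions chosen for illustration. The `saliency` function uses a simple finite-difference sensitivity as a stand-in for the guided saliency technique mentioned in the abstract; it only illustrates the general idea of ranking genes by how strongly they influence a class score.

```python
import math
import random

def conv1d(x, kernels, stride):
    """Valid 1D convolution followed by ReLU; one output row per kernel."""
    width = len(kernels[0])
    rows = []
    for kernel in kernels:
        row = []
        for start in range(0, len(x) - width + 1, stride):
            s = sum(w * x[start + j] for j, w in enumerate(kernel))
            row.append(max(0.0, s))  # ReLU activation
        rows.append(row)
    return rows

def max_pool(rows, size):
    """Non-overlapping max pooling along each row."""
    return [[max(r[i:i + size]) for i in range(0, len(r) - size + 1, size)]
            for r in rows]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def make_model(n_genes, n_classes, n_kernels=4, width=5, stride=5, pool=2, seed=0):
    """Build a toy model with random weights (all sizes are assumptions)."""
    rng = random.Random(seed)
    kernels = [[rng.gauss(0, 0.5) for _ in range(width)] for _ in range(n_kernels)]
    # Infer the flattened feature size by passing a dummy input through.
    dummy = max_pool(conv1d([0.0] * n_genes, kernels, stride), pool)
    flat = sum(len(r) for r in dummy)
    weights = [[rng.gauss(0, 0.5) for _ in range(flat)] for _ in range(n_classes)]
    biases = [0.0] * n_classes
    return {"kernels": kernels, "stride": stride, "pool": pool,
            "weights": weights, "biases": biases}

def predict(model, x):
    """Return class probabilities for one expression vector."""
    rows = max_pool(conv1d(x, model["kernels"], model["stride"]), model["pool"])
    flat = [v for r in rows for v in r]
    logits = [sum(w * a for w, a in zip(ws, flat)) + b
              for ws, b in zip(model["weights"], model["biases"])]
    return softmax(logits)

def saliency(model, x, cls, eps=1e-4):
    """Finite-difference sensitivity of one class probability to each gene.

    A stand-in for gradient-based saliency, not the paper's guided method.
    """
    base = predict(model, x)[cls]
    scores = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        scores.append(abs(predict(model, bumped)[cls] - base) / eps)
    return scores

# Toy usage: 40 hypothetical genes, 34 classes (33 cancers plus normal).
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(40)]
model = make_model(n_genes=40, n_classes=34)
probs = predict(model, x)
sal = saliency(model, x, cls=probs.index(max(probs)))
top_genes = sorted(range(len(sal)), key=lambda i: sal[i], reverse=True)[:5]
print("predicted class:", probs.index(max(probs)))
print("most influential gene indices:", top_genes)
```

With untrained random weights the predicted class is meaningless. The sketch only shows how data flows: a convolution slides over the gene vector, pooled features feed a 34-way softmax, and per-gene sensitivity scores can then be ranked to nominate candidate markers.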

CONCLUSIONS

Here we present novel CNN designs for accurate and simultaneous cancer/normal and cancer type prediction based on gene expression profiles, and a unique model interpretation scheme to elucidate the biological relevance of cancer marker genes after eliminating the effects of tissue of origin. The proposed model is lightweight, with few hyperparameters to train, and can thus be easily adapted to facilitate cancer diagnosis in the future.
