Classification of Cancer Types Using Graph Convolutional Neural Networks.

Authors

Ramirez Ricardo, Chiu Yu-Chiao, Hererra Allen, Mostavi Milad, Ramirez Joshua, Chen Yidong, Huang Yufei, Jin Yu-Fang

Affiliations

Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA.

Greehey Children's Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, TX, 78229, USA.

Publication information

Front Phys. 2020 Jun;8. doi: 10.3389/fphy.2020.00203. Epub 2020 Jun 17.

DOI:10.3389/fphy.2020.00203

PMID:33437754

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7799442/

Abstract

BACKGROUND

Cancer has been a leading cause of death in the United States with significant health care costs. Accurate prediction of cancers at an early stage and understanding the genomic mechanisms that drive cancer development are vital to the improvement of treatment outcomes and survival rates, thus resulting in significant social and economic impacts. Attempts have been made to classify cancer types with machine learning techniques during the past two decades and deep learning approaches more recently.

RESULTS

In this paper, we established four graph convolutional neural network (GCNN) models that use unstructured gene expressions as inputs to classify tumor and non-tumor samples into their designated 33 cancer types or as normal. The four GCNN models are based on a co-expression graph, a co-expression+singleton graph, a protein-protein interaction (PPI) graph, and a PPI+singleton graph. They were trained and tested on a combined set of 10,340 cancer samples and 731 normal tissue samples from The Cancer Genome Atlas (TCGA) dataset. The established GCNN models achieved excellent prediction accuracies (89.9-94.7%) among 34 classes (33 cancer types and a normal group). Gene-perturbation experiments were performed on all four models. The co-expression GCNN model was further interpreted to identify a total of 428 marker genes that drive the classification of the 33 cancer types and normal. The concordance of differential expression of these markers between the represented cancer type and the others was confirmed. Successful classification of cancer types and a normal group, regardless of the normal tissues' origin, suggested that the identified markers are cancer-specific rather than tissue-specific.

CONCLUSION

Novel GCNN models have been established to predict cancer types or normal tissue based on gene expression profiles. We demonstrated on the TCGA dataset that these models can produce accurate classification (above 94%) using cancer-specific marker genes. The models and the source code are publicly available and can be readily adapted by the data-driven modeling research community to the diagnosis of cancer and other diseases.
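The abstract does not say how the co-expression graph is built or which graph convolution is used, so the following is only a minimal sketch under stated assumptions. It thresholds absolute Pearson correlation to obtain a gene-gene adjacency matrix and applies one normalized-adjacency propagation step with self-loops, a simpler stand-in and not necessarily the authors' layer. The function names, the 0.6 threshold, and the toy data are all hypothetical.

```python
import numpy as np


def coexpression_adjacency(expr, threshold=0.6):
    """Binary gene-gene graph from an expression matrix.

    expr: array of shape (n_samples, n_genes).
    threshold: hypothetical cutoff on |Pearson r|; not taken from the paper.
    """
    corr = np.corrcoef(expr, rowvar=False)      # (n_genes, n_genes)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                  # self-loops are added later
    return adj


def normalize_with_self_loops(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_norm, weight):
    """One propagation step: ReLU(A_norm @ X @ W).

    x: (n_genes, in_features) node features for a single sample.
    """
    return np.maximum(a_norm @ x @ weight, 0.0)


# Toy data (assumption): 50 samples, 6 genes, genes 0 and 1 co-expressed.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=50)

adj = coexpression_adjacency(expr)
a_norm = normalize_with_self_loops(adj)

sample = expr[0][:, None]            # one sample, one feature per gene
weight = rng.normal(size=(1, 4))     # untrained, random filter weights
hidden = graph_conv(sample, a_norm, weight)
```

In this sketch, a gene with no edges (a "singleton") still passes its own expression value through the self-loop. That is one plausible reading of the "+singleton" graph variants, not a confirmed description of them.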
