Pathway-based deep clustering for molecular subtyping of cancer.

Affiliations

Analytics and Data Science, Kennesaw State University, Kennesaw, USA.

Department of Computer Science, Kennesaw State University, Marietta, USA.

Publication information

Methods. 2020 Feb 15;173:24-31. doi: 10.1016/j.ymeth.2019.06.017. Epub 2019 Jun 25.

DOI:10.1016/j.ymeth.2019.06.017

PMID:31247294

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7378959/

Abstract

Cancer is a genetic disease comprising multiple subtypes with distinct molecular characteristics and clinical features. Cancer subtyping supports personalized treatment and clinical decision making, because different subtypes respond differently to treatment. The increasing availability of cancer-related genomic data provides an opportunity to identify molecular subtypes. Several unsupervised machine learning techniques have been applied to molecular data from tumor samples to identify cancer subtypes that are genetically and clinically distinct. However, most clustering methods often fail to cluster patients efficiently because of the challenges posed by high-throughput genomic data and its nonlinearity. In this paper, we propose a pathway-based deep clustering method (PACL) for molecular subtyping of cancer, which integrates gene expression data with a biological pathway database to group patients into cancer subtypes. The main contribution of our model is that it discovers high-level representations of biological data by learning the complex hierarchical and nonlinear effects of pathways. We compared the performance of our model with a number of benchmark clustering methods recently proposed for cancer subtyping. Using log-rank tests, we assessed the hypothesis that the clusters (subtypes) may be associated with different survival outcomes. PACL achieved the lowest log-rank p-value among the methods compared. This suggests that the patient groups clustered by PACL may correspond to subtypes that are significantly associated with distinct survival distributions. Moreover, PACL provides a way to comprehensively identify subtypes and to interpret the model at the biological pathway level. The open-source PyTorch implementation of PACL is publicly available at https://github.com/tmallava/PACL.
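The abstract evaluates clusterings by testing whether the resulting patient groups differ in survival. As an illustration only, the following is a minimal sketch of a two-group log-rank test in plain Python. It is not taken from the PACL repository; the function name `logrank_test` and its arguments are hypothetical, and it assumes exactly two clusters, whereas a real subtyping analysis with more clusters would need the multi-group form of the test.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test (illustrative sketch, not the authors' code).

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    groups : cluster label for each patient (exactly two distinct labels)
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    first = labels[0]

    # Distinct times at which at least one event was observed
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in the first group
    variance = 0.0       # hypergeometric variance accumulated over event times
    for t in event_times:
        # Risk set: patients whose follow-up reaches at least time t
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        # Events occurring exactly at time t, overall and in the first group
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(
            1
            for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == first
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (made up for illustration): cluster 0 dies early, cluster 1 late
times = [5, 6, 7, 8, 20, 22, 25, 30]
events = [1, 1, 1, 1, 1, 1, 1, 1]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
chi2, p = logrank_test(times, events, clusters)
print(f"chi2={chi2:.3f}, p={p:.4f}")
```

A smaller p-value indicates stronger evidence that the two groups have different survival distributions, which is the sense in which the abstract compares clustering methods by their log-rank p-values.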
