Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets.

Authors

Ellrott Kyle, Wong Christopher K, Yau Christina, Castro Mauro A A, Lee Jordan A, Karlberg Brian J, Grewal Jasleen K, Lagani Vincenzo, Tercan Bahar, Friedl Verena, Hinoue Toshinori, Uzunangelov Vladislav, Westlake Lindsay, Loinaz Xavier, Felau Ina, Wang Peggy I, Kemal Anab, Caesar-Johnson Samantha J, Shmulevich Ilya, Lazar Alexander J, Tsamardinos Ioannis, Hoadley Katherine A, Robertson A Gordon, Knijnenburg Theo A, Benz Christopher C, Stuart Joshua M, Zenklusen Jean C, Cherniack Andrew D, Laird Peter W

Affiliations

Oregon Health and Science University, Portland, OR 97239, USA.

Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

Publication information

Cancer Cell. 2025 Feb 10;43(2):195-212.e11. doi: 10.1016/j.ccell.2024.12.002. Epub 2025 Jan 2.

DOI:10.1016/j.ccell.2024.12.002

PMID:39753139

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11949768/

Abstract

Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology and may help inform a patient's prognosis and treatment plan. However, most approaches used to discover subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples, comprising 106 subtypes from 26 different cancer cohorts. The resulting models use small numbers of features to classify new samples into previously defined TCGA molecular subtypes, a step toward applying molecular subtypes in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type, we provide containerized versions of the top-performing models as a public resource.
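
The idea of a compact-feature classifier can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the five machine learning approaches, so the variance-ratio feature ranking and nearest-centroid classifier below are stand-ins chosen for brevity. The function names, the toy data, and the labels `subtype_A` and `subtype_B` are all hypothetical.

```python
from statistics import mean, pvariance

def select_compact_features(X, y, k):
    """Keep the k features with the highest between/within class variance ratio."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        between = pvariance([mean(g) for g in groups])
        within = mean([pvariance(g) for g in groups])
        scores.append((between / (within + 1e-9), j))
    top = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in top)

def fit_centroids(X, y, features):
    """Compute one centroid per subtype, restricted to the selected features."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [mean([r[j] for r in rows]) for j in features]
    return centroids

def predict(sample, features, centroids):
    """Assign the subtype whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((sample[j] - m) ** 2 for j, m in zip(features, center))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))

# Hypothetical toy data: features 0 and 2 separate the subtypes, 1 and 3 are noise.
X_train = [
    [5.1, 0.3, 1.0, 7.2],
    [4.9, 0.9, 1.2, 6.8],
    [5.3, 0.5, 0.8, 7.0],
    [1.0, 0.4, 4.9, 7.1],
    [1.2, 0.8, 5.2, 6.9],
    [0.8, 0.6, 5.0, 7.3],
]
y_train = ["subtype_A"] * 3 + ["subtype_B"] * 3

features = select_compact_features(X_train, y_train, k=2)
centroids = fit_centroids(X_train, y_train, features)
new_sample = [4.8, 0.7, 1.1, 7.0]  # stands in for a sample from an external study
print(features, predict(new_sample, features, centroids))
```

Because the classifier reads only the selected columns, a new sample needs measurements for just those features. That is the practical appeal of a small feature set when labeling specimens from outside the training cohort.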

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95b5/11949768/d23c6a88dd2a/nihms-2046254-f0002.jpg
