Ellrott Kyle, Wong Christopher K, Yau Christina, Castro Mauro A A, Lee Jordan A, Karlberg Brian J, Grewal Jasleen K, Lagani Vincenzo, Tercan Bahar, Friedl Verena, Hinoue Toshinori, Uzunangelov Vladislav, Westlake Lindsay, Loinaz Xavier, Felau Ina, Wang Peggy I, Kemal Anab, Caesar-Johnson Samantha J, Shmulevich Ilya, Lazar Alexander J, Tsamardinos Ioannis, Hoadley Katherine A, Robertson A Gordon, Knijnenburg Theo A, Benz Christopher C, Stuart Joshua M, Zenklusen Jean C, Cherniack Andrew D, Laird Peter W
Oregon Health and Science University, Portland, OR 97239, USA.
Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.
Cancer Cell. 2025 Feb 10;43(2):195-212.e11. doi: 10.1016/j.ccell.2024.12.002. Epub 2025 Jan 2.
Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology and hold promise for informing a patient's prognosis and treatment plan. However, most approaches used to discover subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples, comprising 106 subtypes from 26 different cancer cohorts, to build models based on small numbers of features that can classify new samples into previously defined TCGA molecular subtypes: a step toward applying molecular subtypes in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type, we provide containerized versions of the top-performing models as a public resource.
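The abstract does not name the five machine learning approaches. Purely as an illustration of the general idea it describes (train on a small set of features, then assign a new sample to one of the previously defined subtypes), here is a minimal sketch that assumes a nearest-centroid classifier and a variance-based filter as a hypothetical stand-in for feature selection. Neither is claimed to be one of the paper's methods, and the function names, toy data, and subtype labels are all hypothetical.

```python
from statistics import mean, pvariance


def select_features(samples, k):
    """Return indices of the k highest-variance features across samples.
    Hypothetical stand-in for classifier-driven feature selection."""
    n_features = len(samples[0])
    variances = [pvariance([s[j] for s in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def fit_centroids(samples, labels, features):
    """Compute the mean of each selected feature for every known subtype."""
    centroids = {}
    for label in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = [mean(m[j] for m in members) for j in features]
    return centroids


def predict(sample, centroids, features):
    """Assign the subtype whose centroid is closest (squared Euclidean distance)."""
    values = [sample[j] for j in features]

    def dist(label):
        return sum((v - c) ** 2 for v, c in zip(values, centroids[label]))

    return min(sorted(centroids), key=dist)


# Toy training data: 4 samples x 4 features with hypothetical subtype labels.
train = [
    [5.0, 0.1, 1.0, 0.2],
    [5.2, 0.1, 1.1, 0.3],
    [1.0, 0.1, 4.0, 0.2],
    [0.8, 0.1, 4.2, 0.3],
]
labels = ["SubtypeA", "SubtypeA", "SubtypeB", "SubtypeB"]

features = select_features(train, k=2)  # keep a small number of features
centroids = fit_centroids(train, labels, features)
print(predict([4.9, 0.1, 1.2, 0.25], centroids, features))  # a new, unseen sample
```

Restricting the model to a few selected features mirrors the abstract's emphasis on small feature sets, which makes a trained model easier to apply to samples from other studies; a real classifier would of course be trained and validated on actual multi-omic data.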