Self-Normalizing Multi-Omics Neural Network for Pan-Cancer Prognostication.

Authors

Waqas Asim, Tripathi Aakash, Ahmed Sabeen, Mukund Ashwin, Farooq Hamza, Johnson Joseph O, Stewart Paul A, Naeini Mia, Schabath Matthew B, Rasool Ghulam

Affiliations

Department of Cancer Epidemiology, Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA.

Department of Machine Learning, Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA.

Publication

Int J Mol Sci. 2025 Jul 30;26(15):7358. doi: 10.3390/ijms26157358.

DOI:10.3390/ijms26157358

PMID:40806487

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12347193/

Abstract

Prognostic markers such as overall survival (OS) and tertiary lymphoid structure (TLS) ratios, alongside diagnostic signatures like primary cancer-type classification, provide critical information for treatment selection, risk stratification, and longitudinal care planning across the oncology continuum. However, extracting these signals solely from sparse, high-dimensional multi-omics data remains a major challenge due to heterogeneity and frequent missingness in patient profiles. To address this challenge, we present SeNMo, a self-normalizing deep neural network that learns a unified representation robust to missing modalities. It is trained on five heterogeneous omics layers (gene expression, DNA methylation, miRNA abundance, somatic mutations, and protein expression) along with clinical variables. Trained on more than 10,000 patient profiles across 32 tumor types from The Cancer Genome Atlas (TCGA), SeNMo provides a baseline that can be readily fine-tuned for diverse downstream tasks. On a held-out TCGA test set, the model achieved a concordance index of 0.758 for OS prediction, while external evaluation yielded 0.73 on the CPTAC lung squamous cell carcinoma cohort and 0.66 on an independent 108-patient Moffitt Cancer Center cohort. Furthermore, on the Moffitt cohort, baseline SeNMo fine-tuned for TLS ratio prediction aligned with expert annotations (p < 0.05) and sharply separated high- versus low-TLS groups, reflecting distinct survival outcomes. Without altering the backbone, a single linear head classified primary cancer type with 99.8% accuracy across the 33 classes. By unifying diagnostic and prognostic predictions in a modality-robust architecture, SeNMo demonstrated strong performance across multiple clinically relevant tasks, including survival estimation, cancer classification, and TLS ratio prediction, highlighting its translational potential for multi-omics oncology applications.
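
For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of the standard Harrell definition for right-censored survival data. It is not taken from the SeNMo code base: the function name `concordance_index`, the simplified handling of tied follow-up times (such pairs are skipped), and the four-patient toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that the risk scores rank correctly.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplifying assumption: skip tied times. A pair is comparable only
        # when the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk assigned to the earlier death
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the second patient is censored at t = 10.
toy_c = concordance_index(
    times=[5, 10, 12, 20],
    events=[1, 0, 1, 1],
    risks=[0.9, 0.4, 0.1, 0.6],
)
print(toy_c)  # 3 of 4 comparable pairs are ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.758 (TCGA), 0.73 (CPTAC), and 0.66 (Moffitt) figures.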
