Zhang Qingsong, Liu Fei, Lai Xin
School of Software Engineering, South China University of Technology, Guangzhou, 510006, China.
Systems and Network Medicine Lab, Biomedicine Unit, Faculty of Medicine and Health Technology, Tampere University, Tampere, 33520, Finland.
Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf444.
Accurate tumor subtype diagnosis is crucial for precision oncology, yet current methodologies face significant challenges. These include balancing model accuracy with interpretability and the high costs of generating multi-omics data in clinical settings. Moreover, there is a lack of validated models capable of classifying hierarchical tumor subtypes across a comprehensive pan-cancer cohort.
We present HallmarkGraph, a graph neural network and the first biologically informed model developed to classify hierarchical tumor subtypes in human cancer. Inspired by cancer hallmarks, the model's architecture integrates transcriptome profiles and gene regulatory interactions to perform multi-label classification. We evaluate the model on a comprehensive pan-cancer cohort comprising 11 476 samples from 26 primary cancers, covering 405 subtypes organized in up to eight hierarchical levels. The model performs strongly, achieving 5-fold cross-validation accuracy between 85% and 99% across tumor subtypes labeled with increasingly detailed genomic information. It also generalizes well to a validation dataset of 887 samples, as assessed by three metrics that consider tumor subtypes at the individual, combined, and sample levels. Benchmarking and ablation experiments show that hallmark-based embeddings have only a slight influence on model performance, whereas the integrated multilayer perceptron plays a significant role in determining classifier accuracy. Additionally, we use the SHAP method to link cancer hallmarks with genes, identifying key features that influence model decisions. Our findings present a biologically informed machine learning framework capable of tracking tumor transcriptomic trajectories and distinguishing inter- and intra-tumor heterogeneity in pan-cancer. This approach holds promise for enhancing cancer diagnostics.
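The abstract does not define the three evaluation metrics, so the following is only an illustrative sketch of how multi-label predictions for hierarchical subtypes could be scored at three granularities. The function names, the choice of per-label accuracy, exact-match ratio, and mean Jaccard index, and the subtype labels in the example are all assumptions made for illustration; they are not taken from HallmarkGraph.

```python
from typing import Dict, List, Set

# Hypothetical helpers: the paper's actual metric definitions are not given
# in the abstract. Each sample is represented by the set of subtype labels
# assigned to it (one label per hierarchy level it reaches).
# All functions assume non-empty, equally long y_true and y_pred lists.

def label_level_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Per-label view: fraction of samples whose presence/absence of each label is correct."""
    labels = set().union(*y_true, *y_pred)
    n = len(y_true)
    return {
        label: sum((label in t) == (label in p) for t, p in zip(y_true, y_pred)) / n
        for label in sorted(labels)
    }

def exact_match_ratio(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Combined view: fraction of samples whose full label set is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_jaccard(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Sample-level view: average overlap (Jaccard index) between true and predicted sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        # Two empty label sets are treated as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

# Made-up labels for three samples; the second sample gets the wrong second-level subtype.
y_true = [{"BRCA", "BRCA.LumA"}, {"BRCA", "BRCA.Basal"}, {"LUAD"}]
y_pred = [{"BRCA", "BRCA.LumA"}, {"BRCA", "BRCA.LumA"}, {"LUAD"}]

per_label = label_level_accuracy(y_true, y_pred)
subset_acc = exact_match_ratio(y_true, y_pred)
sample_overlap = mean_jaccard(y_true, y_pred)
```

In this toy case the first-level label is right for every sample while one second-level label is wrong, so the exact-match ratio penalizes the whole sample and the Jaccard-style score gives partial credit. Metrics at different granularities can therefore tell different stories about a hierarchical classifier.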
HallmarkGraph is accessible at https://github.com/laixn/HallmarkGraph.