Zhang Daifeng, Bian Guoqiang, Zhang Yuanbin, Xie Jiadong, Hu Chenjun
School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China.
Jiangsu Collaborative Innovation Center of Traditional Chinese Medicine in Prevention and Treatment of Tumor, Nanjing, China.
Front Genet. 2025 Jun 4;16:1610284. doi: 10.3389/fgene.2025.1610284. eCollection 2025.
Lung cancer continues to impose a significant global health burden because of its high morbidity and mortality. This study aimed to systematically integrate biomedical datasets, particularly traditional Chinese medicine (TCM)-associated multi-omics data, using deep-learning methods enhanced by graph attention mechanisms. We sought to investigate the molecular mechanisms underlying stage-wise lung cancer progression and to identify pivotal stage-specific biomarkers that support precise cancer staging classification.
We developed a novel multi-omics integrative model, the Multi-Omics Lung Cancer Graph Network (MOLUNGN), based on Graph Attention Networks (GAT). Clinical datasets of non-small cell lung cancer (NSCLC), covering lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), were analyzed to construct omics-specific feature matrices comprising mRNA expression, miRNA mutation profiles, and DNA methylation data. MOLUNGN combines omics-specific GAT modules (OSGAT) with a Multi-Omics View Correlation Discovery Network (MOVCDN) to capture both intra- and inter-omics correlations. The framework classifies clinical cases into cancer stages and extracts stage-specific biomarkers.
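The abstract does not give layer-level details, so the following is only a minimal sketch of the two core operations, under stated assumptions. It assumes that each OSGAT module applies a standard single-head graph attention layer (in the style of Veličković et al.) over a patient-similarity graph built from one omics type, and that MOVCDN fuses the omics by forming a cross-omics tensor from per-omics stage probabilities, as in the VCDN component of MOGONET. The names `gat_layer`, `cross_omics_tensor`, and the toy graph and probabilities are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) weight matrix by a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gat_layer(features: List[Vector], neighbors: Dict[int, List[int]],
              W: Matrix, a: Vector, slope: float = 0.2) -> List[Vector]:
    """One single-head graph attention layer over a sample-similarity graph.

    features[i]  : one omics feature vector for patient sample i
    neighbors[i] : samples linked to i in the (hypothetical) similarity graph,
                   including i itself as a self-loop
    W            : shared linear projection
    a            : attention vector of length 2 * out_dim
    """
    Wh = [matvec(W, h) for h in features]
    d = len(Wh[0])
    out = []
    for i in range(len(features)):
        nbrs = neighbors[i]
        # Unnormalised scores e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = []
        for j in nbrs:
            s = (sum(a[k] * Wh[i][k] for k in range(d))
                 + sum(a[d + k] * Wh[j][k] for k in range(d)))
            scores.append(s if s > 0 else slope * s)
        # Softmax over i's neighbourhood (shifted by the max for stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # Attention-weighted sum of projected neighbour features, then ELU
        agg = [sum(al * Wh[j][k] for al, j in zip(alphas, nbrs)) for k in range(d)]
        out.append([x if x > 0 else math.exp(x) - 1.0 for x in agg])
    return out


def cross_omics_tensor(probs: List[Vector]) -> Vector:
    """Flattened outer product of per-omics stage-probability vectors.

    Assumption: modelled on MOGONET's VCDN. With three omics and C stages the
    result has C**3 entries, which a downstream classifier would take as input.
    """
    tensor = [1.0]
    for p in probs:
        tensor = [t * q for t in tensor for q in p]
    return tensor


# Toy usage: 3 samples, 2 features, identity projection, zero attention vector
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 2], 1: [1, 2], 2: [0, 1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
h = gat_layer(features, neighbors, W, a)

# Hypothetical per-omics probabilities over 2 stages (mRNA, miRNA, methylation)
stage_probs = [[0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
t = cross_omics_tensor(stage_probs)
print(h)
print(len(t), sum(t))
```

With a zero attention vector every neighbour receives the same weight, so each sample simply averages its neighbours' projected features; a learned `a` is what lets the layer emphasise the more informative neighbouring samples. The cross-omics tensor keeps every combination of per-omics stage predictions, which is one way a fusion network can model inter-omics agreement rather than just concatenating features.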
Evaluations on publicly available datasets showed that MOLUNGN outperformed existing methods. On the LUAD dataset, MOLUNGN achieved an accuracy (ACC) of 0.84, Recall_weighted of 0.84, F1_weighted of 0.83, and F1_macro of 0.82. On the LUSC dataset, the model performed better still, with an ACC of 0.86, Recall_weighted of 0.86, F1_weighted of 0.85, and F1_macro of 0.84. Notably, the model identified critical stage-specific biomarkers with significant biological relevance to lung cancer progression, supporting robust gene-disease associations.
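The four reported scores can be computed from true and predicted stage labels. The sketch below assumes the metric names follow the usual scikit-learn conventions (`average="weighted"` and `average="macro"`); the function name `staging_metrics` and the toy labels are hypothetical. One consequence is worth noting: for single-label classification, support-weighted recall is Σ_c (n_c / N)(TP_c / n_c) = Σ_c TP_c / N, which is exactly accuracy. That is consistent with ACC and Recall_weighted being identical on both datasets (0.84 and 0.86).

```python
from collections import Counter
from typing import Dict, Sequence


def staging_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """ACC, weighted recall, weighted F1 and macro F1 for single-label staging."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    support = Counter(y_true)        # true samples per stage
    predicted = Counter(y_pred)      # predicted samples per stage
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    recall = {c: tp[c] / support[c] if support[c] else 0.0 for c in labels}
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels}
    f1 = {c: (2 * precision[c] * recall[c] / (precision[c] + recall[c])
              if precision[c] + recall[c] else 0.0) for c in labels}

    return {
        "ACC": sum(tp.values()) / n,
        # Weighting by support reduces this to sum(tp) / n, i.e. identical to ACC
        "Recall_weighted": sum(support[c] / n * recall[c] for c in labels),
        "F1_weighted": sum(support[c] / n * f1[c] for c in labels),
        "F1_macro": sum(f1.values()) / len(labels),
    }


# Toy example with three hypothetical stages (0, 1, 2)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
m = staging_metrics(y_true, y_pred)
print(m)
```

Macro F1 gives every stage equal weight, so it drops below the weighted scores when the rarer stages are classified less well. This explains why F1_macro is the lowest of the four values on both datasets.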
Our findings demonstrate the efficacy of MOLUNGN as an integrative framework for accurately classifying lung cancer stages and uncovering essential biomarkers. These biomarkers offer insight into the mechanisms of lung cancer progression and are promising targets for future clinical validation. Integrating them into the TCM-target-disease network deepens the understanding of TCM's therapeutic potential and lays a foundation for future precision medicine applications.