Vaida Maria, Wu Jiawen, Himdiat Eyad, Haince Jean-François, Bux Rashid A, Huang Guoyu, Tappia Paramjit S, Ramjiawan Bram, Ford W Rand
Department of Data Science, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA.
BioMark Diagnostic Solutions Inc., Quebec, QC G1P 4P5, Canada.
Int J Mol Sci. 2025 May 13;26(10):4655. doi: 10.3390/ijms26104655.
Lung cancer remains the leading cause of cancer-related mortality worldwide, with early detection critical for improving survival rates, yet conventional methods such as CT scans often yield high false-positive rates. This study introduces M-GNN, a graph neural network framework leveraging GraphSAGE, to enhance early lung cancer detection through metabolomics. We constructed a heterogeneous graph integrating metabolomics data from 800 plasma samples (586 cases, 214 controls) with demographic features and Human Metabolome Database annotations, employing GraphSAGE and GAT layers for inductive learning on 107 metabolites, pathways, and diseases. M-GNN achieved a test accuracy of 89% and an ROC-AUC of 0.92, with rapid convergence within 400 epochs and robust performance across ten random seeds; key predictors included age, height, choline, valine, betaine, and fumaric acid, reflecting smoking and metabolic dysregulation. This framework offers a scalable, interpretable tool for precision oncology, surpassing benchmarks by capturing complex biological interactions, though limitations like synthetic data biases and computational demands suggest future validation with real-world cohorts and optimization. M-GNN advances lung cancer screening, promising improved survival through early detection and personalized strategies.
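The abstract names GraphSAGE as the core of M-GNN but does not spell out the update rule. As a rough illustration only, the sketch below implements one GraphSAGE mean-aggregator layer in plain Python: each node's new embedding is ReLU(W_self · h_v + W_neigh · mean of neighbour embeddings). It is not the authors' implementation. The toy graph, the node names (`sample_1`, `sample_2`), the feature values, and the weight matrices are all hypothetical, and the sketch treats every node as the same type, whereas the paper describes a heterogeneous graph that also uses GAT layers.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def matvec(weights: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mean_vectors(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; returns a zero vector when there are no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sage_layer(
    features: Dict[str, Vector],
    neighbours: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One GraphSAGE mean-aggregator layer:
    h_v' = ReLU(W_self h_v + W_neigh mean(h_u for u in N(v)))."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h_v in features.items():
        h_n = mean_vectors([features[u] for u in neighbours.get(node, [])], dim)
        combined = [a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, h_n))]
        updated[node] = [max(0.0, x) for x in combined]  # ReLU
    return updated


# Hypothetical toy graph: two plasma-sample nodes linked to two metabolite nodes.
features = {
    "sample_1": [1.0, 0.0],
    "sample_2": [0.0, 1.0],
    "choline": [2.0, 2.0],
    "valine": [4.0, 0.0],
}
neighbours = {
    "sample_1": ["choline", "valine"],
    "sample_2": ["choline"],
    "choline": ["sample_1", "sample_2"],
    "valine": ["sample_1"],
}
# Assumed fixed weights for illustration; a real model would learn these.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, 0.5]]

embeddings = sage_layer(features, neighbours, w_self, w_neigh)
```

Because the layer only needs a node's own features and those of its neighbours, it can embed a previously unseen sample node without retraining, which is the sense in which the abstract calls the learning inductive.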