Shahnazari Parisa, Kavousi Kaveh, Khorshid Hamid Reza Khorram, Minuchehr Zarrin, Goliaei Bahram, M Salek Reza
Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.
Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
Sci Rep. 2025 Jul 1;15(1):21775. doi: 10.1038/s41598-025-06459-y.
This study integrates multimodal metabolomic data from three platforms (LC-MS, GC-MS, and NMR) to systematically identify biomarkers distinguishing breast cancer subtypes. A feedforward attention-based deep learning model effectively selected 99 significant metabolites, outperforming traditional static methods in classification performance and biomarker consistency. By combining data from diverse platforms, the approach captured a comprehensive metabolic profile while maintaining biological relevance. Self-organizing map analysis revealed distinct metabolic signatures for each subtype, highlighting critical pathways. Group 1 (ER/PR-positive, HER2-negative) exhibited elevated serine, tyrosine, and 2-aminoadipic acid levels, indicating enhanced amino acid metabolism supporting nucleotide synthesis and redox balance. Group 3 (triple-negative breast cancer) displayed increased TCA cycle intermediates, such as α-ketoglutarate and malate, reflecting a metabolic shift toward energy production and biosynthesis to sustain aggressive proliferation. In Group 4 (HER2-enriched), elevated phosphatidylcholines and phosphatidylethanolamines suggested upregulated mono-unsaturated phospholipid biosynthesis. The study provides a framework for leveraging multimodal data integration, attention-based feature selection, and self-organizing map analysis to identify biologically meaningful biomarkers.
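The abstract does not describe the model's architecture, so the following is only a generic sketch of how attention-based feature selection can work in principle: a small feedforward scorer assigns each feature a softmax attention weight per sample, the weights are averaged over samples, and the top-ranked features are kept. The function names, the tiny hand-picked weight matrices, and the toy intensity values are all hypothetical; the metabolite names are borrowed from the abstract purely as labels, and nothing here reproduces the study's model or data.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(sample, w_hidden, w_out):
    """One-hidden-layer feedforward scorer giving one attention weight per feature."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, sample))) for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(scores)

def select_top_k(samples, names, w_hidden, w_out, k):
    """Average attention weights over samples and return the k highest-ranked names."""
    totals = [0.0] * len(names)
    for sample in samples:
        for i, a in enumerate(attention_weights(sample, w_hidden, w_out)):
            totals[i] += a
    mean = [t / len(samples) for t in totals]
    ranked = sorted(range(len(names)), key=lambda i: mean[i], reverse=True)
    return [names[i] for i in ranked[:k]]

# Toy inputs (hypothetical): 2 samples x 3 features, untrained hand-picked weights.
names = ["serine", "tyrosine", "malate"]
samples = [[1.0, 0.0, 0.0], [0.5, 0.2, 0.1]]
w_hidden = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # 2 hidden units x 3 features
w_out = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]     # 3 feature scores x 2 hidden units

selected = select_top_k(samples, names, w_hidden, w_out, k=2)
print(selected)
```

In a real setting the scorer weights would be learned jointly with a classifier rather than fixed by hand, and the number of retained features (99 in the study) would come from the authors' own selection criterion, which the abstract does not specify.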