Colak Cemil, Yagin Fatma Hilal, Algarni Abdulmohsen, Algarni Ali, Al-Hashem Fahaid, Ardigò Luca Paolo
Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey.
Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia.
Medicina (Kaunas). 2025 Mar 25;61(4):581. doi: 10.3390/medicina61040581.
Background and Objectives: Breast cancer (BC) is the most common type of cancer in women, accounting for more than 30% of new female cancers each year. Although various treatments are available for BC, most cancer-related deaths are due to incurable metastases. Early diagnosis and treatment, before metastasis occurs, are therefore crucial. Mammography and ultrasonography are the primary clinical tools for the initial identification and staging of BC; they are useful for general screening but have limited sensitivity and specificity. Omics-based biomarkers, such as those from metabolomics, can improve the accuracy of early diagnosis and of disease-progression tracking, and can support personalized treatment plans tailored to each tumor's molecular profile. Metabolomics is a feasible and comprehensive method for early disease detection and biomarker identification at the molecular level. This research aimed to establish an interpretable predictive artificial intelligence (AI) model using plasma-based metabolomics panel data to identify potential biomarkers that distinguish individuals with BC from healthy controls.
Materials and Methods: A cohort of 138 BC patients and 76 healthy controls was studied. Plasma metabolites were examined using LC-TOFMS and GC-TOFMS techniques. Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and Random Forest (RF) were evaluated using performance metrics including the area under the receiver operating characteristic curve (ROC AUC), accuracy, sensitivity, specificity, and F1 score. ROC and precision-recall (PR) curves were generated for comparative analysis. SHapley Additive exPlanations (SHAP) analysis was applied to the best-performing prediction model to assess its interpretability.
Results: The RF algorithm showed improved accuracy (0.963 ± 0.043) and sensitivity (0.977 ± 0.051); however, LightGBM achieved the highest ROC AUC (0.983 ± 0.028). RF also achieved the best area under the precision-recall curve (PR AUC), at 0.989. SHAP analysis identified glycerophosphocholine and pentosidine as the most significant discriminatory metabolites. Uracil, glutamine, and butyrylcarnitine were also among the significant metabolites.
Conclusions: Metabolomics biomarkers and an explainable AI (XAI)-based prediction model showed high diagnostic accuracy and sensitivity in the detection of BC. The proposed XAI system, which uses interpretable metabolite data, can serve as a clinical decision support tool to improve early diagnosis.
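As an illustration of the performance metrics named in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, F1 score, and ROC AUC can be computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all hypothetical and are unrelated to the study data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from binary labels (1 = BC, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four "BC" samples and four "control" samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # assumed model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Values reported as mean ± standard deviation, as in the abstract, would typically come from repeating such a computation over resampled or cross-validated splits; the abstract does not state which scheme was used.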