Yagin Fatma Hilal, Gormez Yasin, Al-Hashem Fahaid, Ahmad Irshad, Ahmad Fuzail, Ardigò Luca Paolo
Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya, Türkiye.
Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Sivas Cumhuriyet University, Sivas, Türkiye.
Front Mol Biosci. 2024 Dec 18;11:1426964. doi: 10.3389/fmolb.2024.1426964. eCollection 2024.
Breast cancer (BC) is a significant cause of morbidity and mortality in women. Although the important role of metabolism in the molecular pathogenesis of BC is known, there is still a need for robust metabolomic biomarkers and predictive models that will enable the detection and prognosis of BC. This study aims to identify targeted metabolomic biomarker candidates based on explainable artificial intelligence (XAI) for the specific detection of BC.
The study used data from targeted metabolomics analyses of plasma samples from BC patients (n = 102) and healthy controls (n = 99). Machine learning (ML) models were first developed on the raw data; feature selection methods were then applied, and the results were compared. SHapley Additive exPlanations (SHAP), an XAI method, was used to provide clinical explanations for the decisions of the optimal model in BC prediction.
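As a rough illustration of how a SHAP-based feature ranking can work, the sketch below assumes a linear model with independent features, in which case the SHAP value of feature j for sample i reduces to w_j (x_ij - mean_j). The abstract does not state which explainer or model settings were used, so the function names, the coefficients, and the toy data here are all hypothetical.

```python
from typing import Dict, List, Sequence


def linear_shap_values(
    weights: Sequence[float], X: Sequence[Sequence[float]]
) -> List[List[float]]:
    """SHAP values for a linear model f(x) = w.x + b.

    Assumes independent features, so phi[i][j] = w[j] * (X[i][j] - mean[j]).
    """
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def rank_features(
    shap_values: Sequence[Sequence[float]], names: Sequence[str]
) -> List[str]:
    """Rank features by mean absolute SHAP value, largest first."""
    n = len(shap_values)
    importance: Dict[str, float] = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(names)
    }
    return sorted(importance, key=importance.get, reverse=True)


# Hypothetical toy data: 4 samples x 3 made-up metabolite features.
names = ["met_a", "met_b", "met_c"]
X = [
    [1.0, 10.0, 0.5],
    [2.0, 10.0, 0.7],
    [3.0, 12.0, 0.5],
    [4.0, 12.0, 0.7],
]
# Hypothetical coefficients standing in for a fitted linear model.
weights = [0.8, -0.1, 2.0]

phi = linear_shap_values(weights, X)
ranking = rank_features(phi, names)
top_k = ranking[:2]  # features kept for a downstream classifier
```

In a real analysis, the coefficients would come from a fitted model and the number of retained features would be chosen by validation rather than fixed in advance.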
Feature selection improved the performance of the ML models in BC classification, and the optimal model was obtained with the logistic regression (LR) classifier after support vector machine (SVM)-SHAP-based feature selection. SHAP explanations of the LR model indicated that the amino acids leucine, isoleucine, L-alloisoleucine, norleucine, and homoserine were the most important potential diagnostic biomarkers for BC. Combining the identified metabolite markers yielded robust BC classification, with precision, recall, and specificity of 89.50%, 88.38%, and 83.67%, respectively.
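For readers less familiar with the reported measures, the following minimal sketch shows how precision, recall, and specificity follow from confusion-matrix counts. The counts are invented for illustration and are not the study's data; the class name `ConfusionCounts` is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # patients correctly classified as BC
    fp: int  # controls wrongly classified as BC
    tn: int  # controls correctly classified as healthy
    fn: int  # patients wrongly classified as healthy

    @property
    def precision(self) -> float:
        # Fraction of predicted BC cases that are true BC cases.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Fraction of true BC cases that the model detects (sensitivity).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of healthy controls correctly classified as healthy.
        return self.tn / (self.tn + self.fp)


# Made-up counts for illustration only.
counts = ConfusionCounts(tp=45, fp=10, tn=40, fn=5)
print(f"precision={counts.precision:.2%}")
print(f"recall={counts.recall:.2%}")
print(f"specificity={counts.specificity:.2%}")
```

Precision and specificity both penalise false positives but use different denominators, which is why the two can differ noticeably on the same model.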
In conclusion, this study adds valuable information to the discovery of BC biomarkers and underscores the potential of targeted metabolomics-based diagnostic advances in the management of BC.