Hong Ji Eun, Kim Yeon Eun, Kang Yun Soo, Choi Dong Hyeok, Ahn So Hyun, An Jeongshin
Department of Medical Science, Ewha Womans University College of Medicine, Seoul, Republic of Korea.
Ewha Womans University College of Medicine, Seoul, Republic of Korea.
Sci Rep. 2025 Sep 26;15(1):33096. doi: 10.1038/s41598-025-16790-z.
Recurrence and metastasis of breast cancer (RMBC) have a decisive impact on patient survival, necessitating reliable biomarkers for their early prediction. This study used machine learning to evaluate blood microbiome profiles as predictive biomarkers of RMBC. A retrospective predictive analysis was conducted on 288 participants, including 96 patients with breast cancer and 192 healthy controls. After 7 years of follow-up, patients were classified into disease-free survival (DFS, n = 88) and RMBC (n = 8) groups. Blood microbiome composition was analysed using 16S rRNA sequencing, followed by quality control. The Synthetic Minority Oversampling Technique (SMOTE) was employed to address class imbalance. Eleven machine learning models were trained using leave-one-out cross-validation (LOOCV) and k-fold cross-validation, and evaluated based on the area under the receiver operating characteristic curve (AUROC), recall, precision, F1-score, and Matthews correlation coefficient (MCC). Alpha diversity was significantly lower in the DFS and RMBC groups than in the healthy control group (p < 0.05), while beta diversity analysis revealed distinct clustering. The random forest model achieved an AUROC of 0.94, a recall of 0.81, an F1-score of 0.83, and an MCC of 0.88. Enterobacter, Bacteroides, Klebsiella, and Bifidobacterium were among the key microbial genera predicting RMBC in the top five models. Blood microbiome profiling shows potential as a non-invasive RMBC biomarker. Machine learning effectively distinguished RMBC, warranting further validation.
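The abstract reports recall, precision, F1-score, and MCC for a heavily imbalanced comparison (88 DFS versus 8 RMBC). The following is a minimal sketch of how these metrics are conventionally computed from a binary confusion matrix, treating RMBC as the positive class. It is not the authors' code. The function name `binary_metrics` and both sets of confusion counts are hypothetical and chosen only for illustration; only the group sizes (88 and 8) come from the abstract.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn count the positive class (here assumed to be RMBC),
    tn/fp count the negative class (here assumed to be DFS).
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient uses all four cells of the matrix.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical case 1: a trivial classifier that labels all 96 patients as DFS.
trivial = binary_metrics(tp=0, fp=0, fn=8, tn=88)

# Hypothetical case 2: 6 of 8 RMBC cases detected, with 4 false alarms.
example = binary_metrics(tp=6, fp=4, fn=2, tn=84)

for name, result in (("trivial", trivial), ("example", example)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In the first hypothetical case accuracy stays high because most patients are DFS, while recall and MCC fall to zero. This illustrates why imbalance-aware metrics such as those listed in the abstract are informative when the positive class has only 8 members.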