School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China.
Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China.
Front Cell Infect Microbiol. 2023 Dec 19;13:1289124. doi: 10.3389/fcimb.2023.1289124. eCollection 2023.
OBJECTIVES: Systemic lupus erythematosus (SLE) is a complex autoimmune disease that disproportionately affects women. Early diagnosis and prevention are crucial for women's health, and the gut microbiota has been found to be strongly associated with SLE. This study aimed to identify potential biomarkers for SLE by characterizing the gut microbiota landscape through feature selection. It also explored whether machine learning (ML) algorithms trained on significantly dysregulated microbiotas (SDMs) can identify SLE patients early. Additionally, we used the SHapley Additive exPlanations (SHAP) interpretability framework to visualize the impact of SDMs on the risk of developing SLE in females.
METHODS: Stool samples were collected from 54 SLE patients and 55 negative controls (NC) and analyzed using 16S rRNA sequencing. Feature selection was performed with Elastic Net and Boruta on species-level taxonomy. Four ML algorithms were then trained on the SDMs for early identification of SLE: logistic regression (LR), Adaptive Boosting (AdaBoost), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Finally, the best-performing algorithm was combined with SHAP to explore how SDMs affect the risk of developing SLE in females.
RESULTS: Both alpha and beta diversity differed between the SLE and NC groups. After feature selection, Elastic Net retained 68 taxa and Boruta retained 21. The 16 taxa selected by both methods were designated the SDMs for SLE. All four ML algorithms using SDMs effectively identified SLE patients. XGBoost performed best, with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and AUC of 0.844, 0.750, 0.938, 0.923, 0.790, and 0.930, respectively. The SHAP interpretability framework showed a complex non-linear relationship between the relative abundance of SDMs and the risk of SLE, with having the largest SHAP value.
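The performance figures above are standard binary-classification metrics. The first five come from a confusion matrix at a fixed decision threshold. AUC comes from how well the predicted scores rank SLE cases above controls. The sketch below spells out those definitions in plain Python. The helper names (`ConfusionMatrix`, `binary_metrics`, `roc_auc`) are hypothetical. All counts and scores in the usage example are made up for illustration, because the abstract does not report the underlying confusion matrix or predictions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts on a held-out test set, with SLE as the positive class."""
    tp: int  # SLE patients predicted as SLE
    fn: int  # SLE patients predicted as controls
    tn: int  # controls predicted as controls
    fp: int  # controls predicted as SLE


def binary_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Threshold-based metrics of the kind reported in the abstract."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # recall on SLE cases
        "specificity": cm.tn / (cm.tn + cm.fp),  # recall on controls
        "ppv": cm.tp / (cm.tp + cm.fp),          # precision of an SLE call
        "npv": cm.tn / (cm.tn + cm.fn),          # precision of a control call
    }


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random SLE sample scores higher than a
    random control (Mann-Whitney formulation); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    cm = ConfusionMatrix(tp=12, fn=4, tn=15, fp=1)
    for name, value in binary_metrics(cm).items():
        print(f"{name}: {value:.3f}")
    # Hypothetical predicted SLE probabilities for three cases and three controls.
    print(f"auc: {roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]):.3f}")
```

Sensitivity and specificity depend only on how each class is classified. PPV and NPV also depend on the class balance of the test set. They would therefore shift in a population where SLE is much rarer than in this roughly 1:1 case-control sample.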
CONCLUSIONS: This study revealed dysbiosis in the gut microbiota of female SLE patients. ML classifiers combined with SDMs, particularly XGBoost, can facilitate early identification of female patients with SLE. The SHAP interpretability framework provides insight into the impact of SDMs on the risk of SLE and may inform future treatment strategies for SLE.
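SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. Each feature receives its weighted average marginal contribution over all coalitions of the other features. The attributions sum to the difference between the prediction and a baseline prediction. The brute-force sketch below illustrates that definition on a toy model with three hypothetical taxon abundances and one interaction term, echoing the non-linear effects described above. It is not the authors' pipeline. It assumes a single all-zero baseline in place of the background dataset that SHAP normally averages over, and `shapley_values` and `toy_risk` are hypothetical names. The cost of this enumeration grows exponentially with the number of features. The `shap` package's tree explainer avoids that cost for tree ensembles such as XGBoost.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are set to their baseline value
    (a single-reference simplification of SHAP's background averaging).
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Model output when only features in `coalition` take their observed values.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical risk score: one additive taxon plus an interaction of two others."""
    a, b, c = features
    return 0.5 * a + 2.0 * b * c


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]         # hypothetical relative abundances for one sample
    baseline = [0.0, 0.0, 0.0]  # assumed reference sample
    for name, phi in zip(["taxon_a", "taxon_b", "taxon_c"],
                         shapley_values(toy_risk, x, baseline)):
        print(f"{name}: {phi:.3f}")
```

With these inputs, taxon_a contributes 0.5, and the interaction term 2·b·c is split equally between taxon_b and taxon_c (1.0 each). The attributions therefore sum to f(x) - f(baseline) = 2.5. Because b and c contribute only jointly, neither effect could be read off a single linear coefficient.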