Sun Jiaye, Shao Shijun, Wan Hua, Wu Xueqing, Feng Jiamei, Gao Qingqian, Qu Wenchao, Xie Lu
Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China.
BMC Med Inform Decis Mak. 2024 Apr 22;24(1):106. doi: 10.1186/s12911-024-02499-y.
This study aims to build machine learning (ML) models that predict the probability of recurrence after surgery for non-lactating mastitis (NLM), using the Random Forest (RF) and XGBoost algorithms. Such models can help identify patients at risk of NLM recurrence and guide clinical treatment planning.
This study was conducted on inpatients admitted to the Mammary Department of Shuguang Hospital, affiliated to Shanghai University of Traditional Chinese Medicine, between July 2019 and December 2021. Follow-up of these inpatients was completed through December 2022. Ten features were selected to build the ML models: age, body mass index (BMI), number of abortions, presence of inverted nipples, extent of breast mass, white blood cell count (WBC), neutrophil-to-lymphocyte ratio (NLR), albumin-globulin ratio (AGR), triglyceride (TG), and presence of intraoperative discharge. We used two ML approaches (RF and XGBoost) to build models that predict the risk of NLM recurrence in female patients. A total of 258 patients were randomly divided into a training set and a test set in a 75%-25% proportion. Model performance was evaluated by Accuracy, Precision, Recall, F1-score and AUC. The Shapley Additive Explanations (SHAP) method was used to interpret the models.
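The five evaluation metrics named above can be illustrated with a minimal pure-Python sketch. The labels, predicted scores, and the 0.5 decision threshold below are hypothetical values chosen for illustration only; they are not data from this study, and the authors' actual implementation is not described here. AUC is computed as the probability that a randomly chosen recurrent case receives a higher score than a randomly chosen non-recurrent case, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 for binary labels (1 = recurrence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and model scores (NOT study data).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.5]
threshold = 0.5  # assumed cut-off for turning a score into a predicted class
y_pred = [1 if s >= threshold else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Because recurrence is the minority class, Accuracy alone can look favorable for a model that rarely predicts recurrence, which is one reason to read it alongside Precision, Recall, F1-score and AUC.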
During the follow-up period, 48 (18.6%) of the NLM patients experienced recurrence. BMI was the most important factor in the RF model, whereas presence of intraoperative discharge was the most important factor in the XGBoost model. The results of tenfold cross-validation suggest that both the RF and XGBoost models have good predictive performance, with the XGBoost model outperforming the RF model in our study. The trends of the SHAP values of all features in our models are consistent with the trends in the clinical presentation of these features. Including all ten features is necessary to build practical prediction models for recurrence.
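To make the tenfold cross-validation concrete, the following sketch assigns 258 cases (48 with recurrence) to ten stratified folds, so that each fold keeps roughly the overall recurrence rate. This is an illustration under stated assumptions: the abstract does not say whether stratification was used, whether cross-validation was run on the full cohort or only on the training set, or which library was used. The function name `stratified_folds` and the random seed are hypothetical.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split case indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # randomize order within the class
        for position, case_index in enumerate(idx):
            folds[position % k].append(case_index)
    return folds


# Cohort counts reported in the abstract: 48 recurrences among 258 patients.
labels = [1] * 48 + [0] * 210
recurrence_rate = round(100 * sum(labels) / len(labels), 1)  # percent

folds = stratified_folds(labels, k=10, seed=0)
for fold_number, fold in enumerate(folds, start=1):
    positives = sum(labels[i] for i in fold)
    print(f"fold {fold_number}: {len(fold)} cases, {positives} recurrences")
```

In each round, one fold would serve as the validation set while the other nine are used for fitting, and the metrics are averaged over the ten rounds. With only 48 recurrences, stratifying keeps every fold from ending up with too few positive cases to evaluate.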
The results of tenfold cross-validation and the SHAP values suggest that the models have predictive ability. The SHAP value trends provide auxiliary validation of the models and strengthen their clinical relevance.