Deng Laifu, Wang Shuting, Wan Daiwei, Zhang Qi, Shen Wei, Liu Xiao, Zhang Yu
Department of General Surgery, Wuxi Medical Center of Nanjing Medical University, Wuxi, People's Republic of China.
Department of Oncology, Tengzhou Central People's Hospital, Jining Medical College, Shandong, People's Republic of China.
Int J Gen Med. 2025 Jan 31;18:509-527. doi: 10.2147/IJGM.S507013. eCollection 2025.
Gallstones (GS), a prevalent disorder of the biliary tract, markedly impair patients' quality of life. This study aims to construct predictive models employing diverse machine learning algorithms to elucidate risk factors linked to gallstone formation.
This study integrated data from the National Health and Nutrition Examination Survey (NHANES) with a cohort of 7868 participants from Wuxi People's Hospital and Wuxi Second People's Hospital, including 830 individuals diagnosed with gallstones. To develop our predictive model, we employed four algorithms: Logistic Regression, Gaussian Naive Bayes (GNB), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM). The models were validated internally through k-fold cross-validation and externally using independent datasets. Furthermore, we examined the link between relative fat mass (RFM) and gallstone formation by fitting four logistic regression models, conducting subgroup analyses, and applying restricted cubic spline (RCS) curves.
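The internal validation step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the NHANES and Wuxi cohorts, and the fold count, hyperparameters, feature count, and the choice of ROC AUC as the metric are all assumptions, since the abstract does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the study data): about 10% positive cases,
# loosely mirroring 830 gallstone cases among 7868 participants.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# The four model families named in the abstract; all settings are assumed.
models = {
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "GaussianNB": GaussianNB(),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
}

# Stratified 5-fold CV keeps the case/control ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean_auc[name] = float(scores.mean())

for name, auc in sorted(mean_auc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Scaling is placed inside each pipeline so that it is refit on the training folds only, which avoids leaking information from the held-out fold. External validation would repeat the scoring step on a dataset the models never saw during fitting.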
Logistic regression predicted gallstone occurrence from the full set of risk factors better than the other three machine learning models. SHAP analysis identified RFM, weight-adjusted waist index (WWI), waist circumference (WC), waist-to-height ratio (WHtR), and body mass index (BMI) as prominent predictors of gallstone occurrence, with RFM emerging as the primary determinant. A fully adjusted multivariate logistic regression analysis revealed a robust positive association between RFM and gallstones. Subgroup analysis further indicated that the positive relationship between RFM and gallstone prevalence held across all subgroups examined.
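For orientation, the anthropometric indices named above can be computed as in the sketch below. The abstract does not give formulas, so these are the commonly published definitions and are an assumption here: RFM = 64 - 20 x (height / waist) + 12 x sex (female = 1, male = 0), WWI = waist (cm) / sqrt(weight in kg), WHtR = waist / height, and BMI = weight (kg) / height (m) squared. The function names are hypothetical.

```python
import math


def relative_fat_mass(height_cm: float, waist_cm: float, is_female: bool) -> float:
    """RFM, assumed definition: 64 - 20 * (height / waist) + 12 * sex."""
    sex = 1 if is_female else 0
    return 64.0 - 20.0 * (height_cm / waist_cm) + 12.0 * sex


def waist_weight_index(waist_cm: float, weight_kg: float) -> float:
    """WWI, assumed definition: waist (cm) / sqrt(weight (kg))."""
    return waist_cm / math.sqrt(weight_kg)


def waist_to_height_ratio(waist_cm: float, height_cm: float) -> float:
    """WHtR: waist circumference divided by height, same units."""
    return waist_cm / height_cm


def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Illustrative values only, not taken from the study.
print(relative_fat_mass(170.0, 85.0, is_female=False))
print(relative_fat_mass(170.0, 85.0, is_female=True))
print(waist_weight_index(90.0, 81.0))
print(waist_to_height_ratio(85.0, 170.0))
print(body_mass_index(72.25, 170.0))
```

All five measures are derived from the same few inputs (height, weight, waist circumference, sex), so they are strongly correlated with one another, which is worth keeping in mind when reading their relative SHAP rankings.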
Among the four algorithmic models, logistic regression proved most effective in predicting gallstone occurrence. The model developed in this study offers clinicians a valuable tool for identifying critical prognostic factors, facilitating personalized patient monitoring and tailored management.