Chen Jie, Zhang Bo, Cheng Yong, Jia Yuanchen, Zhou Biao
Department of Ultrasound, China-Japan Friendship Hospital, Beijing 100029, China.
School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China.
Diagnostics (Basel). 2025 Apr 25;15(9):1096. doi: 10.3390/diagnostics15091096.
Objectives: We aimed to develop and validate machine learning (ML) models that integrate clinical and laboratory data for the non-invasive prediction of metabolic dysfunction-associated steatohepatitis (MASH) in an obese population.
Methods: In this retrospective study, clinical and laboratory data were collected from obese patients undergoing bariatric surgery. The cohort was divided into training and validation sets by stratified random sampling, and optimal features were selected with SHapley Additive exPlanations (SHAP). Various ML models, including K-nearest neighbors, linear support vector machine, radial basis function support vector machine, Gaussian process, random forest, multilayer perceptron, adaptive boosting, and naïve Bayes, were developed through cross-validation and hyperparameter tuning. Diagnostic performance was assessed via the area under the curve (AUC) in both the training and validation sets.
Results: A total of 558 patients were analyzed, with 390 in the training set and 168 in the validation set. In the training cohort, the median age was 35 years, the median body mass index (BMI) was 39.8 kg/m², 39.0% were male, 37.9% had diabetes mellitus, and 62.8% were diagnosed with MASH. In the validation cohort, the median age was 34.1 years, the median BMI was 42.5 kg/m², 41.7% were male, 32.7% had diabetes, and 39.9% had MASH. The random forest achieved the highest performance among the models, with AUC values of 0.94 in the training set and 0.88 in the validation set. The Gaussian process model attained an AUC of 0.97 in the training cohort but only 0.79 in the validation cohort, while the other models achieved AUC values ranging from 0.63 to 0.88 in the training cohort and from 0.62 to 0.75 in the validation cohort.
Conclusions: ML models, particularly the random forest, effectively predict MASH using readily available data, offering a promising non-invasive alternative to conventional serological scoring. Prospective studies and external validation are needed to further establish clinical utility.
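The abstract does not state which software the authors used, so the following is only a minimal sketch, in standard-library Python, of two steps named in the methods: a stratified random split and an AUC calculation. The function names `stratified_split` and `roc_auc`, the 30% validation fraction, and the toy labels and scores are all assumptions made for illustration; none of them come from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices so each class keeps roughly the same share in both sets.

    Hypothetical helper: the 0.3 default is an illustrative choice,
    not a value reported in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)  # random sampling within each class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only (synthetic, not from the study): 60 positives, 40 negatives.
toy_labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.3, seed=42)

# Four hand-made cases: three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise definition of AUC used here is quadratic in the number of cases, which is fine for a cohort of a few hundred patients. A gap between training and validation AUC, such as the one reported for the Gaussian process model, would be measured by calling `roc_auc` separately on each set's labels and model scores.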