Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.
Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China.
Front Public Health. 2022 Apr 4;10:846118. doi: 10.3389/fpubh.2022.846118. eCollection 2022.
Non-alcoholic fatty liver disease (NAFLD) is a common and serious health problem worldwide for which effective medical treatment is lacking. We aimed to develop and validate machine learning (ML) models that could be used to accurately screen large populations. This study included 304,145 adults who participated in the national physical examination, and used their questionnaire responses and physical measurements as candidate covariates for the models. The least absolute shrinkage and selection operator (LASSO) was used to select features from the candidate covariates. Four ML algorithms were then used to build NAFLD screening models, and the best-performing classifier was used to output an importance score for each covariate. Among the four ML algorithms, XGBoost performed best (accuracy = 0.880, precision = 0.801, recall = 0.894, F1 = 0.882, and AUC = 0.951). The covariates, in order of importance, were BMI, age, waist circumference, gender, type 2 diabetes, gallbladder disease, smoking, hypertension, dietary status, physical activity, preference for oily food, and preference for salty food. ML classifiers could help medical institutions identify and classify NAFLD early, which is particularly useful in economically underdeveloped areas, and the importance ranking of the covariates may inform the prevention and treatment of NAFLD.
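The performance figures quoted in the abstract (accuracy, precision, recall, F1, and AUC) follow standard definitions for a binary screening classifier. The sketch below shows one way these quantities can be computed from true labels and predicted scores. It is illustrative only and is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced here, and the abstract does not state which threshold or averaging scheme the study used.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # NAFLD cases flagged as NAFLD
    fp: int  # non-cases flagged as NAFLD
    fn: int  # NAFLD cases missed
    tn: int  # non-cases correctly cleared


def confusion_counts(y_true, y_pred):
    """Tally the 2x2 confusion matrix for labels coded 1 (NAFLD) and 0 (no NAFLD)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def screening_metrics(c):
    """Accuracy, precision, recall and F1 from confusion counts (0.0 when undefined)."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 4 cases and 6 non-cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = screening_metrics(confusion_counts(y_true, y_pred))
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC depends only on how the scores rank cases against non-cases. For a screening tool, recall is usually the metric to watch, since a missed case is not referred for follow-up.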