Health Management Center, Guilin People's Hospital, Guilin, China.
Philippine Christian University, Manila, Philippines.
Sci Rep. 2023 Mar 3;13(1):3638. doi: 10.1038/s41598-023-30750-5.
Nonalcoholic fatty liver disease (NAFLD) is expected to be one of the major causes of end-stage liver disease in the coming decades, but it shows few symptoms until it progresses to cirrhosis. We aimed to develop machine learning classification models to screen for NAFLD among general adults. This study included 14,439 adults who underwent health examinations. We developed models to classify subjects as having or not having NAFLD using decision tree, random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM) algorithms. The SVM classifier performed best, with the highest accuracy (0.801), positive predictive value (PPV) (0.795), F1 score (0.795), Kappa score (0.508), and area under the precision-recall curve (AUPRC) (0.712), and the second-highest area under the receiver operating characteristic curve (AUROC) (0.850). The RF model was the second-best classifier, with the highest AUROC (0.852) and the second-highest accuracy (0.789), PPV (0.782), F1 score (0.782), Kappa score (0.478), and AUPRC (0.708). In conclusion, the SVM classifier was the best of the four for screening NAFLD in the general population based on physical examination and blood test results, followed by the RF classifier. These classifiers have the potential to help physicians and primary care doctors screen for NAFLD in the general population, which could benefit NAFLD patients through early diagnosis.
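For readers unfamiliar with the threshold-based metrics reported above (accuracy, PPV, F1 score, Kappa score), the following minimal sketch shows how each can be computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not data from the study, and the authors' actual computation may differ (for example, in how scores are averaged across classes).

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common screening metrics from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative
    and true-negative counts. Hypothetical helper for illustration only.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    ppv = tp / (tp + fp)        # positive predictive value (precision)
    recall = tp / (tp + fn)     # sensitivity
    f1 = 2 * ppv * recall / (ppv + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "ppv": ppv,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Illustrative counts only (NOT taken from the study).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUROC and AUPRC are not covered by this sketch: they require the classifier's continuous scores across all decision thresholds rather than a single confusion matrix.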