Ghandian Sina, Thapa Rahul, Garikipati Anurag, Barnes Gina, Green-Saxena Abigail, Calvert Jacob, Mao Qingqing, Das Ritankar
Department of Data Science, Houston, Texas, USA.
Department of Research and Writing, Houston, Texas, USA.
JGH Open. 2022 Mar 8;6(3):196-204. doi: 10.1002/jgh3.12716. eCollection 2022 Mar.
Non-alcoholic fatty liver (NAFL) can progress to the severe subtype non-alcoholic steatohepatitis (NASH) and/or fibrosis, which are associated with increased morbidity, mortality, and healthcare costs. Existing machine learning studies detect NASH once it is present; this study is unique in predicting the progression of NAFL patients to NASH or fibrosis.
To use clinical information from patients diagnosed with NAFL to predict the likelihood of progression to NASH or fibrosis.
Data were collected from the electronic health records of patients receiving a first-time NAFL diagnosis. A gradient boosted machine learning algorithm (XGBoost), a logistic regression (LR) model, and a multi-layer perceptron (MLP) model were developed. A five-fold cross-validated grid search was used to optimize hyperparameters, including maximum tree depth, learning rate, and number of estimators. Using demographic features, vital signs, and laboratory measurements, the models predicted which patients were likely to progress to NASH or fibrosis within 4 years of their initial NAFL diagnosis.
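The abstract names the tuned hyperparameters but not the grid values, how folds were assigned, or which metric guided the search. The following is a minimal plain-Python sketch of the general shape of a five-fold cross-validated grid search, not the authors' pipeline. The grid values, the contiguous (unshuffled) folds, and the `score_fn` callback are all hypothetical. In a real pipeline, `score_fn` would fit a model such as XGBoost on the training indices and return a validation metric such as AUROC. In practice, one would typically use scikit-learn's `GridSearchCV`, and `xgboost.XGBClassifier` does expose `max_depth`, `learning_rate`, and `n_estimators` parameters.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical grid: the abstract names these hyperparameters but not their values.
PARAM_GRID: Dict[str, List] = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}

# score_fn(params, train_indices, validation_indices) -> validation score (higher is better).
ScoreFn = Callable[[Dict, Sequence[int], Sequence[int]], float]


def kfold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, validation) pairs.

    Contiguous, unshuffled folds keep the sketch deterministic; a real study would
    usually shuffle and stratify by outcome.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def grid_search_cv(
    n_samples: int, score_fn: ScoreFn, param_grid: Dict[str, List] = PARAM_GRID, k: int = 5
) -> Tuple[Optional[Dict], float]:
    """Return the parameter combination with the highest mean k-fold validation score."""
    names = list(param_grid)
    splits = kfold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    # Try every combination in the Cartesian product of the grid.
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        mean_score = statistics.mean(score_fn(params, tr, va) for tr, va in splits)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

For example, a toy `score_fn` that ignores the data and favors `max_depth=5`, `learning_rate=0.1`, and fewer estimators selects exactly that combination. With a real model, each call would train on four folds and score on the held-out fifth.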
The XGBoost algorithm achieved area under the receiver operating characteristic curve (AUROC) values of 0.79 for predicting progression to NASH and 0.87 for predicting progression to fibrosis on both the hold-out and external validation test sets. The XGBoost algorithm outperformed the LR and MLP models on all metrics for both NASH and fibrosis prediction.
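AUROC can be read as the probability that a randomly chosen patient who progressed receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. Under this reading, an AUROC of 0.87 for fibrosis means roughly an 87% chance of ranking such a pair correctly. The function below is a minimal sketch of that pairwise (Mann-Whitney) definition. It runs in O(P·N) time for clarity; in practice, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently. The labels and scores in the usage line are made-up toy values, not study data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties = 0.5."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]  # e.g. progressed to fibrosis
    neg = [s for y, s in zip(labels, scores) if y == 0]  # did not progress
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    # Compare every positive/negative pair of predicted risks.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

In the toy example, three of the four positive/negative pairs are ranked correctly, which gives 0.75.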
It is possible to accurately identify newly diagnosed NAFL patients at high risk of progression to NASH or fibrosis. Early identification of these patients may allow for increased clinical monitoring, more aggressive preventative measures to slow the progression of NAFL and fibrosis, and efficient clinical trial enrollment.