Wu Yonghui, Yang Xi, Morris Heather L, Gurka Matthew J, Shenkman Elizabeth A, Cusi Kenneth, Bril Fernando, Donahoo William T
Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States.
Target RWE Health Evidence Solutions, Durham, NC, United States.
JMIR Med Inform. 2022 Jun 6;10(6):e36997. doi: 10.2196/36997.
Nonalcoholic steatohepatitis (NASH), advanced fibrosis, and the cirrhosis and hepatocellular carcinoma that can follow are becoming the most common etiologies of liver failure and liver transplantation. However, at these potentially reversible stages, NASH and advanced fibrosis can be diagnosed only by liver biopsy, which is expensive and carries a risk of various complications. Distinguishing the more benign isolated steatosis from the more severe NASH and cirrhosis tells the physician whether more aggressive management is needed.
We aimed to explore the feasibility of using machine learning methods for the noninvasive diagnosis of NASH and advanced liver fibrosis and to compare these methods with existing quantitative risk scores.
We conducted a retrospective analysis of clinical data from a cohort of 492 patients with biopsy-proven nonalcoholic fatty liver disease (NAFLD), NASH, or advanced fibrosis. We systematically compared 5 widely used machine learning algorithms for the prediction of NAFLD, NASH, and fibrosis using 2 variable encoding strategies. Then, we compared the machine learning methods with 3 existing quantitative scores and identified the important features for prediction using the SHapley Additive exPlanations method.
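The SHapley Additive exPlanations (SHAP) method attributes a model's prediction for one patient to the individual input variables using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is illustrative only and is not the study's code: the function name `shapley_values`, the single-baseline value function (features outside a coalition are replaced by a reference value such as a cohort mean), and the toy model are all assumptions. The `shap` Python package provides much faster explainers, including ones specialized for tree ensembles such as gradient boosting; this exhaustive version is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Hypothetical helper for illustration. The value of a coalition S is the
    model output when features in S take the patient's values and all other
    features take the baseline values (an assumed value function).
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model with one additive term and one interaction (not a clinical model).
phi = shapley_values(lambda z: 2 * z[0] + z[1] * z[2], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # the values sum to model(x) - model(baseline)
```

The attributions always add up to the difference between the patient's prediction and the baseline prediction (the efficiency property), which is what makes per-variable importance rankings such as those reported in the results interpretable.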
Gradient boosting (GB) was the best-performing machine learning method, achieving area under the curve (AUC) scores of 0.9043, 0.8166, and 0.8360 for NAFLD, NASH, and advanced fibrosis, respectively. GB also outperformed the 3 existing risk scores for fibrosis. Alanine aminotransferase (ALT), triglycerides (TG), and BMI were the most important predictors of NAFLD; aspartate aminotransferase (AST), ALT, and TG were the most important predictors of NASH; and AST, hyperglycemia (A1c), and high-density lipoprotein were the most important predictors of advanced fibrosis.
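The AUC scores reported above can be read as a probability: the chance that a randomly chosen patient with the condition receives a higher predicted score than a randomly chosen patient without it, with ties counting one half. The sketch below computes the AUC from this rank (Mann-Whitney U) formulation using only the standard library. The function name `roc_auc` is hypothetical, labels are assumed to be coded 0/1, and the example scores are made up for illustration; this is not the study's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Assumes labels are 0 (negative) or 1 (positive). Tied scores receive
    their average rank, which counts each tied positive/negative pair as 1/2.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Sort indices by score so runs of equal scores can share an average rank.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of scores tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts positive/negative pairs ordered correctly (ties count 1/2).
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the example, 3 of the 4 positive/negative pairs are ranked correctly, giving 0.75; an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.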
Machine learning methods can feasibly predict NAFLD, NASH, and advanced fibrosis from routine clinical data and could potentially help identify the patients who still need a liver biopsy. Additionally, understanding the relative importance of, and differences among, the predictors could improve understanding of the disease process and support the identification of novel treatment options.