Lim Daniel Yan Zheng, Chung Goh Eun, Cher Pei Hua, Chockalingam Ramasamy, Kim Won, Tan Chee Kiat
Health Service Research Unit, Medical Board, Singapore General Hospital, Singapore, Singapore.
Department of Gastroenterology and Hepatology, Singapore General Hospital, Singapore, Singapore.
Gastro Hep Adv. 2024 Jun 21;3(7):1005-1011. doi: 10.1016/j.gastha.2024.06.007. eCollection 2024.
Nonalcoholic fatty liver disease (NAFLD) is one of the most common liver diseases. There are no universally accepted models that accurately predict time to onset of NAFLD. Machine learning (ML) models may allow prediction of such time-to-event (ie, survival) outcomes. This study aims to develop and independently validate ML-derived models to allow personalized prediction of time to onset of NAFLD in individuals who have no NAFLD at baseline.
The development dataset comprised 25,599 individuals from a South Korean NAFLD registry. A random 70:30 split divided it into training and internal validation sets. ML survival models (random survival forest, extra survival trees) were fitted, with time to NAFLD diagnosis in months as the target variable and routine anthropometric and laboratory parameters as predictors. The independent validation dataset comprised 16,173 individuals from a Chinese open dataset. Models were evaluated using the concordance index (c-index) and Brier score on both the internal and independent validation sets.
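The two evaluation metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code, and the function names are hypothetical. Harrell's c-index counts, among comparable pairs (subject i has an observed event before subject j's follow-up time), how often the model assigns i the higher risk. This simple variant ignores pairs with tied follow-up times. The Brier score is shown at a single horizon t with inverse-probability-of-censoring weights (IPCW) from a Kaplan-Meier estimate of the censoring distribution. The abstract does not say whether a single-time or integrated Brier score was reported, so the single-time form is an assumption.

```python
from typing import Callable, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (simple variant).

    A pair (i, j) is comparable when i had an observed event and
    times[i] < times[j]; it is concordant when risk_scores[i] > risk_scores[j].
    Tied risk scores count as half-concordant; tied times are ignored.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def km_censoring_survival(times: Sequence[float],
                          events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'."""
    g, steps = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        censored = sum(1 for x, e in zip(times, events) if x == t and not e)
        g *= 1.0 - censored / at_risk
        steps.append((t, g))

    def G(t: float, left_limit: bool = False) -> float:
        # G(t) uses steps at s <= t; the left limit G(t-) uses steps at s < t.
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left_limit):
                value = v
            else:
                break
        return value

    return G


def ipcw_brier_score(t: float, times: Sequence[float], events: Sequence[int],
                     surv_at_t: Sequence[float]) -> float:
    """IPCW Brier score at horizon t, given each subject's predicted S_i(t)."""
    G = km_censoring_survival(times, events)
    total = 0.0
    for ti, ei, si in zip(times, events, surv_at_t):
        if ti <= t and ei:
            total += si ** 2 / G(ti, left_limit=True)  # event by t: ideal S_i(t) = 0
        elif ti > t:
            total += (1.0 - si) ** 2 / G(t)            # event-free at t: ideal S_i(t) = 1
        # subjects censored before t contribute 0; the weights account for them
    return total / len(times)
```

A higher c-index means better ranking of who develops NAFLD first (0.5 is chance). A lower Brier score means better-calibrated predicted probabilities at the chosen horizon.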
The datasets (development vs independent validation) had 1,331,107 vs 543,874 person-months of follow-up, NAFLD incidence of 25.7% (6584 individuals) vs 14.4% (2322 individuals), and median time to NAFLD onset of 60 (interquartile range 38-75) vs 24 (interquartile range 13-37) months, respectively. The ML models achieved good discrimination (c-index >0.7) in the validation cohort: random survival forest 0.751 (95% confidence interval 0.742-0.759) and extra survival trees 0.752 (95% confidence interval 0.744-0.762).
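The reported incidence percentages can be reproduced from the counts given above, and the same counts also give a crude event rate per unit of follow-up. The following minimal sketch uses a hypothetical helper, `incidence_summary`. It assumes that 12 person-months equal one person-year and that the crude rate is simply events divided by total person-time. This crude rate is an illustration only and is not a figure reported in the abstract.

```python
def incidence_summary(events: int, n_individuals: int, person_months: float):
    """Return (incidence proportion, crude rate per 1000 person-years)."""
    proportion = events / n_individuals
    person_years = person_months / 12.0  # assumption: 12 person-months = 1 person-year
    rate_per_1000_py = events / person_years * 1000.0
    return proportion, rate_per_1000_py


# Counts as reported for each cohort: (NAFLD events, individuals, person-months)
cohorts = {
    "development": (6584, 25599, 1_331_107),
    "independent validation": (2322, 16173, 543_874),
}
for name, (events, n, pm) in cohorts.items():
    prop, rate = incidence_summary(events, n, pm)
    print(f"{name}: incidence {prop:.1%}, crude rate {rate:.1f} per 1000 person-years")
```

The proportions round to the 25.7% and 14.4% stated above. The crude rate also reflects how long each cohort was followed, which differs between the two datasets.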
ML models can predict time to onset of NAFLD from routine patient data. Clinicians can use them to deliver personalized predictions to patients, which may facilitate patient counseling and clinical decisions on the timing of interval imaging.
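One way a survival model's output could be turned into a patient-facing prediction is to read an individual's predicted survival curve S(t) = P(no NAFLD by month t). From that curve you can take a predicted median time to onset (the first time S(t) falls to 0.5 or below) and the probability of onset by a given month. The following is a minimal sketch under stated assumptions, not the authors' method. The function names are hypothetical, the curve is treated as a right-continuous step function equal to 1 before the first grid time, and the example curve values are invented for illustration.

```python
from typing import Optional, Sequence


def predicted_median_time(time_grid: Sequence[float],
                          survival_probs: Sequence[float]) -> Optional[float]:
    """First grid time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in zip(time_grid, survival_probs):
        if s <= 0.5:
            return t
    return None


def probability_of_onset_by(month: float, time_grid: Sequence[float],
                            survival_probs: Sequence[float]) -> float:
    """P(T <= month) = 1 - S(month), with S a right-continuous step function."""
    s = 1.0  # assumption: no onset before the first grid time
    for t, p in zip(time_grid, survival_probs):
        if t <= month:
            s = p
        else:
            break
    return 1.0 - s


# Hypothetical predicted curve for one individual (months, S(t))
grid = [12, 24, 36, 48, 60]
curve = [0.95, 0.85, 0.70, 0.55, 0.45]
print(predicted_median_time(grid, curve), probability_of_onset_by(36, grid, curve))
```

A clinician could, for example, compare the probability of onset by candidate follow-up months when deciding on interval imaging. How such thresholds are chosen is outside the scope of this sketch.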