Charu Vivek, Liang Jane W, Mannalithara Ajitha, Kwong Allison, Tian Lu, Kim W Ray
medRxiv. 2023 Aug 4:2023.08.02.23293569. doi: 10.1101/2023.08.02.23293569.
Ensemble machine learning (ML) methods can combine many individual models into a single 'super' model using an optimal weighted combination. Here we demonstrate how an underutilized ensemble model, the superlearner, can be used as a benchmark for model performance in clinical risk prediction. We illustrate this by implementing a superlearner to predict liver fibrosis in patients with non-alcoholic fatty liver disease (NAFLD).
We trained a superlearner on 23 demographic and clinical variables to predict stage 2 or higher liver fibrosis. The superlearner was trained on data from the Nonalcoholic Steatohepatitis Clinical Research Network observational study (NASH-CRN, n=648) and validated on data from participants in a randomized trial for NASH (the FLINT trial, n=270) and from examinees with NAFLD who participated in the National Health and Nutrition Examination Survey (NHANES, n=1244). We compared the performance of the superlearner with that of existing models, including FIB-4, NFS, Forns, APRI, BARD and SAFE.
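The weighting step of a superlearner can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the out-of-fold (cross-validated) predictions of each base model are already available, and it searches a coarse grid of convex weights for the combination with the lowest squared error. The grid search is a stand-in chosen for readability; implementations commonly use constrained optimization such as non-negative least squares. The function names and the toy data are hypothetical.

```python
from itertools import product


def simplex_grid(k, steps):
    """Yield weight vectors of length k, in multiples of 1/steps, that sum to 1."""
    for combo in product(range(steps + 1), repeat=k - 1):
        used = sum(combo)
        if used <= steps:
            yield [c / steps for c in combo] + [(steps - used) / steps]


def mse(weights, oof_preds, y):
    """Mean squared error of the weighted blend of base-model predictions."""
    total = 0.0
    for row, target in zip(oof_preds, y):
        blended = sum(w * p for w, p in zip(weights, row))
        total += (blended - target) ** 2
    return total / len(y)


def fit_superlearner_weights(oof_preds, y, steps=20):
    """Pick the convex weights (on a grid) that minimize cross-validated MSE.

    oof_preds: one row per patient, one column per base model; each entry is
    that model's out-of-fold predicted probability (assumed precomputed).
    """
    k = len(oof_preds[0])
    return min(simplex_grid(k, steps), key=lambda w: mse(w, oof_preds, y))


# Hypothetical toy data: 3 base models, 6 patients, binary fibrosis outcome.
toy_preds = [
    [0.9, 0.6, 0.5],
    [0.2, 0.4, 0.5],
    [0.7, 0.8, 0.5],
    [0.4, 0.1, 0.5],
    [0.6, 0.9, 0.5],
    [0.1, 0.3, 0.5],
]
toy_y = [1, 0, 1, 0, 1, 0]

toy_weights = fit_superlearner_weights(toy_preds, toy_y)
print("weights:", toy_weights)
print("ensemble MSE:", mse(toy_weights, toy_preds, toy_y))
```

Because the grid includes the weight vectors that put all weight on a single model, the selected blend is never worse, on the cross-validated loss, than the best individual base model.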
In the FLINT and NHANES validation sets, the superlearner (derived from 12 base models) discriminated well between patients with and without significant fibrosis, with AUCs of 0.79 (95% CI: 0.73-0.84) and 0.74 (95% CI: 0.68-0.79), respectively. Among the existing scores considered, the SAFE score performed similarly to the superlearner, and both the superlearner and the SAFE score outperformed the FIB-4, APRI, Forns, and BARD scores in the validation datasets. A superlearner derived from 12 base models performed as well as one derived from 90 base models.
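For readers unfamiliar with how an AUC and its interval can be obtained, the sketch below computes the AUC as the probability that a randomly chosen patient with fibrosis receives a higher score than one without (ties count one half), and attaches a percentile bootstrap interval. The abstract does not state how the reported confidence intervals were computed, so the bootstrap here is an assumption, and the function names and toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative cases."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(boot_labels) < n:
            stats.append(auc([scores[i] for i in idx], boot_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy scores and outcomes, not data from the study.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.3]
toy_labels = [0, 0, 1, 1, 1, 0, 1, 0]

point = auc(toy_scores, toy_labels)
ci_low, ci_high = bootstrap_auc_ci(toy_scores, toy_labels)
print(f"AUC = {point:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

The same two functions can be applied to the superlearner's predictions and to each conventional score on a validation set, which is the kind of side-by-side comparison the benchmark relies on.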
The superlearner, regarded as the "best-in-class" ML prediction, performed better than most existing models commonly used in practice for detecting fibrotic NASH. The superlearner can be used to benchmark the performance of conventional clinical risk prediction models.