Zhang Xiangzhou, Xue Yunfei, Su Xinyu, Chen Shaoyong, Liu Kang, Chen Weiqi, Liu Mei, Hu Yong
Big Data Decision Institute, Jinan University, Guangzhou, China.
College of Information Science and Technology, Jinan University, Guangzhou, China.
JMIR Med Inform. 2022 Nov 9;10(11):e38053. doi: 10.2196/38053.
Clinical prediction models suffer from performance drift as the patient population shifts over time. There is a great need for model updating approaches or modeling frameworks that can effectively use both old and new data.
Based on the transfer learning paradigm, we aimed to develop a novel modeling framework that transfers old knowledge to the new environment for prediction tasks and helps correct performance drift.
The proposed predictive modeling framework maintains a logistic regression-based stacking ensemble of 2 gradient boosting machine (GBM) models representing old and new knowledge learned from old and new data, respectively (referred to as transfer learning gradient boosting machine [TransferGBM]). The ensemble learning procedure can dynamically balance the old and new knowledge. Using 2010-2017 electronic health record data on a retrospective cohort of 141,696 patients, we validated TransferGBM for hospital-acquired acute kidney injury prediction.
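The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's GradientBoostingClassifier as the GBM, synthetic data with an artificial coefficient shift in place of electronic health records, and out-of-fold predictions for the new-data model when fitting the logistic regression meta-learner. The helper names (make_data, fit_transfer_stack, predict_transfer_stack) and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def make_data(n, shift, rng):
    """Synthetic binary outcome whose true coefficients drift with `shift` (assumption)."""
    X = rng.normal(size=(n, 5))
    w = np.array([1.5, -1.0, 0.5, 0.0, 0.0]) + shift * np.array([-1.0, 0.0, 0.0, 1.5, 0.0])
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X, rng.binomial(1, p)


def fit_transfer_stack(X_old, y_old, X_new, y_new, seed=0):
    """Fit one GBM per period, then a logistic regression over their predicted risks."""
    old_gbm = GradientBoostingClassifier(n_estimators=50, random_state=seed).fit(X_old, y_old)
    new_gbm = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    # Out-of-fold predictions keep the meta-learner from over-trusting a new-data
    # model that has already seen the labels it is being scored on (design assumption).
    new_oof = cross_val_predict(new_gbm, X_new, y_new, cv=5, method="predict_proba")[:, 1]
    old_on_new = old_gbm.predict_proba(X_new)[:, 1]
    # The meta-learner's two coefficients weight old versus new knowledge.
    meta = LogisticRegression().fit(np.column_stack([old_on_new, new_oof]), y_new)
    new_gbm.fit(X_new, y_new)  # final new-data model trained on all new samples
    return old_gbm, new_gbm, meta


def predict_transfer_stack(models, X):
    """Predicted probability of the positive class from the stacked ensemble."""
    old_gbm, new_gbm, meta = models
    Z = np.column_stack([old_gbm.predict_proba(X)[:, 1], new_gbm.predict_proba(X)[:, 1]])
    return meta.predict_proba(Z)[:, 1]


rng = np.random.default_rng(0)
X_old, y_old = make_data(1000, shift=0.0, rng=rng)   # plentiful old-period data
X_new, y_new = make_data(300, shift=1.0, rng=rng)    # scarce new-period data
X_test, y_test = make_data(200, shift=1.0, rng=rng)  # later data from the shifted population

models = fit_transfer_stack(X_old, y_old, X_new, y_new)
old_gbm, new_gbm, meta = models
probs = predict_transfer_stack(models, X_test)
print("meta-learner weights (old, new):", meta.coef_.ravel())
```

In this sketch the learned meta-learner weights show how much the ensemble relies on each period's model; refitting only the meta-learner and the new-data GBM as new samples arrive is one way the balance could be adjusted over time.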
The baseline models (ie, transported models) that were trained on 2010 and 2011 data showed significant performance drift in the temporal validation with 2012-2017 data. Refitting these models using updated samples resulted in performance gains in nearly all cases. The proposed TransferGBM model achieved uniformly better performance than the refitted models.
Under the scenario of population shift, incorporating new knowledge while preserving old knowledge is essential for maintaining stable performance. Transfer learning combined with stacking ensemble learning can help achieve a balance of old and new knowledge in a flexible and adaptive way, even in the case of insufficient new data.