Lee Seungyeon, Yin Changchang, Zhang Ping
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA.
Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA.
Patterns (N Y). 2023 Aug 22;4(9):100828. doi: 10.1016/j.patter.2023.100828. eCollection 2023 Sep 8.
The availability of large-scale electronic health record datasets has led to the development of artificial intelligence (AI) methods for clinical risk prediction that help improve patient care. However, existing studies have shown that AI models suffer from severe performance decay after several years of deployment, which may be caused by various temporal dataset shifts. When such a shift occurs, we have access to large-scale pre-shift data but only small-scale post-shift data, and the post-shift data alone are not enough to train new models for the post-shift environment. In this study, we propose a new method to address this issue. We reweight patients from the pre-shift environment to mitigate the distribution shift between the pre- and post-shift environments. Moreover, we adopt a Kullback-Leibler divergence loss to force the models to learn similar patient representations in the pre- and post-shift environments. Our experimental results show that our model effectively mitigates temporal shifts, improving prediction performance.
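The two ingredients named in the abstract, reweighting pre-shift patients and penalizing a Kullback-Leibler divergence between pre- and post-shift representations, can be illustrated with a minimal sketch. The abstract does not say how the weights are estimated or how the divergence is computed, so everything below is an assumption made for illustration: the weights come from a logistic-regression domain classifier (a standard density-ratio estimate), the KL term compares diagonal Gaussians fitted to each set of representations, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_domain_classifier(X, d, lr=0.5, steps=1000):
    """Logistic regression by gradient descent (assumed estimator, not the paper's).
    d = 0 marks pre-shift patients, d = 1 marks post-shift patients."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ coef))
        coef -= lr * Xb.T @ (p - d) / len(d)
    return coef

def importance_weights(X_pre, X_post):
    """Estimate p_post(x) / p_pre(x) for each pre-shift patient from the
    domain classifier's odds, corrected for the class-size imbalance."""
    X = np.vstack([X_pre, X_post])
    d = np.concatenate([np.zeros(len(X_pre)), np.ones(len(X_post))])
    coef = fit_domain_classifier(X, d)
    Xb = np.hstack([X_pre, np.ones((len(X_pre), 1))])
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ coef)), 1e-6, 1 - 1e-6)
    ratio = (p / (1.0 - p)) * (len(X_pre) / len(X_post))
    return ratio / ratio.mean()  # normalize so the weights average to 1

def gaussian_kl(z_p, z_q, w_p=None, eps=1e-6):
    """KL(N_p || N_q) between diagonal Gaussians fitted to two sets of
    representations; w_p optionally weights the rows of z_p."""
    mu_p = np.average(z_p, axis=0, weights=w_p)
    var_p = np.average((z_p - mu_p) ** 2, axis=0, weights=w_p) + eps
    mu_q = z_q.mean(axis=0)
    var_q = z_q.var(axis=0) + eps
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

# Synthetic stand-in for patient features: many pre-shift, few post-shift.
rng = np.random.default_rng(0)
X_pre = rng.normal(0.0, 1.0, size=(2000, 3))
X_post = rng.normal(0.5, 1.0, size=(200, 3))

weights = importance_weights(X_pre, X_post)
kl_unweighted = gaussian_kl(X_pre, X_post)
kl_weighted = gaussian_kl(X_pre, X_post, w_p=weights)
```

In this toy setting the features stand in for learned representations, and the weighted divergence comes out smaller than the unweighted one because the reweighted pre-shift sample resembles the post-shift sample more closely. In a trained model, a term like `gaussian_kl` would be added to a weighted prediction loss, with the relative strength of the two terms left as a tuning choice.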