Department of Bioengineering, UCLA, United States of America; UCLA Medical & Imaging Informatics (MII), United States of America.
Department of Computer Science, University of Southern California, United States of America.
J Biomed Inform. 2022 Oct;134:104168. doi: 10.1016/j.jbi.2022.104168. Epub 2022 Aug 17.
Early detection of heart failure (HF) can provide patients with the opportunity for more timely intervention and better disease management, as well as efficient use of healthcare resources. Recent machine learning (ML) methods have shown promising performance on diagnostic prediction using temporal sequences from electronic health records (EHRs). In practice, however, these models may not generalize to other populations due to dataset shift. Shifts in datasets can be attributed to a range of factors such as variations in demographics, data management methods, and healthcare delivery patterns. In this paper, we use unsupervised adversarial domain adaptation methods to adaptively reduce the impact of dataset shift on cross-institutional transfer performance. The proposed framework is validated on a next-visit HF onset prediction task using a BERT-style Transformer-based language model pre-trained with a masked language modeling (MLM) task. Our model empirically demonstrates superior prediction performance relative to non-adversarial baselines in both transfer directions on two different clinical event sequence data sources.
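The abstract does not say which adversarial domain adaptation technique is used, so the following is only an illustrative sketch of one common approach, gradient reversal (as in domain-adversarial neural networks), and not the authors' implementation. It assumes a toy model with a one-parameter encoder `w`, a task head `a` that predicts HF onset, and a domain head `b` that predicts the source institution. All names and numbers are hypothetical. The weighting schedule `grl_lambda` follows the form proposed by Ganin and Lempitsky, with `gamma = 10` as the assumed default.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def grl_lambda(progress, gamma=10.0):
    """Adversarial weight that ramps from 0 toward 1 as training
    progress goes from 0 to 1 (schedule form from Ganin & Lempitsky)."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def encoder_gradient(x, y, d, w, a, b, lam):
    """Gradient for the shared encoder weight w under gradient reversal.

    x: scalar input feature (hypothetical toy input)
    y: task label (1 = HF onset at next visit, 0 = no onset)
    d: domain label (0 = source institution, 1 = target institution)
    a, b: weights of the task head and the domain head
    lam: adversarial weight applied to the reversed domain gradient
    """
    f = w * x                      # shared representation
    p_task = sigmoid(a * f)        # predicted probability of HF onset
    p_domain = sigmoid(b * f)      # predicted probability of target domain

    # For binary cross-entropy, dLoss/dlogit = prediction - target.
    grad_task = (p_task - y) * a * x
    grad_domain = (p_domain - d) * b * x

    # The encoder descends the task loss but ascends the domain loss,
    # which pushes the representation toward being domain-invariant.
    return grad_task - lam * grad_domain


if __name__ == "__main__":
    for progress in (0.0, 0.5, 1.0):
        lam = grl_lambda(progress)
        g = encoder_gradient(x=1.0, y=1, d=0, w=0.0, a=1.0, b=1.0, lam=lam)
        print(f"progress={progress:.1f} lambda={lam:.4f} encoder_grad={g:.4f}")
```

In this sketch the domain head itself would still be trained with the ordinary (non-reversed) gradient of the domain loss; only the encoder sees the sign flip. The setting is unsupervised in the sense that the task term would be computed on labeled source examples only, while the domain term uses examples from both institutions.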