Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction.
Author information
Gupta Mehak, Bunnell H Timothy, Phan Thao-Ly T, Beheshti Rahmatollah
Affiliations
University of Delaware, Newark, Delaware, USA.
Nemours Children's Health System, Wilmington, Delaware, USA.
Publication information
ACM BCB. 2021 Aug;2021. doi: 10.1145/3459930.3469512.
Working with electronic health records (EHRs) is known to be challenging for several reasons. Records often do not have: 1) similar lengths (per visit), 2) the same number of observations (per patient), or 3) complete entries. These issues hinder the performance of predictive models built from EHRs. In this paper, we address these issues by presenting a model for the combined task of imputing and predicting values in irregularly observed, varying-length EHR data with missing entries. Our proposed model (dubbed Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. In this architecture, the generator is a bidirectional recurrent network that receives the EHR data and imputes the missing values. The discriminator attempts to distinguish between the actual values and the imputed values produced by the generator. Using the input data in its entirety, Bi-GAN learns to impute missing elements in between the input time steps (imputation) or outside them (prediction). Our method has three advantages over state-of-the-art methods in the field: (a) a single model performs both the imputation and prediction tasks; (b) the model can make predictions from time series of varying length with missing data; and (c) it does not need to know the observation and prediction time windows during training, so it can be used for predictions with different observation and prediction window lengths, for both short- and long-term predictions. We evaluate our model on two large EHR datasets, imputing and predicting body mass index (BMI) values, and show its superior performance in both settings.
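The abstract's central idea is that imputation (gaps inside the observed window) and prediction (steps outside it) are a single task. A per-entry observation mask makes this concrete. The sketch below is a minimal, hypothetical illustration in plain Python, not the authors' Bi-GAN. The following are all assumptions:

- the series layout (one BMI value per grid step, with `None` for missing entries);
- the helper names `make_mask`, `extend_for_prediction`, `bidirectional_fill`, and `discriminator_loss`;
- the forward/backward carry that stands in for a trained bidirectional recurrent generator;
- the GAIN-style per-entry cross-entropy used as the discriminator objective.

```python
import math
from typing import List, Optional

# Hypothetical layout: one patient's BMI values on a regular grid of time steps.
# None marks a missing entry (not recorded at that step).
Series = List[Optional[float]]


def make_mask(series: Series) -> List[int]:
    """Return 1 where a value was actually observed and 0 where it is missing."""
    return [0 if v is None else 1 for v in series]


def extend_for_prediction(series: Series, horizon: int) -> Series:
    """Frame prediction as imputation by appending `horizon` unobserved future steps."""
    return list(series) + [None] * horizon


def bidirectional_fill(series: Series) -> List[float]:
    """Toy, non-learned stand-in for a bidirectional recurrent generator (assumption).

    The forward pass carries the last observed value and the backward pass carries
    the next observed value. Where both exist they are averaged, and where only one
    exists (e.g. future steps) it is used alone. Observed entries are kept as-is.
    """
    n = len(series)
    fwd: List[Optional[float]] = [None] * n
    bwd: List[Optional[float]] = [None] * n
    last: Optional[float] = None
    for t in range(n):  # forward direction
        if series[t] is not None:
            last = series[t]
        fwd[t] = last
    nxt: Optional[float] = None
    for t in range(n - 1, -1, -1):  # backward direction
        if series[t] is not None:
            nxt = series[t]
        bwd[t] = nxt
    out: List[float] = []
    for t in range(n):
        if series[t] is not None:
            out.append(series[t])
        elif fwd[t] is not None and bwd[t] is not None:
            out.append((fwd[t] + bwd[t]) / 2)
        elif fwd[t] is not None:
            out.append(fwd[t])
        elif bwd[t] is not None:
            out.append(bwd[t])
        else:
            raise ValueError("series has no observed values")
    return out


def discriminator_loss(probs: List[float], mask: List[int]) -> float:
    """Mean binary cross-entropy of per-entry 'was observed' probabilities vs. the mask.

    The discriminator's target is the mask itself: 1 for actual values, 0 for imputed
    ones. This GAIN-style objective is an assumption, not necessarily the paper's loss.
    """
    eps = 1e-12
    total = 0.0
    for p, m in zip(probs, mask):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= m * math.log(p) + (1 - m) * math.log(1 - p)
    return total / len(mask)


if __name__ == "__main__":
    bmi = [20.0, None, 22.0, None]                # observation window with a gap
    x = extend_for_prediction(bmi, horizon=2)     # two future steps to predict
    print(make_mask(x))                           # [1, 0, 1, 0, 0, 0]
    print(bidirectional_fill(x))                  # [20.0, 21.0, 22.0, 22.0, 22.0, 22.0]
```

The same fill routine handles both the interior gap (imputation) and the appended steps (prediction). Nothing about the horizon is fixed in advance, which mirrors advantage (c). In the real model, a trained bidirectional recurrent network would presumably replace `bidirectional_fill`. Its outputs would then be trained to make the discriminator's per-entry probabilities uninformative about the mask.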