University of Southern California, Department of Computer Science, Los Angeles, CA, 90089, USA.
New York University, Department of Computer Science, New York, NY, 10012, USA.
Sci Rep. 2018 Apr 17;8(1):6085. doi: 10.1038/s41598-018-24271-9.
Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a phenomenon known as informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improved prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
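To make the two representations of missing patterns concrete, the following is a minimal sketch of how a masking matrix and a time-interval matrix might be derived from a multivariate series in which missing entries are stored as NaN. The abstract does not give the exact definitions, so the function name `masking_and_intervals`, the NaN encoding, and the recurrence used for the interval (reset to the last time gap after an observation, accumulate otherwise) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def masking_and_intervals(x, timestamps):
    """Derive two missing-pattern representations from a series.

    x          : (T, D) array, NaN marks a missing value (assumed encoding).
    timestamps : (T,) array of observation times, increasing.
    Returns (mask, delta), both of shape (T, D).
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)

    # Masking: 1.0 where a variable is observed, 0.0 where it is missing.
    mask = (~np.isnan(x)).astype(float)

    # Time interval: elapsed time since each variable was last observed.
    # Assumed recurrence: zero at the first step; afterwards the gap to the
    # previous step, plus the previous interval if the variable was missing
    # at the previous step.
    delta = np.zeros_like(x)
    for t in range(1, x.shape[0]):
        gap = s[t] - s[t - 1]
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta

# Hypothetical toy series: 4 time steps, 2 variables.
x = [[1.0, np.nan],
     [np.nan, 2.0],
     [3.0, np.nan],
     [np.nan, np.nan]]
timestamps = [0.0, 1.0, 3.0, 4.0]
mask, delta = masking_and_intervals(x, timestamps)
```

In this toy series the second variable is observed only at time 1.0, so its interval grows to 2.0 and then 3.0 at the last two steps, while the first variable's interval drops back to 1.0 after it is observed again at time 3.0. Both matrices could then be supplied to a recurrent model alongside the raw values.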