Liu Wen-Shan, Si Tong, Kriauciunas Aldas, Snell Marcus, Gong Haijun
Department of Health and Clinical Outcomes Research, Saint Louis University, St. Louis, MO 63103, USA.
Department of Mathematics and Computer Science, Culver-Stockton College, Canton, MO 63435, USA.
Stats (Basel). 2025 Mar;8(1). doi: 10.3390/stats8010007. Epub 2025 Jan 14.
Imputing missing values in high-dimensional time-series data remains a significant challenge in statistics and machine learning. Although various methods have been proposed in recent years, many lose accuracy, particularly when the missing rate is high. In this work, we present tf-BiGAIN, a novel f-divergence-based bidirectional generative adversarial imputation network designed to address these challenges in time-series data imputation. Unlike traditional imputation methods, tf-BiGAIN employs a generative model to synthesize missing values without relying on distributional assumptions. Imputation is achieved by training two neural networks, implemented using bidirectional modified gated recurrent units, with an f-divergence serving as the objective function to guide optimization. Compared to existing deep learning-based methods, tf-BiGAIN introduces two key innovations. First, the f-divergence objective provides a flexible framework for optimizing the model across diverse imputation tasks. Second, the bidirectional gated recurrent units allow the model to capture dependencies from both past and future observations, improving imputation accuracy and robustness. We applied tf-BiGAIN to two real-world time-series datasets, where it outperformed existing methods in imputation accuracy and robustness.
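To make the f-divergence objective concrete, the following is a minimal sketch of how one family of divergences arises from swapping a single generator function f. It is not the tf-BiGAIN implementation: the abstract does not say which f-divergences the authors use, and the names `GENERATORS` and `f_divergence` are hypothetical. The sketch assumes discrete distributions with strictly positive probabilities, whereas an adversarial model would estimate the divergence from samples.

```python
import math

# Generator functions f for a few well-known f-divergences.
# Each f is convex on (0, inf) and satisfies f(1) = 0.
# (Illustrative choices; the abstract does not list the ones used in tf-BiGAIN.)
GENERATORS = {
    "kl": lambda t: t * math.log(t),            # Kullback-Leibler, KL(P || Q)
    "reverse_kl": lambda t: -math.log(t),       # reverse KL, KL(Q || P)
    "pearson_chi2": lambda t: (t - 1.0) ** 2,   # Pearson chi-square
}


def f_divergence(p, q, name):
    """Return D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes p and q are equal-length sequences of strictly positive
    probabilities that each sum to 1.
    """
    f = GENERATORS[name]
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    for name in GENERATORS:
        print(name, f_divergence(p, q, name))
```

Changing the generator changes how strongly the objective penalizes mismatches between the two distributions, which is the sense in which an f-divergence objective can be adapted to different imputation tasks.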