Deep imputation of missing values in time series health data: A review with benchmarking.

Affiliations

Department of Computer Science, Tennessee State University, Nashville, TN 37209, United States.

Publication information

J Biomed Inform. 2023 Aug;144:104440. doi: 10.1016/j.jbi.2023.104440. Epub 2023 Jul 8.

Abstract

The imputation of missing values in multivariate time series (MTS) data is critical in ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods is limited to one or two data sets, low missing rates, and completely random missing value types. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. The imputation performance depends on data types, individual variable statistics, missing value rates, and types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.
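The abstract's observation that imputation quality depends on the missing rate and on how values go missing can be illustrated with a toy masking experiment: hide known values, impute them, and score only the hidden entries. The sketch below is not the survey's benchmark protocol. The function names (`mask_mcar`, `impute_mean`, `impute_linear`, `rmse_on_masked`) are hypothetical, the data is a synthetic sine wave, and the two imputers are simple statistical baselines rather than the deep learning methods the survey evaluates.

```python
import math
import random


def mask_mcar(series, rate, rng):
    """Hide each value independently with probability `rate` (completely random)."""
    return [None if rng.random() < rate else v for v in series]


def impute_mean(series):
    """Baseline that ignores time: fill every gap with the observed mean."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_linear(series):
    """Longitudinal baseline: linear interpolation between observed neighbours.

    Gaps at either edge copy the nearest observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def rmse_on_masked(truth, masked, imputed):
    """Root-mean-square error computed only over the entries that were hidden."""
    errs = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the masks are reproducible
    truth = [math.sin(2 * math.pi * t / 50) for t in range(200)]
    for rate in (0.1, 0.5):
        masked = mask_mcar(truth, rate, rng)
        for name, imputer in (("mean", impute_mean), ("linear", impute_linear)):
            score = rmse_on_masked(truth, masked, imputer(masked))
            print(f"rate={rate:.1f} {name:>6}: RMSE={score:.4f}")
```

Scoring only the masked positions keeps observed values from diluting the error, and repeating the loop over several rates shows how the ranking of methods can shift as more data goes missing. A multivariate version would add a cross-sectional step that uses other variables at the same time point.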

Similar articles

Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.
Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:177-182. doi: 10.1109/ichi61247.2024.00030. Epub 2024 Aug 22.

Cited by

Missing data imputation of climate time series: A review.
MethodsX. 2025 Jun 19;15:103455. doi: 10.1016/j.mex.2025.103455. eCollection 2025 Dec.

Augmenting Circadian Biology Research With Data Science.
J Biol Rhythms. 2025 Apr;40(2):143-170. doi: 10.1177/07487304241310923. Epub 2025 Jan 29.
