Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques.

Affiliations

Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore.

Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore.

Publication information

Artif Intell Med. 2023 Aug;142:102587. doi: 10.1016/j.artmed.2023.102587. Epub 2023 May 22.

PMID:37316097

Abstract

OBJECTIVE

Proper handling of missing values is critical for reliable estimates and decisions, especially in high-stakes fields such as clinical research. In response to the growing diversity and complexity of data, many researchers have developed deep learning (DL)-based imputation techniques. We conducted a systematic review to evaluate how these techniques are used, with a particular focus on data types, to help healthcare researchers across disciplines deal with missing data.

MATERIALS AND METHODS

We searched five databases (MEDLINE, Web of Science, Embase, CINAHL, and Scopus) for articles published before February 8, 2023, that described the use of DL-based models for imputation. We examined the selected articles from four perspectives: data types, model backbones (i.e., main architectures), imputation strategies, and comparisons with non-DL-based methods. We also created an evidence map, organized by data type, to illustrate the adoption of DL models.
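An evidence map of this kind can be thought of as a cross-tabulation of studies by data type and model backbone. The Python sketch below assumes that structure. The records are hypothetical toy values, not the review's extracted data, and the function names `evidence_map` and `render` are illustrative.

```python
from collections import Counter

# Hypothetical toy records (data type, model backbone); NOT the review's extracted data.
records = [
    ("tabular temporal", "RNN"),
    ("tabular temporal", "autoencoder"),
    ("tabular temporal", "RNN"),
    ("tabular static", "autoencoder"),
    ("tabular static", "GAN"),
    ("multi-modal", "autoencoder"),
]


def evidence_map(records):
    """Count studies for each (data type, backbone) pair."""
    return Counter(records)


def render(counts):
    """Render counts as a tab-separated grid: rows = data types, columns = backbones."""
    data_types = sorted({d for d, _ in counts})
    backbones = sorted({b for _, b in counts})
    lines = ["\t".join(["data type"] + backbones)]
    for d in data_types:
        # Counter returns 0 for pairs that never occur, i.e. empty cells of the map.
        row = [d] + [str(counts[(d, b)]) for b in backbones]
        lines.append("\t".join(row))
    return "\n".join(lines)


counts = evidence_map(records)
print(render(counts))
```

Empty cells, meaning combinations that no study used, are often as informative as the filled ones when reading such a map.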

RESULTS

Of 1822 articles, 111 were included; tabular static data (29%, 32/111) and tabular temporal data (40%, 44/111) were the most frequently investigated data types. We found a discernible pattern linking model backbones to data types; for example, autoencoders and recurrent neural networks dominated for tabular temporal data. The use of imputation strategies also differed across data types. The "integrated" imputation strategy, which solves the imputation task simultaneously with downstream tasks, was most popular for tabular temporal data (52%, 23/44) and multi-modal data (56%, 5/9). Moreover, in most studies DL-based imputation methods achieved higher imputation accuracy than non-DL methods.
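The "integrated" strategy can be pictured as a single training objective that adds an imputation loss to a downstream-task loss, so that one network is optimized for both at once. The Python sketch below is a minimal illustration under stated assumptions. It uses a masked mean-squared reconstruction error, a binary classification task, and a weighting factor `lam`. These choices and all function names are hypothetical, not taken from the review or from any specific included study.

```python
import math


def masked_mse(x, x_hat, mask):
    """Reconstruction error averaged over observed entries only (mask 1 = observed).

    Missing entries (mask 0) have no ground truth, so they contribute nothing.
    """
    observed = sum(mask)
    if observed == 0:
        return 0.0
    total = sum(m * (a - b) ** 2 for a, b, m in zip(x, x_hat, mask))
    return total / observed


def binary_cross_entropy(y, p, eps=1e-12):
    """Loss of a hypothetical downstream classifier (e.g. an outcome label)."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def integrated_loss(x, x_hat, mask, y, p, lam=1.0):
    """Joint objective: imputation loss + lam * downstream-task loss.

    Minimizing both terms with one set of network weights is what makes
    imputation and prediction be learned simultaneously.
    """
    return masked_mse(x, x_hat, mask) + lam * binary_cross_entropy(y, p)


# Toy example: one patient, three features, the second one missing.
x = [1.0, 0.0, 3.0]      # 0.0 is only a placeholder for the missing value
mask = [1, 0, 1]
x_hat = [1.5, 2.0, 3.0]  # reconstruction produced by some imputation network
print(integrated_loss(x, x_hat, mask, y=1, p=0.5, lam=0.5))
```

By contrast, a "separate" pipeline would minimize `masked_mse` first and train the classifier afterwards on the imputed data. `lam` controls how much the downstream task shapes the imputations.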

CONCLUSION

DL-based imputation models form a family of techniques with diverse network structures. Their design in healthcare is usually tailored to data types with different characteristics. Although DL-based imputation models may not outperform conventional approaches on every dataset, they are likely to achieve satisfactory results for a particular data type or dataset. However, current DL-based imputation models still face issues of portability, interpretability, and fairness.
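Comparisons of imputation accuracy between methods are commonly made by hiding some observed values, imputing them, and scoring the imputations against the hidden truth. The Python sketch below assumes that protocol, with RMSE as the metric; the included studies may use other metrics or masking schemes. It shows two conventional baselines, mean imputation and last observation carried forward. A DL imputer would be passed in as the same `impute` callable. All names and the example series are hypothetical.

```python
import math
import random


def mean_impute(values):
    """Baseline: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def locf_impute(values):
    """Baseline for time series: last observation carried forward.

    Leading gaps fall back to the first observed value.
    """
    first = next(v for v in values if v is not None)
    out, last = [], first
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def holdout_rmse(values, impute, frac=0.2, seed=0):
    """Hide a fraction of the observed entries, impute, and compute RMSE
    on the hidden entries, whose true values are known."""
    rng = random.Random(seed)
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    k = max(1, int(frac * len(observed_idx)))
    hidden = rng.sample(observed_idx, k)
    masked = list(values)
    for i in hidden:
        masked[i] = None
    filled = impute(masked)
    squared_errors = [(filled[i] - values[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly heart-rate readings with gaps (None = missing).
series = [72.0, 74.0, None, 75.0, 77.0, 76.0, None, 78.0, 80.0, 79.0]
for name, imputer in [("mean", mean_impute), ("LOCF", locf_impute)]:
    print(name, round(holdout_rmse(series, imputer), 3))
```

Because the ranking of methods can change with the dataset and the masking pattern, a single score like this does not establish that one method is better in general.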

Similar articles

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2017 Dec 22;12(12):CD011535. doi: 10.1002/14651858.CD011535.pub2.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.

Antidepressants for pain management in adults with chronic pain: a network meta-analysis.

Health Technol Assess. 2024 Oct;28(62):1-155. doi: 10.3310/MKRT2948.

Systemic treatments for metastatic cutaneous melanoma.

Cochrane Database Syst Rev. 2018 Feb 6;2(2):CD011123. doi: 10.1002/14651858.CD011123.pub2.

The measurement of collaboration within healthcare settings: a systematic review of measurement properties of instruments.

JBI Database System Rev Implement Rep. 2016 Apr;14(4):138-97. doi: 10.11124/JBISRIR-2016-2159.

Diagnostic test accuracy and cost-effectiveness of tests for codeletion of chromosomal arms 1p and 19q in people with glioma.

Cochrane Database Syst Rev. 2022 Mar 2;3(3):CD013387. doi: 10.1002/14651858.CD013387.pub2.

Measures implemented in the school setting to contain the COVID-19 pandemic.

Cochrane Database Syst Rev. 2022 Jan 17;1(1):CD015029. doi: 10.1002/14651858.CD015029.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

Cited by

Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.

Proc Mach Learn Res. 2025 Jun;287:480-501.

Enhancing glucose level prediction of ICU patients through hierarchical modeling of irregular time-series.

Comput Struct Biotechnol J. 2025 Jul 1;27:2898-2914. doi: 10.1016/j.csbj.2025.06.039. eCollection 2025.

Non-linear relationship between platelet count and 30-day in-hospital mortality in ICU patients with acute myocardial infarction: a multicenter retrospective cohort study.

Sci Rep. 2025 Jul 1;15(1):21821. doi: 10.1038/s41598-025-06317-x.

Cerebral Autoregulation and Optimal Blood Pressure from Birth to Surgery in Neonates with Critical Congenital Heart Disease.

Pediatr Cardiol. 2025 Jun 19. doi: 10.1007/s00246-025-03921-6.

Emerging artificial intelligence-driven precision therapies in tumor drug resistance: recent advances, opportunities, and challenges.

Mol Cancer. 2025 Apr 23;24(1):123. doi: 10.1186/s12943-025-02321-x.

miss-SNF: a multimodal patient similarity network integration approach to handle completely missing data sources.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf150.

Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.

BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.

Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods.

Sensors (Basel). 2025 Jan 21;25(3):614. doi: 10.3390/s25030614.

The Potential of Metabolomics in Colorectal Cancer Prognosis.

Metabolites. 2024 Dec 15;14(12):708. doi: 10.3390/metabo14120708.

Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.

Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
