Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques.

Affiliations

Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore.

Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore.

Publication information

Artif Intell Med. 2023 Aug;142:102587. doi: 10.1016/j.artmed.2023.102587. Epub 2023 May 22.

Abstract

OBJECTIVE

Proper handling of missing values is critical to delivering reliable estimates and decisions, especially in high-stakes fields such as clinical research. In response to the increasing diversity and complexity of data, many researchers have developed deep learning (DL)-based imputation techniques. We conducted a systematic review to evaluate the use of these techniques, with a particular focus on data types, to help healthcare researchers from various disciplines deal with missing data.

MATERIALS AND METHODS

We searched five databases (MEDLINE, Web of Science, Embase, CINAHL, and Scopus) for articles published prior to February 8, 2023 that described the use of DL-based models for imputation. We examined selected articles from four perspectives: data types, model backbones (i.e., main architectures), imputation strategies, and comparisons with non-DL-based methods. Based on data types, we created an evidence map to illustrate the adoption of DL models.

RESULTS

Of 1822 articles, 111 were included. Tabular static data (29%, 32/111) and tabular temporal data (40%, 44/111) were the most frequently investigated. Our findings revealed a discernible pattern in the pairing of model backbones with data types, for example, the dominance of autoencoders and recurrent neural networks for tabular temporal data. Imputation strategy usage also differed across data types. The "integrated" imputation strategy, which solves the imputation task simultaneously with downstream tasks, was most popular for tabular temporal data (52%, 23/44) and multi-modal data (56%, 5/9). Moreover, DL-based imputation methods yielded higher imputation accuracy than non-DL methods in most studies.
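To make the "integrated" strategy concrete, the following is a minimal sketch, not a method taken from any of the reviewed studies. It assumes a linear predictor in place of a deep network and one learnable fill value per feature; the function name `fit_integrated` and all parameters are hypothetical. The point it illustrates is that the imputed values are optimized by the downstream loss itself rather than in a separate preprocessing step.

```python
import numpy as np

def fit_integrated(X, mask, y, lr=0.05, epochs=500, seed=0):
    """Jointly learn per-feature fill values and a linear predictor.

    X    : (n, d) array; entries where mask == 0 are ignored (may be NaN)
    mask : (n, d) array with 1 for observed and 0 for missing entries
    y    : (n,) downstream regression target
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X_obs = np.where(mask == 1, X, 0.0)      # zero out missing entries
    fill = np.zeros(d)                       # learnable imputation values (assumption)
    w = rng.normal(scale=0.1, size=d)        # linear weights stand in for a DL model
    b = 0.0
    losses = []
    for _ in range(epochs):
        X_imp = X_obs + (1 - mask) * fill    # impute inside the forward pass
        err = X_imp @ w + b - y
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the downstream mean squared error
        grad_w = 2 * X_imp.T @ err / n
        grad_b = 2 * err.mean()
        grad_fill = 2 * ((1 - mask) * err[:, None]).sum(axis=0) * w / n
        w -= lr * grad_w
        b -= lr * grad_b
        fill -= lr * grad_fill               # imputation is updated by the task loss
    return fill, w, b, losses

# Synthetic usage example: about 20% of entries are missing at random.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(200, 3))
y = X_full @ np.array([1.0, -2.0, 0.5])
mask = (rng.random((200, 3)) > 0.2).astype(float)
X_missing = np.where(mask == 1, X_full, np.nan)
fill, w, b, losses = fit_integrated(X_missing, mask, y)
print("first loss:", losses[0], "last loss:", losses[-1])
```

A separate (non-integrated) pipeline would instead fix the fill values first, for example with feature means, and only then fit the predictor.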

CONCLUSION

DL-based imputation models are a family of techniques with diverse network structures. In healthcare, their design is usually tailored to data types with different characteristics. Although DL-based imputation models may not be superior to conventional approaches across all datasets, they may well achieve satisfactory results for a particular data type or dataset. There are, however, still issues of portability, interpretability, and fairness associated with current DL-based imputation models.
