Autopopulus: A Novel Framework for Autoencoder Imputation on Large Clinical Datasets.

Author information

Zamanzadeh Davina J, Petousis Panayiotis, Davis Tyler A, Nicholas Susanne B, Norris Keith C, Tuttle Katherine R, Bui Alex A T, Sarrafzadeh Majid

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2303-2309. doi: 10.1109/EMBC46164.2021.9630135.

Abstract

The adoption of electronic health records (EHRs) has made patient data increasingly accessible, precipitating the development of various clinical decision support systems and data-driven models to help physicians. However, missing data are common in EHR-derived datasets and can introduce significant uncertainty, if not invalidate the use of a predictive model altogether. Machine learning (ML)-based imputation methods have shown promise in various domains for estimating missing values and reducing uncertainty to the point that a predictive model can be employed. We introduce Autopopulus, a novel framework that enables the design and evaluation of various autoencoder architectures for efficient imputation on large datasets. Autopopulus implements existing autoencoder methods as well as a new technique that outputs a range of estimated values (rather than point estimates), and demonstrates a workflow that helps users make an informed decision on an appropriate imputation method. To further illustrate Autopopulus' utility, we use it to identify not only which imputation methods most accurately impute a large clinical dataset, but also which imputation methods enable downstream predictive models to achieve the best performance in predicting chronic kidney disease (CKD) progression.
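
For readers unfamiliar with autoencoder imputation, the general idea can be sketched as follows. This is a minimal illustration and not the Autopopulus implementation: the function name `autoencoder_impute`, the single tanh hidden layer, the mean warm-start fill, the hyperparameters, and the synthetic data are all assumptions made for the example. It produces point estimates only and does not attempt the range-valued technique mentioned in the abstract.

```python
import numpy as np


def autoencoder_impute(X, hidden=2, epochs=500, lr=0.05, seed=0):
    """Fill NaNs in X with a tiny one-hidden-layer autoencoder (illustrative only)."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                  # True where a value was recorded
    col_mean = np.nanmean(X, axis=0)         # warm-start fill for missing cells
    X_in = np.where(observed, X, col_mean)

    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, hidden))
    b_enc = np.zeros(hidden)
    W_dec = rng.normal(scale=0.1, size=(hidden, n_features))
    b_dec = np.zeros(n_features)

    for _ in range(epochs):
        H = np.tanh(X_in @ W_enc + b_enc)    # encode
        X_hat = H @ W_dec + b_dec            # decode
        # Masked squared error: only observed cells contribute to the loss.
        grad_out = 2.0 * (X_hat - X_in) * observed / observed.sum()
        grad_H = grad_out @ W_dec.T
        grad_pre = grad_H * (1.0 - H ** 2)   # derivative of tanh
        W_dec -= lr * (H.T @ grad_out)
        b_dec -= lr * grad_out.sum(axis=0)
        W_enc -= lr * (X_in.T @ grad_pre)
        b_enc -= lr * grad_pre.sum(axis=0)

    X_hat = np.tanh(X_in @ W_enc + b_enc) @ W_dec + b_dec
    # Keep every observed value; use the reconstruction only for missing cells.
    return np.where(observed, X, X_hat)


# Synthetic stand-in for a clinical table: three correlated features, 20% missing.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
X_true = z @ np.array([[1.0, -0.5, 0.8]]) + 0.05 * rng.normal(size=(200, 3))
X_missing = X_true.copy()
X_missing[rng.random(X_true.shape) < 0.2] = np.nan

X_imputed = autoencoder_impute(X_missing)
```

Because the loss is masked, the network is never penalized for its guesses at missing cells; it learns the relationships among observed features and uses them to reconstruct the gaps. Real EHR features would typically need standardization and categorical encoding before a sketch like this could be applied.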
