Department of Industrial and Management Systems Engineering, West Virginia University, 401 Evansdale Dr, Morgantown, WV, 26505, USA.
Department of Systems and Operations Management, California State University Northridge, 18111 Nordhoff St, Northridge, CA, 91330, USA.
Health Care Manag Sci. 2022 Sep;25(3):484-497. doi: 10.1007/s10729-022-09597-1. Epub 2022 Jun 23.
The availability of data in the healthcare domain provides great opportunities for discovering new or hidden patterns in medical data, which can eventually lead to improved clinical decision making. Predictive models play a crucial role in extracting this unknown information from data. However, medical data often contain missing values that can degrade the performance of predictive models. Autoencoder models have been widely used as non-linear functions for imputing missing data in fields such as computer vision, transportation, and finance. In this study, we assess the shortcomings of autoencoder models for data imputation and propose modified models to improve imputation performance. To evaluate the approach, we compare the performance of the proposed model with five well-known imputation techniques on six medical datasets and five classification methods. Through extensive experiments, we demonstrate that the proposed non-linear imputation model outperforms the other models at every missing ratio tested and leads to the highest disease classification accuracy on all datasets.
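The evaluation described in the abstract, comparing imputation methods across several missing ratios, can be illustrated with a minimal sketch. This is not the authors' model or protocol: the column-mean imputer is only a stand-in baseline, and the function names (`mask_values`, `mean_impute`, `rmse_on_hidden`, `evaluate`), the RMSE metric, and the chosen ratios are assumptions made for illustration. The idea is to hide a known fraction of entries in a complete table, impute them, and score the imputer only on the hidden entries.

```python
import math
import random


def mask_values(rows, missing_ratio, seed=0):
    """Hide round(missing_ratio * n_cells) randomly chosen entries (set to None)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = set(rng.sample(cells, int(round(missing_ratio * len(cells)))))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(rows)]
    return masked, hidden


def mean_impute(masked):
    """Baseline stand-in: fill each gap with the mean of its column's observed values."""
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        # Fall back to 0.0 if an entire column happens to be hidden.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error computed only over the entries that were hidden."""
    if not hidden:
        return 0.0
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


def evaluate(rows, ratios=(0.1, 0.3, 0.5), imputer=mean_impute, seed=0):
    """Score one imputer at several missing ratios (hypothetical ratios)."""
    results = {}
    for ratio in ratios:
        masked, hidden = mask_values(rows, ratio, seed)
        results[ratio] = rmse_on_hidden(rows, imputer(masked), hidden)
    return results


if __name__ == "__main__":
    # Synthetic complete table; real use would start from a complete medical dataset.
    data = [[float(i + j) for j in range(3)] for i in range(10)]
    for ratio, error in evaluate(data).items():
        print(f"missing ratio {ratio:.1f}: RMSE {error:.3f}")
```

Any other imputer, including an autoencoder-based one, could be passed through the `imputer` argument as long as it accepts a table with `None` gaps and returns a fully filled table. The downstream classification accuracy mentioned in the abstract would need a separate step that this sketch does not cover.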