School of Information Technology and Engineering, VIT University, India.
J Biomed Inform. 2019 Jun;94:103190. doi: 10.1016/j.jbi.2019.103190. Epub 2019 May 2.
Electronic health records (EHR) are a major source of information in biomedical informatics, yet missing values are a prominent characteristic of EHR data. Prediction on datasets with missing values leads to inaccurate inferences. Nearest neighbour imputation, based on the lazy learning approach, is a proven technique for missing data imputation and, owing to its simplicity and understandability, is recognized as one of the top ten data mining algorithms. However, its performance deteriorates under the curse of dimensionality, because unimportant features are likely to dominate the distance computation. We address this problem by proposing a novel feature weighting approach for nearest neighbour imputation based on a hybrid of the metaheuristic whale optimization algorithm (WOA) and the local search late acceptance hill climbing algorithm (LAHCA). Our proposed approach, Metaheuristic and Local Search based Feature Weighted Nearest Neighbour Imputation (kNN+LAHCAWOA), also learns different k values for different test points. We test the approach on benchmark EHR datasets with three proven classifiers: Support Vector Machines (SVM), Random Forest (RF) and Deep Neural Networks (DNN). The results show that kNN+LAHCAWOA is an effective imputation strategy that helps improve classification performance compared with its competitor methods.
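To illustrate why feature weights matter for nearest neighbour imputation, here is a minimal sketch of feature-weighted kNN imputation. It is not the kNN+LAHCAWOA method. In the paper the weights are learned by the WOA/LAHCA hybrid and k varies per test point, while this sketch takes a fixed weight vector and a single k as inputs. The function names `weighted_distance` and `weighted_knn_impute` are hypothetical. The weighted Euclidean distance (computed only over features observed in both rows) and the use of the neighbours' mean as the imputed value are also assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def weighted_distance(a: Row, b: Row, weights: Sequence[float]) -> float:
    """Feature-weighted Euclidean distance over features observed in both rows.

    Assumed form: sqrt(sum_j w_j * (a_j - b_j)^2). A weight of 0 removes a feature.
    """
    total = 0.0
    for x, y, w in zip(a, b, weights):
        if x is not None and y is not None:
            total += w * (x - y) ** 2
    return math.sqrt(total)


def weighted_knn_impute(data: List[Row], weights: Sequence[float], k: int = 3) -> List[List[float]]:
    """Hypothetical helper: fill each missing value with the mean of its k nearest complete rows."""
    donors = [row for row in data if all(v is not None for v in row)]
    if not donors:
        raise ValueError("need at least one complete row to act as a donor")
    imputed = []
    for row in data:
        if all(v is not None for v in row):
            imputed.append(list(row))  # complete rows are copied unchanged
            continue
        # Rank donors by weighted distance to this row and keep the k closest.
        neighbours = sorted(donors, key=lambda d: weighted_distance(row, d, weights))[:k]
        filled = [
            v if v is not None else sum(d[j] for d in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Feature 0 is informative; feature 1 is noise that misleads unweighted kNN.
data = [
    [1.0, 10.0, 100.0],
    [5.0, 50.0, 500.0],
    [1.2, 90.0, 120.0],
    [1.0, 50.0, None],
]
print(weighted_knn_impute(data, [1.0, 1.0, 1.0], k=1)[3])  # noisy feature dominates
print(weighted_knn_impute(data, [1.0, 0.0, 1.0], k=1)[3])  # noisy feature down-weighted
```

With equal weights, the noisy second feature pulls the last row toward the donor `[5.0, 50.0, 500.0]`. Setting that feature's weight to zero makes `[1.0, 10.0, 100.0]` the nearest donor instead. In this sketch the weights are supplied by hand. A search procedure such as the WOA/LAHCA hybrid described above would instead choose them to optimize some imputation or classification objective.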