Khan Yasir, Shah Said Farooq, Asim Syed Muhammad
Government College of Management Sciences Jamrud, Jamrud, KP, Pakistan.
Department of Statistics, University of Peshawar, Peshawar, KP, Pakistan.
J Appl Stat. 2024 Oct 11;52(5):1103-1127. doi: 10.1080/02664763.2024.2414357. eCollection 2025.
Missing data is a common problem in many domains that rely on data analysis. The k-Nearest Neighbors imputation method has been widely used to address this issue, but it imputes missing values less accurately for datasets with small pairwise correlations and small values of k. In this study, we propose Ranked k-Nearest Neighbors imputation, a method that follows the same approach as k-Nearest Neighbors but uses the concept of ranked set sampling to select the most relevant neighbors for imputation. Our results show that the proposed method outperforms the standard nearest neighbor method in imputation accuracy under both the Missing Completely at Random and Missing at Random mechanisms, as demonstrated by consistently lower MSIE and MAIE values across all datasets. This suggests that the proposed method is a promising alternative for imputing missing values in datasets with small pairwise correlations and small values of k. The proposed Ranked k-Nearest Neighbors method therefore has implications for data imputation in various domains and can contribute to the development of more efficient and accurate imputation methods without adding computational complexity to the algorithm.
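For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of standard nearest neighbor imputation only; the ranked variant is not shown because the abstract does not describe its selection rule. The function names `knn_impute`, `msie`, and `maie` are hypothetical, and reading MSIE and MAIE as the mean squared and mean absolute imputation error is an assumption, since the abstract does not expand the abbreviations.

```python
import numpy as np


def knn_impute(X, k=3):
    """Baseline nearest neighbor imputation (NOT the ranked variant).

    Each missing entry is replaced by the mean of that column over the
    k rows closest to the incomplete row, with distance measured on the
    columns both rows have observed.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            dists = []
            for r in range(X.shape[0]):
                # A donor must be another row with column j observed.
                if r == i or missing[r, j]:
                    continue
                shared = ~missing[i] & ~missing[r]
                if shared.any():
                    d = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
                    dists.append((d, r))
            if not dists:
                continue  # no usable donor: leave the entry as NaN
            dists.sort()
            nearest = [r for _, r in dists[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


def msie(true_vals, imputed_vals):
    """Assumed definition: mean squared imputation error."""
    diff = np.asarray(true_vals, dtype=float) - np.asarray(imputed_vals, dtype=float)
    return float(np.mean(diff ** 2))


def maie(true_vals, imputed_vals):
    """Assumed definition: mean absolute imputation error."""
    diff = np.asarray(true_vals, dtype=float) - np.asarray(imputed_vals, dtype=float)
    return float(np.mean(np.abs(diff)))


# Toy data: the second column is twice the first; one entry is hidden.
X_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
X_obs = X_true.copy()
X_obs[1, 1] = np.nan
X_filled = knn_impute(X_obs, k=2)
```

Comparing two imputation methods in this setup would mean hiding known entries, imputing them with each method, and computing both error measures on the hidden entries only.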