Su Xinyu, Wang Xiwen, Peng Dezhong, Song Xiaomin, Zheng Huiming, Yuan Zhong
IEEE Trans Neural Netw Learn Syst. 2025 Oct;36(10):18956-18967. doi: 10.1109/TNNLS.2025.3578074.
Existing density-based outlier detection methods process data at a single granularity, that of individual samples. They therefore require pairwise distance calculations between all samples and are highly sensitive to noise. This single-granularity paradigm cannot mine the information present at multiple levels of granularity in the data, and most of these methods also ignore uncertainty information in the data, such as fuzziness, so they fail to detect potential outliers effectively. Granular-Ball Computing (GBC), a novel granular computing method, is characterized by multi-granularity and robustness, which make it well suited to addressing these drawbacks. In this study, we propose local Granular-Ball Density-based Outlier (GBDO) detection to improve the performance of density-based methods. In GBDO, we first identify the $k$-similarity Granular-Ball (GB) neighborhood of each GB via the fuzzy relations among the GBs. Next, we calculate the local reachability similarity density of each GB from the reachability similarity that we define. Finally, we calculate the local GB outlier factors of the samples from the local reachability similarity densities of the GBs. Using GBs rather than individual samples as the basic processing units reduces computational complexity and, by exploiting the multi-granularity nature of GBs, improves robustness to noisy data. Experimental comparisons with state-of-the-art methods demonstrate the effectiveness of GBDO. The source code and datasets are publicly available at https://github.com/Mxeron/GBDO.
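The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact definitions, so every formula here is an assumption chosen by analogy with the classic Local Outlier Factor: a Gaussian kernel stands in for the fuzzy relation, the reachability similarity is taken as the minimum of the pairwise similarity and the neighbor's $k$-th largest similarity, and the local density is the mean reachability similarity. The function names, the `sigma` parameter, and the use of ball centers alone (ignoring radii) are hypothetical. Generating the GBs and mapping the GB-level factors back to samples are omitted.

```python
import math


def fuzzy_similarity(a, b, sigma=1.0):
    # Assumed fuzzy relation: Gaussian kernel on ball centers, value in (0, 1].
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def gb_outlier_factors(centers, k=2, sigma=1.0):
    """Return one LOF-style factor per granular-ball center (assumed formulas)."""
    n = len(centers)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of balls")

    sim = [[fuzzy_similarity(centers[i], centers[j], sigma) for j in range(n)]
           for i in range(n)]

    # Step 1: k-similarity neighborhood = the k most similar other balls.
    neighbors, k_sim = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sim[i][j], reverse=True)
        neighbors.append(order[:k])
        k_sim.append(sim[i][order[k - 1]])  # k-th largest similarity of ball i

    # Step 2: local reachability similarity density.
    # Assumed reachability similarity of i w.r.t. j: min(k_sim[j], sim[i][j]).
    density = []
    for i in range(n):
        reach = [min(k_sim[j], sim[i][j]) for j in neighbors[i]]
        density.append(sum(reach) / len(reach))

    # Step 3: factor = mean neighbor density / own density (> 1 suggests an outlier).
    factors = []
    for i in range(n):
        mean_nb = sum(density[j] for j in neighbors[i]) / len(neighbors[i])
        factors.append(mean_nb / max(density[i], 1e-12))
    return factors


# Four tightly grouped ball centers and one isolated center (toy data).
centers = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
scores = gb_outlier_factors(centers, k=2)
print(scores)
```

In this toy run the isolated center receives by far the largest factor, while the four grouped centers score close to 1. Because the computation runs over balls rather than samples, the pairwise similarity matrix grows with the number of balls, which is the source of the complexity reduction the abstract claims.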