Liu Yansong, Zhu Li, Ding Lei, Huang Zifeng, Sui He, Wang Shuang, Song Yuedong
School of Software Engineering, Xi'an Jiao Tong University, Xi'an, China.
School of Intelligent Engineering, Shandong Management University, Jinan, China.
Sci Rep. 2024 Jan 16;14(1):1420. doi: 10.1038/s41598-024-51849-3.
Anomaly detection is an important task in data analysis. Traditional anomaly detection approaches often depend strongly on the size, structure and features of the data, whereas ensemble learning can greatly improve generalization ability. Ensemble-based anomaly detection methods still face several challenges, however, including data imbalance, high time and space costs, and the selection of base detectors. To address these challenges, we propose a selective ensemble method for anomaly detection based on parallel learning (SEAD-PL). First, a differentiated stratified sampling method is designed to alleviate data imbalance. Then, a distributed parallel training framework is built to reduce the excessive time and space consumed by base detector training. Finally, a clustering-based ensemble selection strategy is introduced to balance the accuracy and diversity of the base detectors. Experiments on six datasets demonstrate that the proposed method has clear advantages over four selected methods.
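The three stages named in the abstract (differentiated stratified sampling, parallel training of base detectors, clustering-based selection) can be illustrated with a minimal Python sketch. The abstract does not specify sampling rates, base detector types, or the clustering algorithm, so everything below is an assumption made for illustration: the rates 0.2 and 0.8, the centroid-distance base detector, the random feature subsets, the leader clustering on Pearson correlation, and all function names are hypothetical. A thread pool stands in for the distributed training framework. This is not the authors' implementation.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def stratified_sample(normal, anomalies, normal_rate, anomaly_rate, rng):
    # Assumed "differentiated" sampling: each class (stratum) has its own rate,
    # so the rare anomaly class can be sampled more heavily than the normal class.
    k_norm = max(2, int(len(normal) * normal_rate))
    k_anom = max(1, int(len(anomalies) * anomaly_rate))
    return rng.sample(normal, k_norm), rng.sample(anomalies, k_anom)


def train_detector(task):
    # Hypothetical base detector: compares squared distances to the normal and
    # anomaly centroids on a feature subset. Higher score = more anomalous.
    normal, anomalies, features = task
    c_norm = [statistics.fmean(r[f] for r in normal) for f in features]
    c_anom = [statistics.fmean(r[f] for r in anomalies) for f in features]

    def score(row):
        d_norm = sum((row[f] - c) ** 2 for f, c in zip(features, c_norm))
        d_anom = sum((row[f] - c) ** 2 for f, c in zip(features, c_anom))
        return d_norm - d_anom

    return score


def auc(scores, labels):
    # Fraction of (anomaly, normal) pairs ranked correctly; ties count as half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def select_by_clustering(score_vectors, aucs, threshold=0.9):
    # Assumed selection rule (leader clustering): visit detectors from most to
    # least accurate; one that correlates strongly with an existing leader joins
    # that leader's cluster and is dropped, otherwise it founds a new cluster.
    order = sorted(range(len(aucs)), key=lambda i: aucs[i], reverse=True)
    leaders = []
    for i in order:
        if all(pearson(score_vectors[i], score_vectors[j]) < threshold
               for j in leaders):
            leaders.append(i)
    return leaders


def build_ensemble(normal, anomalies, val_rows, val_labels, n_detectors=8, seed=0):
    rng = random.Random(seed)
    n_features = len(normal[0])
    tasks = []
    for _ in range(n_detectors):
        norm_s, anom_s = stratified_sample(normal, anomalies, 0.2, 0.8, rng)
        features = rng.sample(range(n_features), max(1, n_features // 2))
        tasks.append((norm_s, anom_s, features))
    # Base detectors are independent, so they can be trained concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        detectors = list(pool.map(train_detector, tasks))
    vectors = [[d(r) for r in val_rows] for d in detectors]
    aucs = [auc(v, val_labels) for v in vectors]
    members = [detectors[i] for i in select_by_clustering(vectors, aucs)]
    # Ensemble score: mean of the selected detectors' scores.
    return lambda row: statistics.fmean(m(row) for m in members)
```

Because detectors are visited in decreasing order of validation AUC, each retained leader is the most accurate member of its cluster, which is one simple way to trade accuracy against diversity. In CPython a thread pool does not speed up CPU-bound training; a process pool or a cluster scheduler would be the natural substitute in a real distributed setting.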