Designing a Streaming Algorithm for Outlier Detection in Data Mining-An Incremental Approach.

Affiliations

School of Computer Science, Carleton University, Ottawa, ON K1S 5B6, Canada.

School of Information Technology, Carleton University, Ottawa, ON K1S 5B6, Canada.

Publication information

Sensors (Basel). 2020 Feb 26;20(5):1261. doi: 10.3390/s20051261.

DOI:10.3390/s20051261
PMID:32110907
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7085525/
Abstract

Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, and environment monitoring. Because real-time data may arrive as streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and able to run with limited memory resources. These constraints make it hard for existing outlier detection algorithms to stay accurate when they are implemented incrementally, especially in a streaming environment. To address these problems, we first propose C_KDE_WR, which uses and to process the streaming data online, and report results demonstrating high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the high complexity of LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF also employs sliding-window and to help make decisions based on the data in the current window. It also addresses all the streaming-data challenges that C_KDE_WR addresses. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall, and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement. We further report the testing results of C_LOF under different parameter settings and draw ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the problem, which often exists in outlier detection on streaming data. We provide a complexity analysis and report experimental results on the accuracy of both C_KDE_WR and C_LOF in order to evaluate their effectiveness as well as their efficiency.
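The abstract describes C_LOF only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' method: it keeps a fixed-size sliding window and recomputes the standard batch LOF over that window each time a point arrives. The names `lof_scores`, `SlidingWindowLOF`, `window_size`, `k`, and `threshold` are hypothetical, the neighbourhood uses exactly `k` neighbours (ties are not expanded), and none of the incremental or GPU optimizations mentioned in the abstract are attempted.

```python
from collections import deque
from math import dist


def lof_scores(points, k):
    """Brute-force Local Outlier Factor for every point in `points`."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    # k nearest neighbours of each point (itself excluded) as (distance, index)
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(ranked[:k])
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in neighbours[i])
        lrd.append(k / reach if reach > 0 else float("inf"))
    # LOF: mean ratio of the neighbours' density to the point's own density
    scores = []
    for i in range(n):
        if lrd[i] == float("inf"):
            scores.append(1.0)  # degenerate case (duplicates): treat as inlier
        else:
            scores.append(sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]))
    return scores


class SlidingWindowLOF:
    """Scores each arriving point against the points in the current window."""

    def __init__(self, window_size, k, threshold):
        self.window = deque(maxlen=window_size)  # oldest points fall out
        self.k = k
        self.threshold = threshold

    def update(self, point):
        """Add a point; return (score, is_outlier), or (None, False) in warm-up."""
        self.window.append(tuple(point))
        if len(self.window) <= self.k:
            return None, False
        score = lof_scores(list(self.window), self.k)[-1]
        return score, score > self.threshold


if __name__ == "__main__":
    detector = SlidingWindowLOF(window_size=50, k=3, threshold=1.5)
    stream = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)] + [(10.0, 10.0)]
    for point in stream:
        score, flagged = detector.update(point)
        if flagged:
            print("outlier:", point, "LOF:", round(score, 2))
```

Recomputing LOF for the whole window on every arrival costs on the order of the window size squared per point, which is the batch-style overhead an incremental variant would aim to avoid.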

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/196f/7085525/41266b7131d3/sensors-20-01261-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/196f/7085525/28dc39754a51/sensors-20-01261-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/196f/7085525/6e300b6b8539/sensors-20-01261-g002.jpg

Similar articles

1
Designing a Streaming Algorithm for Outlier Detection in Data Mining-An Incremental Approach.
Sensors (Basel). 2020 Feb 26;20(5):1261. doi: 10.3390/s20051261.
2
TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams.
Sensors (Basel). 2020 Oct 15;20(20):5829. doi: 10.3390/s20205829.
3
Fast Outlier Detection Using a Grid-Based Algorithm.
PLoS One. 2016 Nov 10;11(11):e0165972. doi: 10.1371/journal.pone.0165972. eCollection 2016.
4
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
5
Data-driven evolution of water quality models: An in-depth investigation of innovative outlier detection approaches-A case study of Irish Water Quality Index (IEWQI) model.
Water Res. 2024 May 15;255:121499. doi: 10.1016/j.watres.2024.121499. Epub 2024 Mar 20.
6
Fair Max-Min Diversity Maximization in Streaming and Sliding-Window Models.
Entropy (Basel). 2023 Jul 14;25(7):1066. doi: 10.3390/e25071066.
7
STAR_outliers: a python package that separates univariate outliers from non-normal distributions.
BioData Min. 2023 Sep 4;16(1):25. doi: 10.1186/s13040-023-00342-0.
8
Interactive collision detection for deformable models using streaming AABBs.
IEEE Trans Vis Comput Graph. 2007 Mar-Apr;13(2):318-29. doi: 10.1109/TVCG.2007.42.
9
Entropy-based grid approach for handling outliers: a case study to environmental monitoring data.
Environ Sci Pollut Res Int. 2023 Dec;30(60):125138-125157. doi: 10.1007/s11356-023-26780-1. Epub 2023 Jun 12.
10
A Participation Degree-Based Fault Detection Method for Wireless Sensor Networks.
Sensors (Basel). 2019 Mar 28;19(7):1522. doi: 10.3390/s19071522.
