LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things.

Affiliations

School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China.

Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing 210003, China.

Publication information

Sensors (Basel). 2020 Apr 26;20(9):2451. doi: 10.3390/s20092451.

DOI:10.3390/s20092451
PMID:32357404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7249657/
Abstract

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods to log feature extraction, in particular word2vec. Word2vec can extract the relevance between words and vectorize them. However, training word2vec is computationally expensive. Anomalies in logs also depend on the log message sequence, not only on individual log messages. Therefore, the word vectors produced by word2vec cannot be used directly. They must first be transformed into log event vectors and then into log sequence vectors.

To reduce computational cost and avoid these multiple transformations, this paper proposes an offline feature extraction model named LogEvent2vec. LogEvent2vec takes log events as the input of word2vec, so it extracts the relevance between log events and vectorizes them directly. It can work with any coordinate transformation method and any anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary (barycentre) or tf-idf. We then train three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) to detect anomalies.

We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The results demonstrate that, compared with word2vec, LogEvent2vec reduces computational time by a factor of 30 and improves accuracy. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes requires the least computational time.
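The abstract describes two feature-extraction steps. First, log events (rather than words) are fed to word2vec. Second, the resulting event vectors are collapsed into one vector per log sequence, by bary or by tf-idf.

The sketch below illustrates both steps in plain Python under stated assumptions; it is not the authors' implementation.

- **Step 1.** It shows only the skip-gram (center, context) pairs that word2vec would train on when each event ID is one token. The embedding training itself is omitted. Whether the paper uses skip-gram or CBOW is an assumption here.
- **Step 2.** "bary" is taken to be the unweighted mean of event vectors. "tf-idf" is taken to be a tf-idf-weighted mean that uses a smoothed idf, `ln((1 + N) / (1 + df)) + 1`.
- **Names.** The function names, the window size, the toy event IDs (`E1`, `E2`, `E3`) and the 2-D vectors are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def skipgram_pairs(events: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs a skip-gram word2vec model would train on when
    each log event ID, not each word, is one token (assumed training mode)."""
    pairs = []
    for i, center in enumerate(events):
        lo, hi = max(0, i - window), min(len(events), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, events[j]))
    return pairs


def bary(seq: Sequence[str], vecs: Dict[str, Vector]) -> Vector:
    """Barycentre: unweighted mean of the event vectors in one log sequence."""
    total = [0.0] * len(vecs[seq[0]])
    for event in seq:
        for k, x in enumerate(vecs[event]):
            total[k] += x
    return [x / len(seq) for x in total]


def tfidf_vector(seq: Sequence[str], vecs: Dict[str, Vector],
                 corpus: Sequence[Sequence[str]]) -> Vector:
    """tf-idf weighted mean of the event vectors in one log sequence.
    Assumption: smoothed idf = ln((1 + N) / (1 + df)) + 1, so every weight is > 0."""
    n_seqs = len(corpus)
    df = Counter(event for s in corpus for event in set(s))  # sequences containing each event
    tf = Counter(seq)
    acc = [0.0] * len(vecs[seq[0]])
    weight_sum = 0.0
    for event, count in tf.items():
        w = (count / len(seq)) * (math.log((1 + n_seqs) / (1 + df[event])) + 1)
        for k, x in enumerate(vecs[event]):
            acc[k] += w * x
        weight_sum += w
    return [x / weight_sum for x in acc]


if __name__ == "__main__":
    # Toy 2-D event vectors; in LogEvent2vec these would come from word2vec training.
    vecs = {"E1": [1.0, 0.0], "E2": [0.0, 1.0], "E3": [1.0, 1.0]}
    corpus = [["E1", "E1", "E2"], ["E2", "E3"]]  # hypothetical log sequences
    print(skipgram_pairs(corpus[0], window=1))
    print(bary(corpus[0], vecs))  # [0.6666666666666666, 0.3333333333333333]
    print(tfidf_vector(corpus[0], vecs, corpus))
```

With tf-idf, an event that occurs in fewer sequences gets a larger idf and pulls the sequence vector toward itself. Bary, by contrast, treats every occurrence equally. In the pipeline described in the abstract, either sequence vector would then be passed to a supervised classifier such as Random Forest or Naive Bayes; that step is not shown here.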

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/9818ee21f980/sensors-20-02451-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/423b7741af7e/sensors-20-02451-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/e12c7ab87b2b/sensors-20-02451-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/1188023e6fa5/sensors-20-02451-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/7f452ef44ac7/sensors-20-02451-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/7249657/f9140ce487b9/sensors-20-02451-g006.jpg

Similar articles

1
LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things.
Sensors (Basel). 2020 Apr 26;20(9):2451. doi: 10.3390/s20092451.
2
ConAnomaly: Content-Based Anomaly Detection for System Logs.
Sensors (Basel). 2021 Sep 13;21(18):6125. doi: 10.3390/s21186125.
3
Investigating response behavior through TF-IDF and Word2vec text analysis: A case study of PISA 2012 problem-solving process data.
Heliyon. 2024 Aug 10;10(16):e35945. doi: 10.1016/j.heliyon.2024.e35945. eCollection 2024 Aug 30.
4
CLDTLog: System Log Anomaly Detection Method Based on Contrastive Learning and Dual Objective Tasks.
Sensors (Basel). 2023 May 24;23(11):5042. doi: 10.3390/s23115042.
5
Identification of offensive language in Urdu using semantic and embedding models.
PeerJ Comput Sci. 2022 Dec 12;8:e1169. doi: 10.7717/peerj-cs.1169. eCollection 2022.
6
A Method of Short Text Representation Based on the Feature Probability Embedded Vector.
Sensors (Basel). 2019 Aug 28;19(17):3728. doi: 10.3390/s19173728.
7
Optimizing Word Embeddings for Patient Portal Message Datasets with a Small Number of Samples.
Res Sq. 2024 May 15:rs.3.rs-4350387. doi: 10.21203/rs.3.rs-4350387/v1.
8
IoTDS: A One-Class Classification Approach to Detect Botnets in Internet of Things Devices.
Sensors (Basel). 2019 Jul 19;19(14):3188. doi: 10.3390/s19143188.
9
Question classification based on Bloom's taxonomy cognitive domain using modified TF-IDF and word2vec.
PLoS One. 2020 Mar 19;15(3):e0230442. doi: 10.1371/journal.pone.0230442. eCollection 2020.
10
Anomaly traffic detection based on feature fluctuation for secure industrial internet of things.
Peer Peer Netw Appl. 2023 Apr 26:1-16. doi: 10.1007/s12083-023-01482-0.

Cited by

1
Preparing Distributed Computing Operations for the HL-LHC Era With Operational Intelligence.
Front Big Data. 2022 Jan 7;4:753409. doi: 10.3389/fdata.2021.753409. eCollection 2021.
2
Internet of Things for Smart Community Solutions.
Sensors (Basel). 2022 Jan 14;22(2):640. doi: 10.3390/s22020640.
3
Log Sequence Anomaly Detection Method Based on Contrastive Adversarial Training and Dual Feature Extraction.
