College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.
Sensors (Basel). 2021 Sep 13;21(18):6125. doi: 10.3390/s21186125.
Enterprise systems typically produce large volumes of logs to record runtime states and important events. Log anomaly detection is valuable for business management and system maintenance. Most existing log-based anomaly detection methods use a log parser to extract log event indexes or event templates and then apply machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of the semantic information in logs. In this article, we propose ConAnomaly, a log-based anomaly detection model composed of a log sequence encoder (log2vec) and a multi-layer Long Short-Term Memory (LSTM) network. We design log2vec based on the Word2vec model: it first vectorizes the words in the log content, then removes invalid words through part-of-speech tagging, and finally obtains the sequence vector by a weighted average. In this way, ConAnomaly not only captures semantic information in the log but also leverages the sequential relationships between logs. We evaluate our proposed approach on two log datasets. Our experimental results show that ConAnomaly has good stability, can deal with unseen log types to a certain extent, and provides better performance than most log-based anomaly detection methods.
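The three log2vec steps named in the abstract (vectorize words, drop invalid words, take a weighted average) can be illustrated with a minimal sketch. This is not the authors' implementation. The tiny `EMBEDDINGS` table stands in for a trained Word2vec model, the `INVALID_WORDS` set stands in for part-of-speech filtering, and the smoothed inverse-document-frequency weighting is an assumption, since the abstract does not say which weights are used. All names here are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for a trained Word2vec model.
EMBEDDINGS = {
    "block": [1.0, 0.0],
    "received": [0.0, 1.0],
    "exception": [1.0, 1.0],
}

# Stand-in for part-of-speech filtering (assumption): a real pipeline would
# tag each token and drop the tags it considers uninformative.
INVALID_WORDS = {"the", "of", "from", "for"}


def idf_weights(logs):
    """Smoothed inverse document frequency per word (assumed weighting)."""
    n = len(logs)
    doc_freq = Counter(word for log in logs for word in set(log))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def encode(tokens, weights):
    """Return the weighted average of the vectors of the kept tokens."""
    dim = len(next(iter(EMBEDDINGS.values())))
    kept = [t for t in tokens if t in EMBEDDINGS and t not in INVALID_WORDS]
    if not kept:
        # No usable words: fall back to a zero vector.
        return [0.0] * dim
    total = sum(weights.get(t, 1.0) for t in kept)
    vec = [0.0] * dim
    for t in kept:
        w = weights.get(t, 1.0)
        for i, x in enumerate(EMBEDDINGS[t]):
            vec[i] += w * x
    return [v / total for v in vec]


logs = [["received", "block"], ["exception", "for", "block"]]
weights = idf_weights(logs)
sequence_vectors = [encode(log, weights) for log in logs]
```

In a full model, vectors such as `sequence_vectors` would be fed in order to the LSTM. A word absent from `weights` falls back to weight 1.0 here, which is one simple way to keep unseen log content encodable.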