Zhang Hao, Zhou Yun, Xu Huahu, Shi Jiangang, Lin Xinhua, Gao Yiqin
School of Computer Engineering and Science, Shanghai University, Shanghai, China.
Shanghai KingLong IoT Co., Ltd., Shanghai, China.
PLoS One. 2025 Jan 7;20(1):e0315897. doi: 10.1371/journal.pone.0315897. eCollection 2025.
Virtual machine platforms generate logs in large quantities, and some of these logs are abnormal, indicating security risks or system failures on the platform. Identifying abnormal logs with unsupervised anomaly detection is therefore a meaningful task. However, accurate anomaly logs are difficult to collect in the real world, and log information is inherently noisy. Log parsing and anomaly alerting can also be time-consuming, so improving their effectiveness and accuracy is important. To address these challenges, this paper proposes a method called LADSVM (Long Short-Term Memory + Autoencoder-Decoder + SVM). First, a log parsing algorithm parses the logs. Then, a feature extraction algorithm that combines Long Short-Term Memory and an Autoencoder-Decoder extracts features. The Autoencoder-Decoder reduces the dimensionality of the data by mapping the high-dimensional input to a low-dimensional latent space, which helps eliminate redundant information and noise, extract key features, and increase robustness. Finally, a Support Vector Machine is applied to the resulting feature vectors to detect anomalies. Experimental results show that, compared with traditional methods, this approach learns better features without any prior knowledge and exhibits superior noise robustness and performance. LADSVM excels at detecting anomalies in virtual machine logs characterized by strong sequential patterns and noise. However, its performance may vary on disordered log data, which highlights the need to select detection methods that match the characteristics of each type of log data.
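To make the first stage of the pipeline concrete, the following is a minimal sketch of how raw log lines might be turned into event-ID sequences before any sequence model sees them. It is not the parser used in the paper, which the abstract does not name. The masking rules, the function names (`to_template`, `encode_events`, `sliding_windows`), and the sample log lines are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical masking rules: replace variable fields with placeholders so that
# lines differing only in parameters collapse to the same template.
# Order matters: IP addresses are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Return the log message with variable fields replaced by placeholders."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def encode_events(messages):
    """Map each message to an integer event ID, assigned in order of first appearance."""
    template_ids = {}
    sequence = []
    for message in messages:
        template = to_template(message)
        if template not in template_ids:
            template_ids[template] = len(template_ids)
        sequence.append(template_ids[template])
    return sequence, template_ids


def sliding_windows(sequence, size, step=1):
    """Split an event-ID sequence into fixed-length, possibly overlapping windows."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


# Made-up sample lines, not taken from the paper's data set.
sample_logs = [
    "VM 17 started on host 10.0.0.5",
    "VM 23 started on host 10.0.0.9",
    "Disk read error at 0x1f3a",
    "VM 17 stopped",
]
event_sequence, templates = encode_events(sample_logs)
windows = sliding_windows(event_sequence, size=2)
```

Windows of event IDs like these are the kind of input a sequence model such as an LSTM-based encoder could consume. The window size and step are free parameters here, and the abstract does not state what the authors used.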