School of Software, Xinjiang University, Urumqi 830046, China.
College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.
Sensors (Basel). 2023 May 24;23(11):5042. doi: 10.3390/s23115042.
System logs are a crucial component of system maintainability: they record the system's status and essential events for troubleshooting and maintenance. Anomaly detection on system logs is therefore crucial. Recent research has focused on extracting semantic information from unstructured log messages for log anomaly detection tasks. Because BERT models perform well in natural language processing, this paper proposes an approach called CLDTLog, which introduces contrastive learning and dual-objective tasks into a pre-trained BERT model and performs anomaly detection on system logs through a fully connected layer. The approach requires no log parsing and thus avoids the uncertainty that log parsing introduces. We trained the CLDTLog model on two log datasets (HDFS and BGL) and achieved F1 scores of 0.9971 and 0.9999 on the HDFS and BGL datasets, respectively, outperforming all known methods. Moreover, when trained on only 1% of the BGL dataset, CLDTLog still achieves an F1 score of 0.9993, showing excellent generalization while significantly reducing training cost.
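The abstract does not specify CLDTLog's exact loss formulation, so the following is only a minimal NumPy sketch of the contrastive-learning component it mentions: a generic supervised contrastive loss that pulls embeddings of same-label log messages together and pushes different-label ones apart. The function name, the temperature value, and the toy embeddings are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of log-message
    embeddings (illustrative; not the paper's exact objective).

    For each anchor i, same-label samples are positives and all other
    samples form the softmax denominator, so minimizing the loss pulls
    same-label embeddings together and pushes others apart.
    """
    # L2-normalize so dot products become cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    n = len(labels)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        others = [j for j in range(n) if j != i]
        # log of the softmax denominator over all non-anchor samples
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # average negative log-probability of the positives
        total += -np.mean([sim[i, j] - log_denom for j in positives])
    return total / n

# Toy check: embeddings clustered by label (e.g. normal vs. anomalous
# log messages) should score a lower loss than mixed-up embeddings.
clustered = np.array([[1.0, 0.0], [1.0, 0.01], [-1.0, 0.0], [-1.0, 0.01]])
mixed = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.01], [-1.0, 0.01]])
labels = np.array([0, 0, 1, 1])
```

In the full method this loss would be computed on BERT sentence embeddings of raw log messages (no log parsing) and combined with a classification objective on the fully connected layer, reflecting the dual-objective training the abstract describes.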