Bai Kevin Z, Fossaceca John M
Independent Researcher, Westwood, MA 02090, USA.
Department of Engineering Management and Systems Engineering, George Washington University, Washington, DC 20052, USA.
Sensors (Basel). 2024 Dec 26;25(1):78. doi: 10.3390/s25010078.
Effective network intrusion detection using anomaly scores from unsupervised machine learning models depends on the performance of those models. Although unsupervised models do not require labels for training or testing, evaluating their performance still requires comparing anomaly scores against labels. In real-world scenarios, massive network datasets typically lack labels, which makes it infeasible to calculate performance metrics. It is therefore valuable to develop an algorithm that calculates robust performance metrics without using labels. In this paper, we propose a novel algorithm, Expectation Maximization-Area Under the Curve (EM-AUC), which derives the Area Under the ROC Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR) by treating the unavailable labels as missing data and replacing them with their posterior probabilities. We applied the algorithm to two network intrusion datasets, where it yielded robust results. To the best of our knowledge, this is the first time that AUC-ROC and AUC-PR derived without labels have been used to evaluate network intrusion detection systems. The EM-AUC algorithm allows model training, testing, and performance evaluation to proceed without comprehensive labels, offering a cost-effective and scalable way to select the most effective models for network intrusion detection.
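The idea of treating missing labels as latent variables and replacing them with posterior probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mixture model, so the sketch assumes a two-component one-dimensional Gaussian mixture over the anomaly scores, assumes that higher scores indicate anomalies, and uses hypothetical function names (`em_posteriors`, `soft_auc_roc`) and synthetic data. Only AUC-ROC is shown; AUC-PR is omitted.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_posteriors(scores, n_iter=200):
    """Fit an assumed two-component Gaussian mixture by EM.

    Returns P(anomaly | score) for each score, taking the component
    with the higher mean as the anomaly class (an assumption).
    """
    n = len(scores)
    ordered = sorted(scores)
    mean = sum(scores) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in scores) / n) or 1.0
    # Heuristic initialization: lower quartile vs. 90th percentile.
    mu0, mu1 = ordered[n // 4], ordered[(9 * n) // 10]
    sigma0 = sigma1 = spread
    pi1 = 0.1
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each score is an anomaly.
        post = []
        for x in scores:
            a = pi1 * normal_pdf(x, mu1, sigma1)
            b = (1.0 - pi1) * normal_pdf(x, mu0, sigma0)
            post.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate mixture parameters from the soft labels.
        w1 = max(sum(post), 1e-12)
        w0 = max(n - w1, 1e-12)
        mu1 = sum(p * x for p, x in zip(post, scores)) / w1
        mu0 = sum((1.0 - p) * x for p, x in zip(post, scores)) / w0
        var1 = sum(p * (x - mu1) ** 2 for p, x in zip(post, scores)) / w1
        var0 = sum((1.0 - p) * (x - mu0) ** 2 for p, x in zip(post, scores)) / w0
        sigma1 = max(math.sqrt(var1), 1e-6)
        sigma0 = max(math.sqrt(var0), 1e-6)
        pi1 = w1 / n
    if mu1 < mu0:
        # Keep the convention that the higher-mean component is "anomaly".
        post = [1.0 - p for p in post]
    return post


def soft_auc_roc(scores, post):
    """AUC-ROC with each ordered pair (i, j) weighted by
    P(i is anomaly) * P(j is normal); ties count one half.
    With 0/1 weights this reduces to the usual pairwise AUC-ROC."""
    num = 0.0
    den = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = post[i] * (1.0 - post[j])
            if w == 0.0:
                continue
            den += w
            if scores[i] > scores[j]:
                num += w
            elif scores[i] == scores[j]:
                num += 0.5 * w
    return num / den if den > 0.0 else float("nan")


# Synthetic scores (illustrative only): 450 normal, 50 anomalous.
random.seed(0)
labels = [0.0] * 450 + [1.0] * 50
scores = ([random.gauss(0.0, 1.0) for _ in range(450)]
          + [random.gauss(4.0, 1.0) for _ in range(50)])

est = soft_auc_roc(scores, em_posteriors(scores))  # labels never used
ref = soft_auc_roc(scores, labels)                 # uses true labels
print("label-free AUC-ROC estimate:", est)
print("label-based AUC-ROC:", ref)
```

The pairwise weighting is chosen because, when the mixture model is correct, the posterior-weighted sum estimates the same quantity as the label-based AUC-ROC. The quadratic loop is kept for readability and would need a sort-based formulation for large datasets.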