Advanced cloud intrusion detection framework using graph based features transformers and contrastive learning.

Authors

Govindarajan Vijay, Muzamal Junaid Hussain

Affiliations

Colorado State University, Seattle, USA.

National University of Computer and Emerging Sciences, Lahore, Pakistan.

Publication information

Sci Rep. 2025 Jul 1;15(1):20511. doi: 10.1038/s41598-025-07956-w.

DOI:10.1038/s41598-025-07956-w

PMID:40595172

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12215497/

Abstract

This paper presents a modular and scalable intrusion detection framework that combines graph-based feature extraction, Transformer-based autoencoding, and contrastive learning to improve detection accuracy in cloud environments. Network flows are modeled as graphs to capture relational patterns among IP addresses and services, and a Graph Neural Network (GNN) is used to extract structured embeddings. These embeddings are refined through a Transformer-based autoencoder to preserve contextual information, while contrastive learning enforces clear class separation during classification. The system is evaluated on NSL-KDD and CIC-IDS2018 datasets under both binary and multi-class scenarios. Experimental results show an average accuracy of 99.97%, with high precision and recall across all attack types, including minority classes such as U2R and R2L. The model achieves low false-positive rates and demonstrates real-time inference performance with modest resource requirements. Key contributions include an interpretable pipeline using SHAP for feature attribution, a strategy for mitigating class imbalance, and validation across datasets with detailed security and generalizability analyses. These results support the practical applicability of the proposed approach in high-throughput, cloud-based network environments.
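
To make the graph-modeling step concrete, here is a minimal sketch of how flow records might be turned into a graph and passed through one round of neighbor aggregation. It is not the authors' implementation: the abstract does not specify the graph schema or the GNN layer, so the flow tuple layout, the two node features (flow count and total bytes), the function names `build_flow_graph` and `mean_aggregate`, and the use of plain mean aggregation are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, service, bytes sent).
flows = [
    ("10.0.0.1", "10.0.0.9", "http", 1200),
    ("10.0.0.2", "10.0.0.9", "http", 800),
    ("10.0.0.1", "10.0.0.7", "ssh", 300),
]


def build_flow_graph(flows):
    """Build an undirected IP graph with per-node features and per-edge services.

    Assumed schema: nodes are IP addresses, an edge joins two IPs that
    exchanged at least one flow, and each edge remembers the services seen.
    """
    adjacency = defaultdict(set)
    edge_services = defaultdict(set)
    features = defaultdict(lambda: [0.0, 0.0])  # [flow count, total bytes]
    for src, dst, service, n_bytes in flows:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
        edge_services[frozenset((src, dst))].add(service)
        for node in (src, dst):
            features[node][0] += 1.0
            features[node][1] += float(n_bytes)
    return dict(adjacency), dict(edge_services), dict(features)


def mean_aggregate(adjacency, features):
    """One message-passing round: average a node's features with its neighbors'."""
    updated = {}
    for node, neighbours in adjacency.items():
        group = [features[node]] + [features[n] for n in sorted(neighbours)]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated


adjacency, edge_services, features = build_flow_graph(flows)
embeddings = mean_aggregate(adjacency, features)
```

In a full pipeline of the kind the abstract describes, embeddings like these would be produced by a trained GNN and then passed to the Transformer-based autoencoder and the contrastive objective; the untrained mean aggregation here only illustrates how relational context from neighboring hosts enters each node's representation.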
