Liu Gan, Lu Kai, Pi Saiqi
School of Cyberspace Security (School of Cryptology), Hainan University, Haikou, China.
Department of Public Safety Technology, Hainan Vocational College of Politics and Law, Haikou, China.
PeerJ Comput Sci. 2025 Apr 4;11:e2769. doi: 10.7717/peerj-cs.2769. eCollection 2025.
The escalating frequency and severity of cyber-attacks pose formidable challenges to safeguarding cyberspace. Named Entity Recognition (NER) is used to rapidly identify threat entities and their relationships in cyber threat intelligence, so that security researchers learn of cyber threats promptly and security defense and analysis become more efficient. However, current models for recognizing network threat entities and extracting their relationships have several limitations: inadequate representation of textual semantic information, insufficient granularity in threat entity recognition, and error propagation in relationship extraction. To address these issues, this article proposes a novel model for network threat entity recognition and relationship extraction (CtiErRe). It also redefines seven network threat entities and two types of relationships between them. First, domain knowledge is collected to build a domain knowledge graph, which is embedded using graph convolutional networks (GCN) to enhance the feature representation of threat intelligence text. Next, the features from the domain knowledge graph embedding and those generated by the bidirectional encoder representations from transformers (BERT) model are fused using layer normalization (LayerNorm). Finally, the fused features are processed with the GlobalPointer algorithm to generate both the threat entity type matrix and the threat entity relation type matrix, which together identify threat entities and their relationships. Extensive experiments show that the proposed model outperforms existing models. In threat entity recognition, it reaches an accuracy of 92.13% and an F1 score of 93.11%. In relationship extraction, it reaches an accuracy of 91.45% and an F1 score of 92.45%.
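The three stages named in the abstract (GCN embedding of a knowledge graph, LayerNorm-based fusion with text features, and GlobalPointer-style span scoring) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not give the exact fusion rule, so the sketch assumes an element-wise sum followed by layer normalization. It also assumes one knowledge-graph node per token, uses identity projection matrices, and omits the rotary position encoding that GlobalPointer normally applies. All function names and toy values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def fuse(text_feats, kg_feats):
    """ASSUMED fusion rule: per-token element-wise sum, then LayerNorm."""
    return [layer_norm([t + k for t, k in zip(tv, kv)])
            for tv, kv in zip(text_feats, kg_feats)]

def span_scores(feats, w_q, w_k):
    """GlobalPointer-style scores for one type: s[i][j] = q_i . k_j for i <= j.

    Spans that would end before they start (i > j) are masked with -inf.
    """
    q = matmul(feats, w_q)
    k = matmul(feats, w_k)
    n = len(feats)
    return [[sum(a * b for a, b in zip(q[i], k[j])) if i <= j else -math.inf
             for j in range(n)] for i in range(n)]

def decode(scores, threshold=0.0):
    """Return (start, end) token pairs whose score exceeds the threshold."""
    return [(i, j) for i, row in enumerate(scores)
            for j, s in enumerate(row) if s > threshold]

# Toy data: 3 tokens, 2-dimensional features, a 3-node chain graph (all hypothetical).
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
text_feats = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.4]]  # stand-in for BERT outputs

kg_feats = gcn_layer(adj, node_feats, identity)
fused = fuse(text_feats, kg_feats)
scores = span_scores(fused, identity, identity)  # one matrix per entity or relation type
spans = decode(scores)
```

In a full model, one such score matrix would be produced per entity type and per relation type, with learned projection matrices in place of the identity matrices used here.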