Miao Weiwei, Zhao Xinjian, Zhang Yinzhao, Chen Shi, Li Xiaochao, Li Qianmu
State Grid Jiangsu Electric Power Co., Ltd., Information & Telecommunication Branch, Nanjing 210024, China.
School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Sensors (Basel). 2024 Jun 22;24(13):4069. doi: 10.3390/s24134069.
In the development of the power industry Internet of Things, securing data interaction has been a persistent challenge. In the power-based blockchain Industrial Internet of Things, data interaction between nodes involves large amounts of sensitive data. Current anti-leakage strategies for power business data interaction identify sensitive data by regular-expression matching, an approach that suits only simply structured data; practical matching strategies for unstructured data are lacking. This paper therefore proposes a deep learning-based anti-leakage method for power business data interaction, aiming to secure the data exchanged between the State Grid business platform and third-party platforms. The method combines regular expressions with named entity recognition performed by a DeBERTa (Decoding-enhanced BERT with disentangled attention)-BiLSTM (Bidirectional Long Short-Term Memory)-CRF (Conditional Random Field) model. The pre-trained DeBERTa model extracts features, the BiLSTM captures the contextual semantic features of the sequence, and the CRF layer outputs the globally optimal tag sequence. Sensitive-data matching is then performed on both structured and unstructured interaction data to identify privacy-sensitive information in the power business. Experimental results show that the proposed method reaches an F1 score of 81.26% for sensitive-entity recognition on the CLUENER 2020 dataset, indicating that it can help prevent leakage of power business data and offers the power industry a new approach to data security.
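The hybrid matching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regex rules, the function names (`regex_spans`, `bio_to_spans`, `find_sensitive`, `mask`), and the `toy_tagger` stand-in are all hypothetical. The sketch assumes the trained DeBERTa-BiLSTM-CRF model is exposed as a function that returns one BIO tag per character; the abstract does not specify the model's interface.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive

# Hypothetical rules for structured fields; not the paper's rule set.
REGEX_RULES = {
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def regex_spans(text: str) -> List[Span]:
    """Match structured sensitive fields with regular expressions."""
    return [(m.start(), m.end(), label)
            for label, pattern in REGEX_RULES.items()
            for m in pattern.finditer(text)]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert per-character BIO tags into (start, end, label) spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def find_sensitive(text: str, tagger: Callable[[str], List[str]]) -> List[Span]:
    """Union of regex matches and entity spans from the sequence tagger."""
    tags = tagger(text)
    if len(tags) != len(text):
        raise ValueError("tagger must return one tag per character")
    return sorted(set(regex_spans(text) + bio_to_spans(tags)))


def mask(text: str, spans: List[Span], fill: str = "*") -> str:
    """Replace every character inside a sensitive span with `fill`."""
    chars = list(text)
    for start, end, _ in spans:
        chars[start:end] = fill * (end - start)
    return "".join(chars)


def toy_tagger(text: str) -> List[str]:
    """Stand-in for the trained model: tags one hard-coded name only."""
    tags = ["O"] * len(text)
    name = "Zhang San"
    i = text.find(name)
    if i >= 0:
        tags[i] = "B-name"
        for j in range(i + 1, i + len(name)):
            tags[j] = "I-name"
    return tags


if __name__ == "__main__":
    message = "Contact Zhang San at 13812345678 or zs@example.com."
    found = find_sensitive(message, toy_tagger)
    print(found)
    print(mask(message, found))
```

In this sketch the regular expressions cover fields with a fixed format, while the tagger covers free-text entities such as names; taking the union of both span sets before masking mirrors the combined use of the two techniques, and a real deployment would replace `toy_tagger` with the trained model's decoding step.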