Privacy-Preserving Synthetic Data Generation Method for IoT-Sensor Network IDS Using CTGAN.

Author information

Alabdulwahab Saleh, Kim Young-Tak, Son Yunsik

Affiliations

Department of Computer Science and Engineering, Dongguk University, Seoul 04620, Republic of Korea.

Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.

Publication information

Sensors (Basel). 2024 Nov 20;24(22):7389. doi: 10.3390/s24227389.

DOI:10.3390/s24227389

PMID:39599165

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11598811/

Abstract

The increased usage of IoT networks brings about new privacy risks, especially when intrusion detection systems (IDSs) rely on large datasets for machine learning (ML) tasks and depend on third parties for storing and training the ML-based IDS. This study proposes a privacy-preserving synthetic data generation method using a conditional tabular generative adversarial network (CTGAN) aimed at maintaining the utility of IoT sensor network data for IDS while safeguarding privacy. We integrate differential privacy (DP) with CTGAN by employing controlled noise injection to mitigate privacy risks. The technique involves dynamic distribution adjustment and quantile matching to balance the utility-privacy tradeoff. The results indicate a significant improvement in data utility compared to the standard DP method, achieving a KS test score of 0.80 while minimizing privacy risks such as singling out, linkability, and inference attacks. This approach ensures that synthetic datasets can support intrusion detection without exposing sensitive information.
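
The abstract does not specify the noise mechanism, the quantile-matching procedure, or the exact definition of the KS score, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes Laplace noise with scale sensitivity/epsilon, a rank-based mapping onto a small number of reference quantiles, and a KS score defined as 1 minus the two-sample Kolmogorov-Smirnov statistic. All function names and parameter values are hypothetical.

```python
import bisect
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_dp_noise(values, epsilon, sensitivity, rng):
    """Perturb each value with Laplace noise of scale sensitivity / epsilon (assumed mechanism)."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def quantile_match(noisy, reference, knots=11):
    """Map noisy values, by rank, onto a piecewise-linear curve through a few
    reference quantiles.

    Assumption: the quantile knots are read directly from the reference data.
    That step is not differentially private on its own; a real pipeline would
    have to privatize or otherwise protect the knots.
    """
    ref_sorted = sorted(reference)
    last = len(ref_sorted) - 1
    qs = [ref_sorted[round(k * last / (knots - 1))] for k in range(knots)]
    order = sorted(range(len(noisy)), key=lambda i: noisy[i])
    n = len(noisy)
    matched = [0.0] * n
    for rank, i in enumerate(order):
        p = rank / (n - 1) if n > 1 else 0.0  # rank rescaled to [0, 1]
        pos = p * (knots - 1)
        lo = min(int(pos), knots - 2)
        frac = pos - lo
        matched[i] = qs[lo] + frac * (qs[lo + 1] - qs[lo])
    return matched


def ks_score(real, synthetic):
    """Return 1 - two-sample KS statistic; 1.0 means identical empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    max_gap = 0.0
    for x in real_sorted + synth_sorted:
        cdf_real = bisect.bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect.bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Toy stand-in for one numeric sensor column (values are made up).
rng = random.Random(0)
real_column = [rng.gauss(50.0, 5.0) for _ in range(1000)]
noisy_column = add_dp_noise(real_column, epsilon=0.1, sensitivity=1.0, rng=rng)
matched_column = quantile_match(noisy_column, real_column)

print("KS score, noise only:      ", round(ks_score(real_column, noisy_column), 3))
print("KS score, quantile-matched:", round(ks_score(real_column, matched_column), 3))
```

In this toy setting the noise-only column tends to score lower than the quantile-matched one, which mirrors the utility-privacy tension the abstract describes. The numbers it prints are not comparable to the 0.80 reported in the paper, which was obtained on real IDS data with a CTGAN-based generator.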
