Jiang Wei, Zhang Bin, Zhu Qixun, Liao Conghui, Wang Wenyong
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.
China Telecom Sichuan Branch, Chengdu, 610012, Sichuan, China.
Sci Data. 2025 May 7;12(1):756. doi: 10.1038/s41597-025-04876-2.
Internet traffic analysis aims to identify latent patterns in traffic data and to ascertain the true operating state of the Internet. It is considered an effective and valuable means of achieving accurate network management. Existing network traffic datasets, however, are predominantly collected in laboratory environments and lack authenticity in terms of network scale, users, behaviours, and temporal and spatial characteristics. To address this, this paper proposes an in-situ network deployment and data collection scheme involving a large number of devices and users. The scheme collects a large dataset of real Internet traffic, covering both encrypted and non-encrypted traffic, through sensors deployed on real-world network access equipment. After desensitization, cleaning, feature engineering and labelling, the data are released as an open database for researchers in traffic analysis to use in academic and engineering work.