Jiao Bo
School of Information Science and Technology, Xiamen University Tan Kah Kee College, Zhangzhou, 363123, Fujian, China.
Sci Rep. 2024 Jun 10;14(1):13340. doi: 10.1038/s41598-024-64018-3.
Graph sampling plays an important role in data mining for large networks. Specifically, larger networks often correspond to lower sampling rates. In this situation, traditional traversal-based sampling methods for large networks usually show an excessive preference for the densely connected core nodes of the network. To address this issue, this paper proposes a sampling method for unknown networks at low sampling rates, called SLSR. It first uses random node sampling to estimate both a degree threshold, used to distinguish the core from the periphery, and the average degree of the unknown network, and then runs a double-layer sampling strategy on the core and periphery. SLSR is simple, which gives it high time efficiency, yet experiments verify that it accurately preserves many critical structures of unknown large scale-free networks at low sampling rates and with low variance.
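The two stages described in the abstract can be illustrated with a minimal sketch. This is not the SLSR algorithm itself: the abstract does not specify how the threshold is computed or how the double-layer strategy works, so the sketch substitutes a plain stratified node sample. The names `estimate_degree_stats`, `stratified_sample`, and `toy_graph`, and the parameters `core_quantile` and `core_share`, are hypothetical. For simplicity the toy graph is held fully in memory, whereas the paper targets unknown networks.

```python
import random


def estimate_degree_stats(adj, probe_size, rng, core_quantile=0.9):
    """Probe random nodes; return (degree threshold, estimated average degree)."""
    probe = rng.sample(sorted(adj), probe_size)
    degrees = sorted(len(adj[v]) for v in probe)
    avg_degree = sum(degrees) / len(degrees)
    # Assumption: the threshold is a high quantile of the probed degrees.
    idx = min(len(degrees) - 1, int(core_quantile * len(degrees)))
    return degrees[idx], avg_degree


def stratified_sample(adj, budget, probe_size, core_share=0.5, seed=0):
    """Split nodes into core and periphery by degree, then sample each layer."""
    rng = random.Random(seed)
    threshold, avg_degree = estimate_degree_stats(adj, probe_size, rng)
    core = sorted(v for v in adj if len(adj[v]) >= threshold)
    periphery = sorted(v for v in adj if len(adj[v]) < threshold)
    # Assumption: a fixed share of the budget goes to the core layer.
    n_core = min(len(core), int(budget * core_share))
    n_periphery = min(len(periphery), budget - n_core)
    sampled = set(rng.sample(core, n_core))
    sampled |= set(rng.sample(periphery, n_periphery))
    return sampled, threshold, avg_degree


def toy_graph(n=200, m=2, seed=1):
    """Small preferential-attachment graph as an adjacency dict (toy data)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets = []  # each node appears once per incident edge
    for i in range(m + 1):  # seed clique on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            targets += [i, j]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for u in chosen:
            adj[v].add(u)
            adj[u].add(v)
            targets += [u, v]
    return adj


if __name__ == "__main__":
    graph = toy_graph()
    nodes, threshold, avg_degree = stratified_sample(graph, budget=20, probe_size=40)
    print(len(nodes), threshold, avg_degree)
```

Sampling the core and periphery separately is one simple way to keep a sample from being dominated by high-degree nodes, which is the bias the abstract attributes to traversal-based methods at low sampling rates.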