Sampling of temporal networks: Methods and biases.

Affiliations

Department of Public Health Sciences, Karolinska Institutet, 17177 Stockholm, Sweden and Department of Mathematics, Université de Namur, 5000 Namur, Belgium.

Department of Engineering Mathematics, University of Bristol, Bristol BS8 1UB, United Kingdom.

Publication information

Phys Rev E. 2017 Nov;96(5-1):052302. doi: 10.1103/PhysRevE.96.052302. Epub 2017 Nov 1.

DOI:10.1103/PhysRevE.96.052302

PMID:29347767

Abstract

Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure, and thus caution is needed when generalizing results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths, and epidemic spread. We find that some biases are common across a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling method be problem oriented, to minimize the potential biases for the specific research questions at hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.
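
As an illustration of the strategy the abstract singles out, uniform sampling of nodes, here is a minimal Python sketch. It is not the authors' implementation. It assumes a temporal network stored as a list of (i, j, t) contact triples, and it assumes that a contact is kept only when both endpoints are in the sample; the abstract specifies neither. The function name sample_nodes_uniform and its parameters are hypothetical.

```python
import random


def sample_nodes_uniform(contacts, fraction, seed=None):
    """Uniformly sample a fraction of nodes and keep the contacts among them.

    contacts: list of (i, j, t) triples, meaning nodes i and j are in
              contact at time t (an assumed representation).
    fraction: share of nodes to keep, in (0, 1].
    seed:     optional seed so the sample can be reproduced.
    Returns (kept_nodes, sampled_contacts).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Every node that appears in at least one contact, in a fixed order
    # so that a given seed always yields the same sample.
    nodes = sorted({n for i, j, _ in contacts for n in (i, j)})
    # Keep at least one node, but never more than exist.
    k = min(len(nodes), max(1, round(fraction * len(nodes))))
    kept = set(rng.sample(nodes, k))
    # Assumption: a contact survives only if both endpoints were sampled.
    sampled = [(i, j, t) for i, j, t in contacts if i in kept and j in kept]
    return kept, sampled


if __name__ == "__main__":
    toy = [(1, 2, 0), (2, 3, 1), (1, 2, 2), (3, 4, 3), (4, 5, 4)]
    kept, sampled = sample_nodes_uniform(toy, fraction=0.6, seed=1)
    print("kept nodes:", sorted(kept))
    print("contacts retained:", len(sampled), "of", len(toy))
```

Under this assumption a contact survives only if both of its nodes are drawn, so the share of contacts retained is typically smaller than the share of nodes retained. Comparing a statistic on the full and sampled contact lists is one simple way to see the kind of bias the abstract refers to.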
