Integrated Science Lab, Department of Physics, Umeå University, Umeå, Sweden.
PLoS One. 2013;8(1):e53943. doi: 10.1371/journal.pone.0053943. Epub 2013 Jan 23.
Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984-2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
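The contrast between the two resampling schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the journal names "A", "B", and "C", and all function names are hypothetical, and citation resampling is assumed here to mean drawing individual citations independently with replacement. Each toy article cites a single journal pair five times, so citations within an article are strongly dependent.

```python
import random
from collections import Counter
from statistics import pvariance

# Hypothetical toy data: each article is a list of (citing journal, cited journal) links.
# Every article repeats one link five times, so its citations are fully dependent.
ARTICLES = [[("A", "B")] * 5 for _ in range(20)] + [[("A", "C")] * 5 for _ in range(20)]


def link_weights(articles):
    """Aggregate article-level citations into journal-to-journal link weights."""
    weights = Counter()
    for citations in articles:
        weights.update(citations)
    return weights


def resample_articles(articles, rng):
    """Draw whole articles with replacement; each article's citations stay together."""
    sample = rng.choices(articles, k=len(articles))
    return link_weights(sample)


def resample_citations(articles, rng):
    """Draw single citations with replacement, ignoring which article they came from."""
    pool = [link for citations in articles for link in citations]
    return Counter(rng.choices(pool, k=len(pool)))


def weight_variance(resampler, articles, link, n_resamples=500, seed=0):
    """Variance of one link's weight across resampled networks."""
    rng = random.Random(seed)
    values = [resampler(articles, rng)[link] for _ in range(n_resamples)]
    return pvariance(values)


var_article = weight_variance(resample_articles, ARTICLES, ("A", "B"))
var_citation = weight_variance(resample_citations, ARTICLES, ("A", "B"))
print("article resampling variance: ", var_article)
print("citation resampling variance:", var_citation)
```

In this toy setting the article-level scheme yields the larger link-weight variance, because dropping or duplicating one article moves five citations at once, whereas the citation-level scheme treats each citation as an independent draw. This mirrors the direction of the effect described above, not its magnitude in the real dataset.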