ETH Zürich, Zürich, 8092, Switzerland.
Sci Rep. 2021 Jun 28;11(1):13416. doi: 10.1038/s41598-021-92519-y.
A fundamental issue in network data science is the ability to distinguish observed features that can be expected at random from those that go beyond such expectations. Configuration models play a crucial role here, allowing us to compare observations against degree-corrected null models. Nonetheless, existing formulations have seen limited use in large-scale data analysis, either because they require expensive Monte Carlo simulations or because they lack the flexibility needed to model real-world systems. With the generalized hypergeometric ensemble, we address both problems. To achieve this, we map the configuration model to an urn problem, in which edges are represented as balls in an appropriately constructed urn. In doing so, we obtain the generalized hypergeometric ensemble of random graphs: a random graph model that reproduces and extends the properties of standard configuration models, with the critical advantage of a closed-form probability distribution.
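The abstract does not spell out how the urn is filled. The sketch below illustrates the idea under a stated assumption: for a directed multigraph with out-degrees k_out and in-degrees k_in, both summing to m, the urn holds Xi[i][j] = k_out[i] * k_in[j] balls of colour (i, j), one colour per possible edge i -> j. Drawing m balls without replacement then gives edge counts with a closed-form multivariate hypergeometric probability, prod_ij C(Xi_ij, A_ij) / C(M, m) with M = m^2, and expected counts k_out[i] * k_in[j] / m, as in a standard configuration model. The function names are hypothetical, and only the non-generalized (unbiased) urn is shown; the generalized ensemble additionally biases the draws, which this sketch does not implement.

```python
from fractions import Fraction
from math import comb, prod
import random


def build_urn(k_out, k_in):
    """Return Xi, where Xi[i][j] counts the balls of colour (i, j), i.e. edge i -> j.

    Assumption: Xi[i][j] = k_out[i] * k_in[j] (degree-based urn of the
    non-generalized hypergeometric configuration model).
    """
    if sum(k_out) != sum(k_in):
        raise ValueError("out- and in-degree sequences must both sum to m")
    return [[ko * ki for ki in k_in] for ko in k_out]


def graph_probability(adj, k_out, k_in):
    """Closed-form probability of the edge-count matrix `adj`.

    Drawing m balls without replacement gives a multivariate hypergeometric
    law: prod_ij C(Xi_ij, A_ij) / C(M, m), with M = sum_ij Xi_ij = m**2.
    """
    xi = build_urn(k_out, k_in)
    m = sum(k_out)
    if sum(map(sum, adj)) != m:
        return Fraction(0)  # every realisation contains exactly m edges
    total_balls = sum(map(sum, xi))
    ways = prod(
        comb(x, a)  # math.comb returns 0 when a > x (more edges than balls)
        for xi_row, adj_row in zip(xi, adj)
        for x, a in zip(xi_row, adj_row)
    )
    return Fraction(ways, comb(total_balls, m))


def expected_edges(k_out, k_in):
    """E[A_ij] = m * Xi_ij / M, which simplifies to k_out[i] * k_in[j] / m."""
    xi = build_urn(k_out, k_in)
    m = sum(k_out)
    total_balls = sum(map(sum, xi))
    return [[Fraction(m * x, total_balls) for x in row] for row in xi]


def sample_graph(k_out, k_in, rng=None):
    """Draw m balls without replacement and tally them into edge counts."""
    if rng is None:
        rng = random.Random()
    xi = build_urn(k_out, k_in)
    balls = [
        (i, j)
        for i, row in enumerate(xi)
        for j, count in enumerate(row)
        for _ in range(count)
    ]
    adj = [[0] * len(k_in) for _ in k_out]
    for i, j in rng.sample(balls, sum(k_out)):
        adj[i][j] += 1
    return adj


if __name__ == "__main__":
    k_out, k_in = [2, 1], [1, 2]
    print(graph_probability([[1, 1], [0, 1]], k_out, k_in))  # → 4/21
    print(expected_edges(k_out, k_in)[0][0])  # → 2/3
    print(sample_graph(k_out, k_in, random.Random(0)))
```

Because the probability is a ratio of binomial coefficients, an observed graph can be scored exactly without Monte Carlo sampling; the sampler is included only to show that the same urn also generates realisations, whose degrees match the observed ones in expectation rather than exactly.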