Psychology Department, Michigan State University, East Lansing, MI, USA.
Mathematics Department, Michigan State University, East Lansing, MI, USA.
Sci Rep. 2021 Dec 14;11(1):23929. doi: 10.1038/s41598-021-03238-3.
Projections of bipartite or two-mode networks capture co-occurrences, and are used in diverse fields (e.g., ecology, economics, bibliometrics, politics) to represent unipartite networks. A key challenge in analyzing such networks is determining whether an observed number of co-occurrences between two nodes is significant, and therefore whether an edge exists between them. One approach, the fixed degree sequence model (FDSM), evaluates the significance of an edge's weight by comparison to a null model in which the degree sequences of the original bipartite network are fixed. Although the FDSM is an intuitive null model, it is computationally expensive because it requires Monte Carlo simulation to estimate each edge's p value, and is therefore impractical for large projections. In this paper, we explore four potential alternatives to FDSM: fixed fill model, fixed row model, fixed column model, and stochastic degree sequence model (SDSM). We compare these models to FDSM in terms of accuracy, speed, statistical power, similarity, and ability to recover known communities. We find that the computationally fast SDSM offers a statistically conservative but close approximation of the computationally impractical FDSM under a wide range of conditions, and that it correctly recovers a known community structure even when the signal is weak. Therefore, although each backbone model may have particular applications, we recommend SDSM for extracting the backbone of bipartite projections when FDSM is impractical.
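To make the cost of the FDSM concrete, the following is a minimal sketch of how a Monte Carlo edge p value could be estimated under a null model that fixes both degree sequences of the bipartite network. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`project`, `checkerboard_swaps`, `fdsm_upper_pvalues`) are hypothetical, the randomization uses 2x2 checkerboard swaps as one common way to preserve row and column sums, the default number of swaps per sample is an arbitrary choice, and only the upper-tail p value is computed.

```python
import numpy as np

def project(B):
    """Row-node projection: P[i, j] counts the columns shared by rows i and j."""
    P = B @ B.T
    np.fill_diagonal(P, 0)  # self co-occurrences are not edges
    return P

def checkerboard_swaps(B, n_swaps, rng):
    """Attempt n_swaps checkerboard flips; row and column sums never change."""
    B = B.copy()
    n_rows, n_cols = B.shape
    for _ in range(n_swaps):
        r = rng.choice(n_rows, size=2, replace=False)
        c = rng.choice(n_cols, size=2, replace=False)
        sub = B[np.ix_(r, c)]
        # Flip only if the 2x2 submatrix is [[1,0],[0,1]] or [[0,1],[1,0]].
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            B[np.ix_(r, c)] = 1 - sub
    return B

def fdsm_upper_pvalues(B, n_samples=200, swaps_per_sample=None, seed=0):
    """Estimate, for each pair of rows, the fraction of randomized networks
    whose co-occurrence count is at least the observed count."""
    rng = np.random.default_rng(seed)
    if swaps_per_sample is None:
        swaps_per_sample = 5 * int(B.sum())  # assumption: heuristic mixing length
    observed = project(B)
    exceed = np.zeros(observed.shape, dtype=float)
    current = B
    for _ in range(n_samples):
        # Successive samples come from one chain, so they are correlated.
        current = checkerboard_swaps(current, swaps_per_sample, rng)
        exceed += project(current) >= observed
    return exceed / n_samples

if __name__ == "__main__":
    demo = (np.random.default_rng(1).random((6, 10)) < 0.4).astype(int)
    print(fdsm_upper_pvalues(demo, n_samples=50))
```

Every sample requires many swap attempts plus a full projection, and a stable p value for each edge requires many samples, which is why this kind of simulation scales poorly to large projections and why a closed-form approximation such as the SDSM is attractive.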