Dempsey Walter, Oselio Brandon, Hero Alfred
University of Michigan, Biostatistics, Ann Arbor, United States.
University of Michigan, Ann Arbor, United States.
J Am Stat Assoc. 2022;117(540):2056-2073. doi: 10.1080/01621459.2021.1896526. Epub 2021 May 10.
Network data often arises via a series of structured interactions among a population of constituent elements. E-mail exchanges, for example, have a single sender followed by potentially multiple receivers. Scientific articles, on the other hand, may have multiple subject areas and multiple authors. We introduce a statistical model, termed the Pitman-Yor hierarchical vertex components model (PY-HVCM), that is well suited for structured interaction data. The proposed PY-HVCM models complex relational data by partial pooling of local information via a latent, shared population-level distribution. The PY-HVCM is a canonical example of hierarchical vertex components models, a subfamily of models for exchangeable structured interaction networks, i.e., networks invariant to interaction relabeling. Theoretical analysis and supporting simulations provide clear model interpretation and establish global sparsity and a power law degree distribution. A computationally tractable Gibbs sampling algorithm is derived for inferring sparsity and power law properties of complex networks. We demonstrate the model on both the Enron e-mail dataset and an ArXiv dataset, showing goodness of fit of the model via posterior predictive validation.
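As a rough illustration of why a Pitman-Yor construction can produce heavy-tailed degree counts, the sketch below simulates the standard two-parameter Chinese restaurant process. It is not the PY-HVCM itself, nor the Gibbs sampler from the paper: the function name `pitman_yor_crp`, its parameters, and the reading of "tables" as vertices and "customers" as vertex appearances in interactions are all assumptions made for illustration.

```python
import random


def pitman_yor_crp(n, discount, strength, seed=0):
    """Simulate a two-parameter (Pitman-Yor) Chinese restaurant process.

    Hypothetical helper, for illustration only. Returns a list of counts,
    one per table; here a table is read as a vertex and its count as the
    number of times that vertex has appeared (its degree).
    """
    if not (0 <= discount < 1) or strength <= -discount:
        raise ValueError("need 0 <= discount < 1 and strength > -discount")
    rng = random.Random(seed)
    counts = []
    for i in range(n):
        # After i arrivals the total unnormalised mass is i + strength:
        # (strength + discount * K) for a new table plus the sum of
        # (count - discount) over the K existing tables.
        new_mass = strength + discount * len(counts)
        u = rng.random() * (i + strength)
        if not counts or u < new_mass:
            counts.append(1)  # open a new table (a previously unseen vertex)
            continue
        u -= new_mass
        for k, c in enumerate(counts):
            u -= c - discount
            if u < 0:
                counts[k] += 1  # join an existing table (reuse a vertex)
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts


if __name__ == "__main__":
    degrees = pitman_yor_crp(n=5000, discount=0.5, strength=1.0)
    print("distinct vertices:", len(degrees))
    print("largest degrees:", sorted(degrees, reverse=True)[:5])
```

With a positive discount, the number of distinct tables keeps growing as more customers arrive and the table sizes are heavy-tailed, which is the generic mechanism behind the sparsity and power law behaviour mentioned in the abstract. The exact rates established for the PY-HVCM are given in the paper, not by this sketch.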