Crane Harry, Dempsey Walter
Department of Statistics & Biostatistics, Rutgers University, 110 Frelinghuysen Avenue, Piscataway, NJ 08854, USA.
Department of Statistics, University of Michigan, 1085 S. University Ave, Ann Arbor, MI 48109, USA.
J Am Stat Assoc. 2018;113(523):1311-1326. doi: 10.1080/01621459.2017.1341413. Epub 2018 Jun 12.
Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges comprise the fundamental statistical units, making a framework for edge-labeled networks more appropriate for statistical analysis. In this context we initiate the study of and explore its basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power law degree distributions, both of which are widely observed empirical properties that cannot be handled naturally by more conventional approaches. Our discussion culminates in the , which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction as we demonstrate on real network datasets. As a generalization of the Hollywood model, we further identify the as a nonparametric subclass of models with a convenient stick breaking construction.
许多现代网络数据集源于群体中的交互过程,例如电话通话、电子邮件交流、共同作者关系以及专业合作。在这样的交互网络中,边构成了基本的统计单元,这使得用于边标记网络的框架更适合进行统计分析。在此背景下,我们启动了对[具体内容缺失]的研究并探索其基本统计特性。与更常见的以顶点为中心的方法相比,边可交换模型的一些理论和实际特性使其更适合网络分析中的许多应用。特别是,边可交换模型允许稀疏结构和幂律度分布,这两个都是广泛观察到的经验特性,而更传统的方法无法自然地处理这些特性。我们的讨论最终聚焦于[具体内容缺失],我们在此将其确定为边可交换分布的规范族。好莱坞模型在计算上易于处理,有清晰的解释,具有良好的理论特性,并且正如我们在真实网络数据集上所展示的那样,在估计和预测方面表现相当不错。作为好莱坞模型的推广,我们进一步将[具体内容缺失]确定为具有便利的折断棍子构造的非参数模型子类。