Cosma Rohilla Shalizi, Alessandro Rinaldo
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213 USA.
Ann Stat. 2013 Apr;41(2):508-535. doi: 10.1214/12-AOS1044.
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consist only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
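The sampling-consistency condition can be made concrete with a brute-force sketch; it is an illustration, not the authors' method. For tiny graphs we enumerate every labelled graph, define an ERGM whose sufficient statistics are the edge count and the triangle count, marginalize the 4-node model onto the subgraph induced by the first 3 nodes, and compare that with the same model (same parameters) applied directly to 3 nodes. The function names and the parameter values (edge parameter -0.5, triangle parameter 1.0) are hypothetical choices made for this example. With the triangle parameter set to zero the model reduces to independent Bernoulli edges and the two distributions agree; with a nonzero triangle parameter they do not.

```python
import itertools
import math


def edges_of(n):
    """All possible undirected edges (i, j), i < j, on nodes 0..n-1, in lexicographic order."""
    return list(itertools.combinations(range(n), 2))


def triangles(graph_edges, n):
    """Count triangles in a graph given as a collection of (i, j) edges with i < j."""
    s = set(graph_edges)
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if (a, b) in s and (a, c) in s and (b, c) in s)


def ergm_distribution(n, theta_edge, theta_tri):
    """Exact probabilities of an (illustrative) edge + triangle ERGM over all graphs on n nodes."""
    all_edges = edges_of(n)
    weights = {}
    for mask in itertools.product([0, 1], repeat=len(all_edges)):
        g = tuple(e for e, on in zip(all_edges, mask) if on)
        weights[g] = math.exp(theta_edge * len(g) + theta_tri * triangles(g, n))
    z = sum(weights.values())  # normalizing constant, feasible only for tiny n
    return {g: w / z for g, w in weights.items()}


def marginal_on_first(dist, m):
    """Marginal distribution of the subgraph induced by nodes 0..m-1."""
    out = {}
    for g, p in dist.items():
        sub = tuple(e for e in g if e[1] < m)  # keep edges with both endpoints < m
        out[sub] = out.get(sub, 0.0) + p
    return out


def max_abs_diff(d1, d2):
    """Largest pointwise difference between two distributions over graphs."""
    keys = set(d1) | set(d2)
    return max(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    for theta_tri in (0.0, 1.0):
        sampled = marginal_on_first(ergm_distribution(4, -0.5, theta_tri), 3)
        direct = ergm_distribution(3, -0.5, theta_tri)
        print(f"triangle parameter {theta_tri}: max difference "
              f"{max_abs_diff(sampled, direct):.3g}")
```

Running it, the edge-only model shows a difference at floating-point rounding level, while the edge + triangle model shows a clearly nonzero difference: the triangles a dropped node can close depend on the edges of the retained subgraph, so marginalizing changes the distribution of that subgraph, and applying the same parameters to the sample no longer describes it.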