Kolaczyk Eric D, Krivitsky Pavel N
Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA.
School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW 2500, Australia.
Stat Sci. 2015 May 1;30(2):184-198. doi: 10.1214/14-STS502.
The modeling and analysis of networks and network data has seen an explosion of interest in recent years and represents an exciting direction for potential growth in statistics. Despite the already substantial amount of work done in this area to date by researchers from various disciplines, however, there remain many questions of a decidedly foundational nature - natural analogues of standard questions already posed and addressed in more classical areas of statistics - that have yet to even be posed, much less addressed. Here we raise and consider one such question in connection with network modeling. Specifically, we ask, "Given an observed network, what is the sample size?" Using simple, illustrative examples from the class of exponential random graph models, we show that the answer to this question can very much depend on basic properties of the networks expected under the model, as the number of vertices $N_v$ in the network grows. In particular, adopting the (asymptotic) scaling of the variance of the maximum likelihood parameter estimates as a notion of effective sample size, say $N_{\mathrm{eff}}$, we show that whether the networks are sparse or not under our model (i.e., having relatively few or many edges between vertices, respectively) is sufficient to yield an order of magnitude difference in $N_{\mathrm{eff}}$, from $O(N_v)$ to $O(N_v^2)$. We then explore some practical implications of this result, using both simulation and data on food-sharing from Lamalera, Indonesia.
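The sparse-versus-dense contrast in effective sample size can be illustrated with a minimal sketch. It is not the authors' code and rests on assumptions not stated in the abstract: the model is the simplest exponential random graph model (independent edges with a single edge parameter `theta`), and sparsity is induced by a hypothetical offset of `-log(n_vertices)` on the log-odds of an edge. The Fisher information for `theta`, whose inverse is the asymptotic variance of the maximum likelihood estimate, then serves as a stand-in for effective sample size. The function names are illustrative only.

```python
import math


def expit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fisher_information(n_vertices, theta, sparse):
    """Fisher information for theta in an independent-edge (Bernoulli) graph.

    Assumption: in the sparse case the edge log-odds are theta - log(n_vertices),
    so the expected degree stays bounded as the graph grows.
    """
    n_dyads = n_vertices * (n_vertices - 1) // 2  # undirected vertex pairs
    offset = -math.log(n_vertices) if sparse else 0.0
    p = expit(theta + offset)
    # Each dyad is an independent Bernoulli(p) trial with logit link.
    return n_dyads * p * (1.0 - p)


def growth_exponent(theta, sparse, n_small=1000, n_large=100000):
    """Empirical exponent a such that the information grows like n_vertices ** a."""
    i_small = fisher_information(n_small, theta, sparse)
    i_large = fisher_information(n_large, theta, sparse)
    return math.log(i_large / i_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    for sparse in (False, True):
        label = "sparse" if sparse else "dense"
        print(label, round(growth_exponent(-1.0, sparse), 3))
```

Under these assumptions the exponent comes out close to 2 in the dense case and close to 1 in the sparse case, so the information grows roughly quadratically or roughly linearly in the number of vertices, respectively.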