Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Ave, SPH2, 4th Floor, Boston, MA, 02115, USA.
Sci Rep. 2019 Nov 13;9(1):16674. doi: 10.1038/s41598-019-53166-6.
Network models are applied in numerous domains where data arise from systems of interactions among pairs of actors. Both statistical and mechanistic network models are increasingly capable of capturing various dependencies among these actors. Yet these dependencies pose statistical challenges for analyzing such data, especially when the data set comprises only a single observation of one network. They often lead to intractable likelihoods regardless of the modeling paradigm, which limits the application of existing statistical methods for networks. We explore a subsampling bootstrap procedure that serves as the basis for goodness of fit and model selection with a single observed network and that circumvents the intractability of such likelihoods. Our approach is based on flexible resampling distributions formed from the single observed network, allowing for more nuanced and higher-dimensional comparisons than point estimates of quantities of interest. We include worked examples of model selection, using simulation, and of goodness-of-fit assessment, using duplication-divergence model fits to yeast (S. cerevisiae) protein-protein interaction data from the literature. The proposed approach produces a flexible resampling distribution that can be based on any network statistics of one's choosing and can be employed for both statistical and mechanistic network models.
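The general idea of forming a resampling distribution from a single observed network can be illustrated with a minimal sketch. The abstract does not specify the subsampling scheme, so this sketch assumes uniform node-induced subsampling without replacement and uses edge density as the network statistic. All function names, the simulated Erdős-Rényi stand-in for the observed network, and the parameter values are hypothetical choices made for illustration, not the authors' implementation.

```python
import random
from itertools import combinations


def induced_subgraph(adj, nodes):
    """Return the adjacency dict of the subgraph induced by `nodes`."""
    keep = set(nodes)
    return {u: adj[u] & keep for u in keep}


def edge_density(adj):
    """Fraction of possible node pairs that are connected (undirected graph)."""
    n = len(adj)
    if n < 2:
        return 0.0
    n_edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return n_edges / (n * (n - 1) / 2)


def subsample_distribution(adj, statistic, sample_size, n_resamples, seed=0):
    """Evaluate `statistic` on repeated node-induced subsamples of one network.

    Assumption: nodes are drawn uniformly without replacement; the source
    abstract does not state which subsampling scheme is used.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    return [
        statistic(induced_subgraph(adj, rng.sample(nodes, sample_size)))
        for _ in range(n_resamples)
    ]


def erdos_renyi(n, p, seed=0):
    """Simulate a simple undirected random graph as a stand-in for observed data."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical "observed" network and its subsampling distribution.
observed = erdos_renyi(200, 0.05, seed=1)
dist = subsample_distribution(
    observed, edge_density, sample_size=50, n_resamples=500, seed=2
)
```

Because `statistic` is passed in as a function, any other network summary (or a vector of summaries) could be substituted. The same routine could also be applied to networks simulated from candidate models, so that their resampling distributions can be compared with the one obtained from the observed network.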