Chen Sixing, Mira Antonietta, Onnela Jukka-Pekka
Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA 02115, USA.
Data Science Lab, Institute of Computational Science, Università della Svizzera italiana, Via Buffi 6, 6900 Lugano, Switzerland, and Dipartimento di Scienza e Alta Tecnologia, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy.
J Complex Netw. 2020 Apr;8(2):cnz024. doi: 10.1093/comnet/cnz024. Epub 2019 Aug 2.
Network models are applied across many domains where data can be represented as a network. Two prominent paradigms for modelling networks are statistical models (probabilistic models for the observed network) and mechanistic models (models for network growth and/or evolution). Mechanistic models are better suited for incorporating domain knowledge, studying the effects of interventions (such as changes to specific mechanisms) and forward simulation, but they typically have intractable likelihoods. As such, and in stark contrast to statistical models, there is a relative dearth of research on model selection for such models despite the otherwise large body of extant work. In this article, we propose a simulator-based procedure for mechanistic network model selection that borrows aspects from Approximate Bayesian Computation, along with a means to quantify the uncertainty in the selected model. To select the most suitable network model, we consider and assess the performance of several learning algorithms, most notably the so-called Super Learner, which makes our framework less sensitive to the choice of a particular learning algorithm. Our approach takes advantage of the ease of forward simulation from mechanistic network models to circumvent their intractable likelihoods. The overall process is flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models. Finally, we showcase our approach with a protein-protein interaction network model from the literature for yeast.
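The general idea of simulator-based model selection can be illustrated with a minimal sketch: simulate many networks from each candidate mechanism, reduce each to summary statistics, train a classifier on the labelled simulations, and apply it to the observed network. Everything below is an assumption made for illustration and is not taken from the article: the two toy growth models (labelled "PA" and "UA"), the choice of summary statistics, and the k-nearest-neighbour classifier, which stands in for the learning algorithms the article evaluates (it is not the Super Learner). The vote share among neighbours is only a crude proxy for uncertainty in the selected model.

```python
import random
import statistics


def grow_network(n_nodes, preferential, rng):
    """Grow a tree one node at a time and return its degree list.

    preferential=True: the new node attaches to an existing node with
    probability proportional to that node's degree.
    preferential=False: the new node attaches to a uniformly chosen node.
    Both mechanisms are hypothetical stand-ins for competing models.
    """
    degrees = [1, 1]      # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]    # each node appears once per incident edge
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)   # degree-proportional choice
        else:
            target = rng.randrange(new)      # uniform choice
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new])
    return degrees


def summarize(degrees):
    """Summary statistics of a network (assumed: max degree, degree variance)."""
    return (max(degrees), statistics.pvariance(degrees))


def simulate_reference(n_per_model, n_nodes, rng):
    """Build a labelled reference table by forward simulating each model."""
    table = []
    for label, preferential in (("PA", True), ("UA", False)):
        for _ in range(n_per_model):
            degrees = grow_network(n_nodes, preferential, rng)
            table.append((summarize(degrees), label))
    return table


def knn_select(reference, observed, k=15):
    """Pick the model by majority vote among the k nearest simulations.

    Returns (selected label, share of the k votes it received).
    """
    dims = range(len(observed))
    means = [statistics.mean(s[d] for s, _ in reference) for d in dims]
    sds = [statistics.pstdev(s[d] for s, _ in reference) or 1.0 for d in dims]

    def standardize(s):
        return [(s[d] - means[d]) / sds[d] for d in dims]

    z_obs = standardize(observed)
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(standardize(s), z_obs)), label)
        for s, label in reference
    )
    votes = [label for _, label in distances[:k]]
    best = max(set(votes), key=votes.count)
    return best, votes.count(best) / k


rng = random.Random(0)
reference = simulate_reference(n_per_model=200, n_nodes=200, rng=rng)
# A simulated network stands in for real observed data here.
observed = summarize(grow_network(200, True, rng))
model, vote_share = knn_select(reference, observed)
print(model, vote_share)
```

No likelihood is evaluated anywhere in this sketch; only the ability to forward simulate from each candidate mechanism is needed, which is the property the abstract relies on.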