Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, SOKENDAI, Tokyo, Japan.
Department of Global Management, Chuo University, Tokyo, Japan.
PLoS One. 2022 Jun 29;17(6):e0269845. doi: 10.1371/journal.pone.0269845. eCollection 2022.
We propose a stochastic generative model to represent a directed graph constructed from citations among academic papers, where nodes represent papers with discrete publication times and directed edges represent citations. As in existing models, the proposed model assumes that a citation between two papers occurs with a probability based on the type of the citing paper, the importance of the cited paper, and the difference between their publication times. We take the out-degree of the citing paper as its type because, for example, a survey paper cites many papers. We approximate the importance of a cited paper by its in-degree. Our model adopts three functions: a logistic function for the number of papers published at each discrete time, an inverse Gaussian probability distribution function to express the aging effect based on the difference between publication times, and an exponential distribution (or a generalized Pareto distribution) for the out-degree distribution. We consider our model to be a more reasonable and appropriate stochastic model than existing models, and it can perform complete simulations without using the original data. In this paper, we first use the Web of Science database to examine the features used in our model. Using the proposed model, we then generate simulated graphs and demonstrate that they are similar to the original data with respect to the in- and out-degree distributions and node triangle participation. In addition, we analyze two other citation networks derived from physics papers in the arXiv database and verify the effectiveness of the model.
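The generative process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact form of the citation probability, so the sketch assumes a weight of (in-degree + 1) multiplied by the inverse Gaussian density of the publication-time difference, and it assumes that papers cite only papers from strictly earlier time steps. All function names and parameter values (`K`, `r`, `t0`, `mu`, `lam`, `mean_out_degree`) are hypothetical and chosen only for illustration.

```python
import math
import random


def logistic_count(t, K=60.0, r=0.5, t0=10.0):
    """Number of papers published at discrete time t (logistic curve).
    K, r, t0 are illustrative values, not fitted parameters."""
    return max(1, round(K / (1.0 + math.exp(-r * (t - t0)))))


def inverse_gaussian_pdf(x, mu=5.0, lam=10.0):
    """Inverse Gaussian density, used here as the aging weight for age x."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    )


def simulate_citation_graph(n_steps=20, mean_out_degree=5.0, seed=0):
    """Return (pub_time, edges); edges are (citing, cited) index pairs."""
    rng = random.Random(seed)
    pub_time = []   # pub_time[i] = publication time of paper i
    in_degree = []  # in_degree[i] = citations received so far
    edges = []
    for t in range(n_steps):
        n_existing = len(pub_time)  # papers from strictly earlier times
        new_ids = []
        for _ in range(logistic_count(t)):
            new_ids.append(len(pub_time))
            pub_time.append(t)
            in_degree.append(0)
        if n_existing == 0:
            continue  # the first papers have nothing to cite
        for citing in new_ids:
            # Out-degree (the "type" of the citing paper): exponential draw,
            # floored to an integer and capped by the number of candidates.
            k = min(int(rng.expovariate(1.0 / mean_out_degree)), n_existing)
            candidates = list(range(n_existing))
            # Assumed weight: (in-degree + 1) * aging density. The +1 lets
            # papers that have not been cited yet still receive citations.
            weights = [
                (in_degree[j] + 1) * inverse_gaussian_pdf(t - pub_time[j])
                for j in candidates
            ]
            # Weighted sampling without replacement, one cited paper at a time.
            for _ in range(k):
                pos = rng.choices(range(len(candidates)), weights=weights)[0]
                cited = candidates.pop(pos)
                weights.pop(pos)
                in_degree[cited] += 1
                edges.append((citing, cited))
    return pub_time, edges


if __name__ == "__main__":
    pub_time, edges = simulate_citation_graph()
    print(len(pub_time), "papers,", len(edges), "citations")
```

Because the sketch draws everything from the three assumed distributions, it runs without any original citation data. The in- and out-degree distributions of the resulting edge list could then be compared with those of a real network.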