Börner Katy, Maru Jeegar T, Goldstone Robert L
School of Library and Information Science, Indiana University, Bloomington, IN 47405, USA.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5266-73. doi: 10.1073/pnas.0307625100. Epub 2004 Feb 19.
Research into the structure and evolution of mankind's scientific endeavor has a long history. However, recent progress in applying the tools of science to understand science itself has been unprecedented, because only recently has there been access to high-volume, high-quality data sets of scientific output (e.g., publications, patents, grants) and to computers and algorithms capable of handling this enormous stream of data. This article reviews major work on models that aim to capture and recreate the structure and dynamics of scientific evolution. We then introduce a general process model that simultaneously grows coauthor and paper citation networks. The statistical and dynamic properties of the networks generated by this model are validated against a 20-year data set of articles published in PNAS. Systematic deviations from a power-law distribution of citations to papers are well fit by a model that incorporates a partitioning of authors and papers into topics, a bias for authors to cite recent papers, and a tendency for authors to cite papers cited by papers that they have read. In this TARL model (for topics, aging, and recursive linking), the number of topics is linearly related to the clustering coefficient of the simulated paper citation network.
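The three ingredients named in the abstract (topics, aging, and recursive linking) can be illustrated with a toy simulation of the paper citation network alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_tarl_sketch`, all parameter names and default values, the uniform assignment of papers to topics, and the exponential form of the recency bias are hypothetical choices made for illustration, and the coauthor network is omitted entirely.

```python
import math
import random
from collections import defaultdict


def simulate_tarl_sketch(n_years=20, papers_per_year=50, n_topics=5,
                         n_read=3, n_levels=1, aging_scale=3.0, seed=0):
    """Grow a toy paper citation network with topics, aging, and recursive linking.

    Returns (topic, year, cites): the topic and publication year of each paper,
    and for each paper the set of earlier papers it cites.
    Every parameter here is an illustrative assumption.
    """
    rng = random.Random(seed)
    topic, year, cites = [], [], []
    by_topic = defaultdict(list)  # topic -> ids of papers from earlier years

    for y in range(n_years):
        new_ids = []
        for _ in range(papers_per_year):
            # Topics: each paper belongs to exactly one topic (uniform, assumed).
            t = rng.randrange(n_topics)
            pid = len(topic)
            topic.append(t)
            year.append(y)
            cites.append(set())
            new_ids.append(pid)

            pool = by_topic[t]
            if not pool:
                continue  # nothing published earlier in this topic

            # Aging: papers are "read" with a weight that decays with age
            # (exponential decay is an assumption of this sketch).
            weights = [math.exp(-(y - year[p]) / aging_scale) for p in pool]
            read = set(rng.choices(pool, weights=weights, k=n_read))

            # Recursive linking: also cite what the read papers cite,
            # following reference lists n_levels deep.
            cited = set(read)
            frontier = read
            for _ in range(n_levels):
                frontier = {q for p in frontier for q in cites[p]}
                cited |= frontier
            cites[pid] = cited

        # Papers become citable only from the following year onward.
        for pid in new_ids:
            by_topic[topic[pid]].append(pid)

    return topic, year, cites


def citation_counts(cites):
    """Count how many times each paper is cited (its in-degree)."""
    counts = [0] * len(cites)
    for refs in cites:
        for q in refs:
            counts[q] += 1
    return counts
```

Calling `simulate_tarl_sketch()` and passing the returned `cites` to `citation_counts` gives an in-degree list whose distribution can be plotted on log-log axes. Varying `n_topics`, `aging_scale`, or `n_levels` then shows how each ingredient changes the shape of that distribution in this simplified setting.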