SciSciNet: A large-scale open data lake for the science of science research.

Affiliations

Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA.

Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA.

Publication information

Sci Data. 2023 Jun 1;10(1):315. doi: 10.1038/s41597-023-02198-9.

Abstract

The science of science has attracted growing research interest, partly due to the increasing availability of large-scale datasets capturing the inner workings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and linkages across datasets. Here we present SciSciNet, a large-scale open data lake for science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of the pre-processing steps and analytical choices made in constructing the data lake. We further supplement the data lake by computing frequently used measures in the literature, illustrating how researchers may contribute collectively to enriching it. Overall, this data lake serves as an initial but useful resource for the field by lowering the barrier to entry, reducing duplication of effort in data processing and measurement, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a7d/10235093/54b9672e71e2/41597_2023_2198_Fig1_HTML.jpg
