Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources.

Author information

Liang Zhentao, Mao Jin, Lu Kun, Li Gang

Affiliations

Center for Studies of Information Resources, Wuhan University, Bayi Road #299, Wuchang District, Wuhan, 430072 Hubei China.

School of Information Management, Wuhan University, Bayi Road #299, Wuchang District, Wuhan, 430072 Hubei China.

Publication information

Scientometrics. 2021;126(12):9519-9542. doi: 10.1007/s11192-021-04191-8. Epub 2021 Oct 24.

Abstract

As an important biomedical database, PubMed provides users with free access to abstracts of its documents. However, citations between these documents need to be collected from external data sources. Although previous studies have investigated the coverage of various data sources, the quality of citations is underexplored. In response, this study compares the coverage and citation quality of five freely available data sources on 30 million PubMed documents: the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI), Dimensions, Microsoft Academic Graph (MAG), the National Institutes of Health's Open Citation Collection (NIH-OCC), and the Semantic Scholar Open Research Corpus (S2ORC). Three gold standards and five metrics are introduced to evaluate the correctness and completeness of citations. Our results indicate that Dimensions is the most comprehensive data source, providing references for 62.4% of PubMed documents and outperforming the official NIH-OCC dataset (56.7%). Over 90% of the citation links in the other data sources can also be found in Dimensions. The coverage of MAG, COCI, and S2ORC is 59.6%, 34.7%, and 23.5%, respectively. Regarding citation quality, Dimensions and NIH-OCC achieve the best overall results. Almost all data sources have a precision higher than 90%, but their recall is much lower. All databases perform better on recent publications than on earlier ones. Meanwhile, the gaps between data sources have narrowed for documents published in recent years. This study provides evidence to help researchers choose suitable PubMed citation sources and is also helpful for evaluating the citation quality of free bibliographic databases.
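The abstract reports coverage, precision, and recall without defining them, so the following is only a minimal sketch of how such figures might be computed, assuming citation links are (citing, cited) identifier pairs and using the standard set-based definitions. The function names and the toy data are hypothetical and are not taken from the paper, whose five metrics may be defined differently.

```python
def precision_recall(source_links, gold_links):
    """Compare a data source's citation links against a gold standard.

    Both arguments are iterables of (citing_id, cited_id) pairs.
    Standard definitions are assumed here, not the paper's own:
    precision = correct links / links the source reports,
    recall    = correct links / links in the gold standard.
    """
    source = set(source_links)
    gold = set(gold_links)
    correct = len(source & gold)
    precision = correct / len(source) if source else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def reference_coverage(source_links, document_ids):
    """Share of documents for which the source reports at least one reference."""
    documents = set(document_ids)
    if not documents:
        return 0.0
    citing = {citing_id for citing_id, _ in source_links}
    return len(citing & documents) / len(documents)


# Toy data (hypothetical identifiers, not real PMIDs).
gold = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
source = [("A", "B"), ("A", "C"), ("A", "X")]

precision, recall = precision_recall(source, gold)
coverage = reference_coverage(source, ["A", "B", "C", "D"])
print(f"precision={precision:.3f} recall={recall:.3f} coverage={coverage:.3f}")
```

In this toy case the source reports three links, two of which appear in the gold standard of four, and only one of the four documents has any reference in the source. This mirrors the pattern described in the abstract, where precision is high but recall is much lower.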

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5223/8542188/a3d29f92cfd6/11192_2021_4191_Fig1_HTML.jpg
