uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering.

Author information

Fang Liri, Salami Malik Oyewale, Weber Griffin M, Torvik Vetle I

Affiliations

School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, United States.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Publication information

Data Brief. 2025 Apr 2;60:111535. doi: 10.1016/j.dib.2025.111535. eCollection 2025 Jun.

Abstract

There has been a recent push to make public, aggregate, and increase coverage of bibliographic citation data. Here we describe uCite, a citation dataset containing 564 million PubMed citation pairs aggregated from the following nine sources: PubMed Central, iCite, OpenCitations, Dimensions, Microsoft Academic Graph, Aminer, Semantic Scholar, Lens, and OpCitance. Of these, 51 million (9%) were labeled unreliable, as determined by patterns of source discrepancies explained by ambiguous metadata, crosswalk, and typographical errors, citing future publications, and multi-paper documents. Each source contributes to improved coverage and reliability, but varies dramatically in precision and recall, estimates of which are contrasted with the Web of Science and Scopus herein.
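As a rough illustration of the aggregation and evaluation ideas in the abstract, the sketch below unions citation pairs from several sources, records which sources report each pair, and estimates one source's precision and recall against a reference set. It is not the authors' pipeline: the function names, the source labels `source_a` and `source_b`, and the toy PMID pairs are all hypothetical, and the reference set merely stands in for a curated comparison such as the one the abstract draws against the Web of Science and Scopus.

```python
from collections import defaultdict


def union_citations(sources):
    """Map each (citing_pmid, cited_pmid) pair to the set of sources reporting it.

    `sources` is a dict of source name -> iterable of pairs (hypothetical layout).
    """
    pair_sources = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            pair_sources[pair].add(name)
    return dict(pair_sources)


def precision_recall(source_pairs, reference_pairs):
    """Estimate precision and recall of one source against a reference set."""
    source_pairs = set(source_pairs)
    reference_pairs = set(reference_pairs)
    true_positives = len(source_pairs & reference_pairs)
    precision = true_positives / len(source_pairs) if source_pairs else 0.0
    recall = true_positives / len(reference_pairs) if reference_pairs else 0.0
    return precision, recall


# Toy data: the PMIDs and source names are made up for illustration only.
sources = {
    "source_a": {(1, 2), (1, 3)},
    "source_b": {(1, 2), (2, 3)},
}
reference = {(1, 2), (2, 3)}  # stand-in for a trusted comparison set

union = union_citations(sources)
precision_a, recall_a = precision_recall(sources["source_a"], reference)

# Share of pairs labeled unreliable, using the counts given in the abstract.
unreliable_share = 51 / 564  # about 0.09, i.e. the reported 9%
```

Keeping the set of contributing sources per pair, rather than a plain set union, is what would let discrepancies between sources be inspected later; how uCite actually turns such discrepancies into reliability labels is described in the paper, not here.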

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa60/12049819/ed615a2555b5/gr1.jpg
