uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering.

Authors

Fang Liri, Salami Malik Oyewale, Weber Griffin M, Torvik Vetle I

Affiliations

School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, United States.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Publication information

Data Brief. 2025 Apr 2;60:111535. doi: 10.1016/j.dib.2025.111535. eCollection 2025 Jun.

Abstract

There has been a recent push to make public, aggregate, and increase coverage of bibliographic citation data. Here we describe uCite, a citation dataset containing 564 million PubMed citation pairs aggregated from the following nine sources: PubMed Central, iCite, OpenCitations, Dimensions, Microsoft Academic Graph, Aminer, Semantic Scholar, Lens, and OpCitance. Of these, 51 million (9%) were labeled unreliable, as determined by patterns of source discrepancies explained by ambiguous metadata, crosswalk, and typographical errors, citing future publications, and multi-paper documents. Each source contributes to improved coverage and reliability, but varies dramatically in precision and recall, estimates of which are contrasted with the Web of Science and Scopus herein.
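
As a rough illustration of what a "union of sources" and per-source precision and recall mean for citation pairs, the following Python sketch merges pairs from several sources and scores one source against a reference set. It is not the authors' pipeline: the function names, source names, PMIDs, and reference set are all hypothetical, and the reliability labeling described in the abstract is not reproduced.

```python
from collections import defaultdict


def union_citations(sources):
    """Map each (citing_pmid, cited_pmid) pair to the set of sources reporting it."""
    support = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            support[pair].add(name)
    return dict(support)


def precision_recall(pairs, reference):
    """Precision and recall of one source's pairs against a reference set of pairs."""
    pairs, reference = set(pairs), set(reference)
    true_positives = len(pairs & reference)
    precision = true_positives / len(pairs) if pairs else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy, made-up data: two hypothetical sources and a hypothetical reference set.
sources = {
    "source_a": [(101, 55), (101, 60), (102, 55)],
    "source_b": [(101, 55), (102, 55), (103, 70)],
}
reference = {(101, 55), (102, 55), (103, 70), (104, 80)}

support = union_citations(sources)
p, r = precision_recall(sources["source_a"], reference)

# The abstract's own figures: 51 million of 564 million pairs labeled unreliable.
unreliable_share = 51 / 564  # about 0.09, matching the stated 9%
```

Keeping the set of supporting sources per pair, rather than a bare union, makes it possible to look at agreement and disagreement between sources, which is the kind of signal the abstract says the reliability labels draw on.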

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa60/12049819/ed615a2555b5/gr1.jpg
