PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.

Affiliations

National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States.

Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT 06510, United States.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae672.

Abstract

SUMMARY

Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges for precise literature retrieval by author name, a common type of query in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. The enhanced algorithm achieves high performance in author name disambiguation, and the resulting dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is updated incrementally on a weekly basis. More importantly, we make the dataset publicly available to the community so that it can be used in a wide variety of applications beyond assisting PubMed's author name queries. Finally, we propose a set of best-practice guidelines for authors on the use of their names.

AVAILABILITY AND IMPLEMENTATION

The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.
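
The summary's definition of an ambiguous name (one name string shared by more than one individual) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the toy records, the placeholder ORCID-style identifiers, and the function name `ambiguous_name_share` are hypothetical, and the approach of grouping by identifier is a simplification, not the PubMed Computed Authors algorithm or the dataset's actual file format.

```python
from collections import defaultdict

# Hypothetical toy records: (author name as printed, identifier or None).
# The identifiers are invented placeholders, not real ORCID iDs, and the
# records are not drawn from the PubMed Computed Authors dataset.
records = [
    ("Wang J", "0000-0001-0000-0001"),
    ("Wang J", "0000-0001-0000-0002"),
    ("Wang J", "0000-0001-0000-0001"),
    ("Smith A", "0000-0002-0000-0001"),
    ("Lee K", "0000-0003-0000-0001"),
    ("Lee K", "0000-0003-0000-0002"),
    ("Garcia M", None),  # no identifier: cannot be judged either way here
]


def ambiguous_name_share(records):
    """Return (set of ambiguous names, their share among identified names).

    A name counts as ambiguous when it maps to more than one distinct
    identifier. Records without an identifier are skipped.
    """
    ids_by_name = defaultdict(set)
    for name, identifier in records:
        if identifier is not None:
            ids_by_name[name].add(identifier)
    if not ids_by_name:
        return set(), 0.0
    ambiguous = {name for name, ids in ids_by_name.items() if len(ids) > 1}
    return ambiguous, len(ambiguous) / len(ids_by_name)


ambiguous, share = ambiguous_name_share(records)
print(sorted(ambiguous), round(share, 2))
```

In this toy input, two of the three names that carry identifiers map to more than one individual. Real disambiguation must also handle the many author instances that have no identifier at all, which is where an automatic algorithm is needed.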
