Aggregating large-scale databases for PubMed author name disambiguation.

Affiliation

School of Information Management, Wuhan University, Wuhan, China.

Publication information

J Am Med Inform Assoc. 2021 Aug 13;28(9):1919-1927. doi: 10.1093/jamia/ocab095.

Abstract

OBJECTIVE

PubMed has suffered from the author ambiguity problem for many years. Existing studies on author name disambiguation (AND) for PubMed have relied only on internal metadata. However, some of this metadata is incomplete (eg, many names appear only in abbreviated form, with no full name available) or not very discriminative. To this end, we present a new disambiguation method, AggAND, which aggregates information from external databases.

MATERIALS AND METHODS

We address this issue by drawing on Microsoft Academic Graph, Semantic Scholar, and PubMed Knowledge Graph to enhance the built-in name metadata, and by extending the internal metadata with external, more discriminative metadata.
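The idea of enhancing abbreviated names from external sources can be illustrated with a minimal sketch. It is not the AggAND implementation: the record layout (a dictionary keyed by PMID and author position), the function names, the sample values, and the "same initial, longer string wins" rule are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key: (PMID, 1-based author position in the byline).
AuthorKey = Tuple[str, int]


def enhance_forename(pubmed_forename: str,
                     external_forenames: Iterable[Optional[str]]) -> str:
    """Pick the longest forename that shares the PubMed forename's initial.

    Assumed rule for illustration only; a real system would need a more
    careful compatibility check between name variants.
    """
    best = pubmed_forename.strip()
    initial = best[:1].lower()
    for cand in external_forenames:
        cand = (cand or "").strip()
        if not cand:
            continue  # source has no name for this author
        if initial and cand[:1].lower() != initial:
            continue  # conflicting initial: do not trust this candidate
        if len(cand) > len(best):
            best = cand
    return best


def enhance_records(pubmed: Dict[AuthorKey, str],
                    *sources: Dict[AuthorKey, str]) -> Dict[AuthorKey, str]:
    """Return a copy of the PubMed forenames, enhanced from external sources."""
    return {
        key: enhance_forename(name, (src.get(key) for src in sources))
        for key, name in pubmed.items()
    }


# Made-up records, not taken from any real database.
pubmed_names = {("0000001", 1): "J", ("0000001", 2): "Wei"}
source_a = {("0000001", 1): "Jun"}
source_b = {("0000001", 1): "J.", ("0000001", 2): "L"}

enhanced = enhance_records(pubmed_names, source_a, source_b)
print(enhanced)
```

In this toy run the abbreviated forename "J" is replaced by "Jun", while "Wei" is kept because the only external candidate has a different initial.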

RESULTS

Experiments show that the enhanced name metadata performs comparably to 3 author identifier systems and outperforms the original name metadata. More importantly, our method, AggAND, which incorporates both the enhanced name metadata and the extended metadata, yields F1 scores of 95.80% and 93.71% on 2 datasets and outperforms the state-of-the-art method by a large margin (3.61% and 6.55%, respectively).
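The abstract does not say which F1 variant is reported. As an illustration only, the sketch below computes pairwise F1, one variant commonly used to score author name disambiguation: every pair of records is treated as a "same author" or "different author" decision. The function names and the toy labels are assumptions, not the paper's data or evaluation code.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set, Tuple


def same_author_pairs(labels: Sequence[Hashable]) -> Set[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose records carry the same label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels: Sequence[Hashable],
                pred_labels: Sequence[Hashable]) -> float:
    """Pairwise F1 between a gold clustering and a predicted clustering."""
    true_pairs = same_author_pairs(true_labels)
    pred_pairs = same_author_pairs(pred_labels)
    if not true_pairs or not pred_pairs:
        return 0.0  # precision or recall is undefined; report 0 by convention
    tp = len(true_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: 5 records, gold authors "a"/"b", predicted clusters 0/1.
gold = ["a", "a", "a", "b", "b"]
pred = [0, 0, 1, 1, 1]
print(pairwise_f1(gold, pred))  # 0.5: 2 of 4 predicted pairs are correct, 2 of 4 gold pairs are found
```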

CONCLUSIONS

The feasibility and good performance of our methods not only help better understand the importance of external databases for disambiguation, but also point to a promising direction for future AND studies in which information aggregated from multiple bibliographic databases can be effective in improving disambiguation performance. The methodology shown here can be generalized to broader bibliographic databases beyond PubMed. Our code and data are available online (https://github.com/carmanzhang/PubMed-AND-method).
