Hsu Chun-Nan, Bandrowski Anita E, Gillespie Thomas H, Udell Jon, Lin Ko-Wei, Ozyurt Ibrahim Burak, Grethe Jeffrey S, Martone Maryann E
University of California San Diego.
University of California San Diego, SciCrunch, Inc.
Comput Sci Eng. 2020 Mar-Apr;22(2):22-32. doi: 10.1109/mcse.2019.2952838. Epub 2019 Nov 12.
The Research Resource Identifier (RRID) was introduced in 2014 to better identify biomedical research resources and track their use across the literature, including key digital resources such as databases and software. Authors include an RRID after the first mention of any resource used. Here, we provide an overview of RRIDs and analyze their use for digital resource identification. We quantitatively compare the output of our RRID curation workflow with the outputs of automated text mining systems used to identify resource mentions in text. The results show that authors follow RRID reporting guidelines well, and that our text mining, which is based on natural language processing, identified nearly all of the resources identified by RRIDs as well as thousands more. Finally, we demonstrate how RRIDs and text mining can complement each other to provide a scalable solution to digital resource citation.
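As a rough illustration of what identifying RRIDs in text can involve, the following Python sketch pulls RRID strings out of a passage with a regular expression. It is not the curation workflow or the text mining system described in the abstract. The pattern is a simplified assumption about RRID syntax ("RRID:" followed by a registry prefix, an underscore, and an identifier), the `SCR_` filter assumes that prefix marks registry entries for digital resources such as software and databases, and the function names, resource names, and identifiers in the sample text are hypothetical placeholders.

```python
import re

# Assumed, simplified RRID syntax: "RRID:" + optional whitespace +
# an alphabetic registry prefix + "_" + an identifier body.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9:_-]+)")


def extract_rrids(text: str) -> list[str]:
    """Return unique RRIDs in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        # Normalize spacing and drop stray trailing separators.
        rrid = "RRID:" + match.group(1).rstrip(":_-")
        if rrid not in seen:
            seen.append(rrid)
    return seen


def digital_resources(rrids: list[str]) -> list[str]:
    """Keep RRIDs with the SCR_ prefix (assumed here to mark digital resources)."""
    return [r for r in rrids if r.startswith("RRID:SCR_")]


# Hypothetical methods text; the tool, antibody, and IDs are placeholders.
sample = (
    "Images were processed with ExampleTool (RRID:SCR_000001) and stained "
    "with an example antibody (RRID: AB_0000001). ExampleTool "
    "(RRID:SCR_000001) was also used for quantification."
)

found = extract_rrids(sample)
print(found)
print(digital_resources(found))
```

A pattern like this only finds resources that authors have explicitly tagged. Resource mentions without an RRID need a text mining approach, which is why the abstract treats the two as complementary.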