Ozyurt Ibrahim Burak, Grethe Jeffrey S, Martone Maryann E, Bandrowski Anita E
CRBS, UCSD, La Jolla, CA, United States of America.
PLoS One. 2016 Jan 5;11(1):e0146300. doi: 10.1371/journal.pone.0146300. eCollection 2016.
The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, that are funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid curation efforts to keep content up to date, including tracking when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as to enhance the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis.
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
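The abstract names the RDW components without describing their internals. As a rough illustration only, the sketch below shows one way three of them could work: a regex-based URL extractor for paper text, a URL change check, and a hash-based content change check. The function names, the regular expression, and the normalization rules are hypothetical assumptions, not RDW's actual implementation. Fetching pages and following redirects are left out so the example stays self-contained; the caller is assumed to supply the resolved URL and page text.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Hypothetical pattern (assumption): http(s):// or www. URLs, ending at
# whitespace, angle brackets, quotes, or closing brackets.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)
# Sentence punctuation that often trails a URL in running text.
TRAILING_PUNCT = ".,;:"


def extract_urls(text):
    """Return unique URLs in order of first appearance, trailing punctuation stripped."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCT)
        if url not in urls:
            urls.append(url)
    return urls


def normalize_url(url):
    """Hypothetical comparison key: lower-cased host without 'www.', no scheme,
    no trailing slash. Ignoring the scheme is an assumed design choice."""
    if not re.match(r"https?://", url, re.IGNORECASE):
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def url_has_moved(registered_url, resolved_url):
    """True if the URL the resource now resolves to differs from the registry entry."""
    return normalize_url(registered_url) != normalize_url(resolved_url)


def content_fingerprint(page_text):
    """SHA-256 of whitespace-normalized, lower-cased text, so trivial reformatting is ignored."""
    normalized = " ".join(page_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_content_changed(previous_fingerprint, current_page_text):
    """Compare a stored fingerprint with freshly fetched page text."""
    return content_fingerprint(current_page_text) != previous_fingerprint


if __name__ == "__main__":
    text = "Data are at http://www.example.org/db. See www.example.com/tool, too."
    print(extract_urls(text))
    print(url_has_moved("http://www.example.org/db/", "https://example.org/db"))
```

Hashing normalized text is the simplest content check; it flags any wording change, so a real curation tool would likely need a more tolerant similarity measure to avoid alerting on minor edits.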