Guralnick Robert P, Cellinese Nico, Deck John, Pyle Richard L, Kunze John, Penev Lyubomir, Walls Ramona, Hagedorn Gregor, Agosti Donat, Wieczorek John, Catapano Terry, Page Roderic D M
Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-2710 USA.
Berkeley Natural History Museums, University of California, Berkeley, California, USA.
Zookeys. 2015 Apr 6;(494):133-54. doi: 10.3897/zookeys.494.9352. eCollection 2015.
Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. To make further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss next steps for the community to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.