Harjes Janno, Link Anton, Weibulat Tanja, Triebel Dagmar, Rambold Gerhard
University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany.
Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany.
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa059.
Repeatability of study setups and reproducibility of research results based on the underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in environmental sciences have been lacking, and tools for data management are insufficient. Repeatability and reproducibility require sophisticated data management solutions that go beyond sharing data files. In particular, this implies maintaining coherent data along workflows. Design data concern elements from the elementary domains of operations, namely transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment, from field to laboratory campaigns. Strict linkage of operation design element values, operation values and objects is essential. To keep corresponding objects coherent along consecutive workflow segments, unique identifiers must be assigned and their relations specified. The abstract model presented here addresses these aspects, and the software DiversityDescriptions (DWB-DD) facilitates the management of digital data objects and structures connected in this way. DWB-DD allows operation design elements to be specified individually and linked to objects. Two workflow design use cases are given, one for DNA barcoding and another for the cultivation of fungal isolates. To publish such structured data, standard schema mapping and XML provision of digital objects are essential. Schemas useful for this mapping include the Ecological Markup Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include mapping and conversion between schemas as well as functions for data publishing and archiving according to the Open Archival Information System standard.
This setting enables repeatability of study setups and reproducibility of study results, and it supports work groups in structuring and maintaining their data from the beginning of a study. The theory of 'FAIR++' digital objects is introduced.
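The linkage requirements summarized above (unique identifiers for objects, specified relations between objects in consecutive workflow segments, and operations from the transformation, measurement and transaction domains tied to those objects) can be illustrated with a minimal sketch. It assumes random UUIDs as identifiers and a single `derived_from` relation per object; the class names, fields and XML element names are hypothetical and do not reflect the actual DWB-DD data model or API. The XML output is likewise generic and is not EML, MOD-CO or SDD, each of which would require a proper schema mapping.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Domain(Enum):
    """The three elementary operation domains named in the abstract."""
    TRANSFORMATION = "transformation"
    MEASUREMENT = "measurement"
    TRANSACTION = "transaction"


@dataclass
class DataObject:
    """Hypothetical object (e.g. specimen, DNA extract) with a unique identifier."""
    label: str
    segment: str  # workflow segment, e.g. "field" or "laboratory"
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    derived_from: Optional[str] = None  # uid of the source object, if any


@dataclass
class Operation:
    """Hypothetical operation linking design-element values to one object."""
    domain: Domain
    method: str
    target_uid: str
    design_values: Dict[str, str] = field(default_factory=dict)


class Workflow:
    def __init__(self) -> None:
        self.objects: Dict[str, DataObject] = {}
        self.operations: List[Operation] = []

    def add(self, obj: DataObject) -> DataObject:
        # Reject dangling relations so every derived object stays traceable.
        if obj.derived_from is not None and obj.derived_from not in self.objects:
            raise KeyError(f"unknown source object {obj.derived_from}")
        self.objects[obj.uid] = obj
        return obj

    def record(self, op: Operation) -> None:
        # An operation must always point at an existing object.
        if op.target_uid not in self.objects:
            raise KeyError(f"unknown target object {op.target_uid}")
        self.operations.append(op)

    def lineage(self, uid: str) -> List[str]:
        """Labels from the given object back to its original source object."""
        chain = []
        current: Optional[str] = uid
        while current is not None:
            obj = self.objects[current]
            chain.append(obj.label)
            current = obj.derived_from
        return chain

    def to_xml(self) -> str:
        # Generic XML with hypothetical element names, not a standard schema.
        root = ET.Element("workflow")
        for obj in self.objects.values():
            el = ET.SubElement(root, "object", uid=obj.uid, segment=obj.segment)
            if obj.derived_from:
                el.set("derivedFrom", obj.derived_from)
            el.text = obj.label
        for op in self.operations:
            el = ET.SubElement(root, "operation", domain=op.domain.value,
                               method=op.method, target=op.target_uid)
            for name, value in op.design_values.items():
                ET.SubElement(el, "designElement", name=name).text = value
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative field-to-laboratory chain, loosely modeled on DNA barcoding.
    wf = Workflow()
    specimen = wf.add(DataObject("fungal specimen", "field"))
    extract = wf.add(DataObject("DNA extract", "laboratory",
                                derived_from=specimen.uid))
    wf.record(Operation(Domain.TRANSFORMATION, "DNA extraction", extract.uid,
                        {"kit": "hypothetical-kit"}))
    print(wf.lineage(extract.uid))  # ['DNA extract', 'fungal specimen']
    print(wf.to_xml())
```

Refusing to register an object whose source identifier is unknown is one simple way to enforce coherence across workflow segments: every derived object can always be traced back to its origin, which is what the lineage query relies on.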