Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, Florida, United States of America.
School of Natural Resources and the Environment, University of Florida, Gainesville, Florida, United States of America.
PLoS Biol. 2019 Jan 29;17(1):e3000125. doi: 10.1371/journal.pbio.3000125. eCollection 2019 Jan.
Over the past decade, biology has undergone a data revolution in both how researchers collect data and how much data they collect. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data that are still being actively collected. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by using existing tools to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow draws on tools from software development, including version control and continuous integration, to create a modern data management system that automates the pipeline.
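As an illustration of the quality assurance and control step, the following is a minimal sketch of the kind of automated check that a continuous integration service could run each time new records are added. It is not the workflow's actual code: the record layout, the `Observation` and `validate` names, the species codes, and the weight limit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical record layout; field names are illustrative assumptions.
@dataclass
class Observation:
    record_id: int
    collected_on: date
    species_code: str
    weight_g: Optional[float]  # None when no weight was recorded


# Hypothetical reference values, assumed for this sketch only.
VALID_SPECIES = {"AA", "BB", "CC"}
MAX_WEIGHT_G = 500.0


def validate(records: List[Observation], today: date) -> List[str]:
    """Return human-readable QA/QC problems; an empty list means clean data."""
    problems = []
    seen_ids = set()
    for r in records:
        # Each record should have a unique identifier.
        if r.record_id in seen_ids:
            problems.append(f"record {r.record_id}: duplicate id")
        seen_ids.add(r.record_id)
        # Collection dates cannot lie in the future.
        if r.collected_on > today:
            problems.append(f"record {r.record_id}: date in the future")
        # Species codes must come from the known list.
        if r.species_code not in VALID_SPECIES:
            problems.append(
                f"record {r.record_id}: unknown species code {r.species_code!r}"
            )
        # Recorded weights must be positive and within a plausible range.
        if r.weight_g is not None and not (0 < r.weight_g <= MAX_WEIGHT_G):
            problems.append(
                f"record {r.record_id}: weight {r.weight_g} g out of range"
            )
    return problems
```

In a setup like this, the continuous integration job would fail whenever `validate` returns a non-empty list, so that problematic records are flagged before they are merged into the versioned dataset.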