COPO - Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project.

Author information

Shaw Felix, Minotto Alice, McTaggart Seanna, Providence Aaliyah, Harrison Peter, Paupério Joana, Rajan Jeena, Burgin Josephine, Cochrane Guy, Kilias Estelle, Lawniczak Mara K N, Davey Robert

Affiliations

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Publication information

Wellcome Open Res. 2024 Jun 10;7:279. doi: 10.12688/wellcomeopenres.18499.2. eCollection 2022.

Abstract

Large-scale reference genome sequencing projects for all of biodiversity are underway and common standards have been in place for some years to enable the understanding and sharing of sequence data. However, the metadata that describe the collection, processing and management of samples, and link them to the associated sequencing and genome data, are not yet adequately developed and standardised for these projects. At the time of writing, the Darwin Tree of Life (DToL) Project is over two years into its ten-year ambition to sequence all described eukaryotic species in Britain and Ireland. We have sought consensus from a wide range of scientists across taxonomic domains to determine the minimal set of metadata that we collectively deem critically important to accompany each sequenced specimen. These metadata are made available throughout the subsequent laboratory processes and, once collected, need to be adequately managed to fulfil the requirements of good data management practice. Given the size and scale of management required, software tools are needed. These tools need to implement rigorous development pathways and change management procedures to ensure that effective research data management of key project and sample metadata is maintained. Tracking of sample properties through the sequencing process is handled by Lab Information Management Systems (LIMS), so publication of the sequenced data is achieved via technical integration of LIMS and data management tools. Discussions with community members on how metadata standards need to be managed within large-scale programmes are a priority in the planning process. Here we report on the standards we developed with respect to a robust and reusable mechanism of metadata collection, in the hope that other projects, forthcoming or underway, will adopt these practices for metadata.
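The idea of a "minimal set of metadata" that must accompany each sequenced specimen amounts, in practice, to a required-field check applied to every sample record before it moves downstream. The following is a minimal Python sketch of such a check under stated assumptions; it is not COPO's implementation. The field names in `REQUIRED_FIELDS` and the helpers `is_blank` and `validate_manifest` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical minimal field set, for illustration only; the real DToL
# checklist is defined by the project, not by this sketch.
REQUIRED_FIELDS = (
    "SPECIMEN_ID",
    "SCIENTIFIC_NAME",
    "TAXON_ID",
    "DATE_OF_COLLECTION",
    "COLLECTION_LOCATION",
)


@dataclass
class ValidationResult:
    sample_index: int
    missing: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A sample passes only if no required field is missing or blank.
        return not self.missing


def is_blank(value: object) -> bool:
    """Treat absent (None) and whitespace-only values as missing."""
    return value is None or not str(value).strip()


def validate_manifest(rows: list) -> list:
    """Report, per sample row, which required fields are absent or blank."""
    results = []
    for i, row in enumerate(rows):
        missing = [name for name in REQUIRED_FIELDS if is_blank(row.get(name))]
        results.append(ValidationResult(sample_index=i, missing=missing))
    return results


if __name__ == "__main__":
    # Placeholder records; values are illustrative, not real specimens.
    manifest = [
        {"SPECIMEN_ID": "S1", "SCIENTIFIC_NAME": "Genus species",
         "TAXON_ID": "1", "DATE_OF_COLLECTION": "2021-05-01",
         "COLLECTION_LOCATION": "United Kingdom"},
        {"SPECIMEN_ID": "S2", "SCIENTIFIC_NAME": "  "},
    ]
    for result in validate_manifest(manifest):
        status = "ok" if result.ok else f"missing {result.missing}"
        print(result.sample_index, status)
```

Reporting every missing field per sample, rather than stopping at the first error, lets a submitter fix a whole manifest in one pass.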
