OMD Curation Toolkit: a workflow for in-house curation of public omics datasets.

Affiliations

Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.

Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.

Publication information

BMC Bioinformatics. 2024 May 9;25(1):184. doi: 10.1186/s12859-024-05803-9.

Abstract

BACKGROUND

Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with public omics datasets, and especially curating them, remains challenging despite these efforts. Although a growing number of initiatives aim to re-use previous results, they have limitations that often make further in-house curation and processing necessary.

RESULTS

Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher through the curation of metadata and FASTQ files from public omics datasets. The workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. Although centered on the European Nucleotide Archive (ENA), most of the provided tools are generic and can be used to curate datasets from other sources.
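To make the collection and control-check steps concrete, the following is a minimal sketch and not the OMD Curation Toolkit's own API. It assumes the public ENA Portal API file-report endpoint and its `run_accession`, `fastq_ftp` and `fastq_md5` fields; the function names `filereport_url`, `parse_filereport` and `md5_matches` are hypothetical and chosen only for illustration.

```python
import csv
import hashlib
import io
from urllib.parse import urlencode

# Assumption: the public ENA Portal API file-report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(project_accession,
                   fields=("run_accession", "fastq_ftp", "fastq_md5")):
    """Build a file-report URL listing the runs of one ENA project."""
    query = urlencode({
        "accession": project_accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Collection: turn the TSV report into one record per FASTQ file.

    Paired-end runs list several files separated by ';' in the same cell.
    """
    records = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        for url, md5 in zip(urls, md5s):
            records.append({"run": row["run_accession"],
                            "url": url, "md5": md5})
    return records


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Control check: compare a local file's MD5 with the reported one."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large FASTQ files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

In use, the text returned by the URL from `filereport_url` would be passed to `parse_filereport`, and each downloaded file would be checked with `md5_matches` before any treatment or integration step.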

CONCLUSIONS

The toolkit therefore offers valuable tools for the in-house curation needed before public omics data can be re-used. Its workflow structure and capabilities make it easy to use, and it can benefit investigators developing novel omics meta-analyses based on sequencing data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ddb/11084137/1bd459e5a153/12859_2024_5803_Fig1_HTML.jpg
