Regional Computing Center (RRZK), University of Cologne, Cologne, 50931, Germany.
Department of Translational Genomics, Center of Integrated Oncology Cologne-Bonn, Medical Faculty, University of Cologne, Cologne, 50931, Germany.
BMC Bioinformatics. 2019 Jan 15;20(1):29. doi: 10.1186/s12859-018-2576-5.
Next-generation sequencing (NGS) methods generate massive amounts of data, which poses challenges for data security, storage and metadata management. Although a broad range of data analysis pipelines exists, these challenges remain largely unaddressed to date.
We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high-performance computing environment. The system supports customized metadata attributes and fine-grained protection rules, and is augmented by a user-friendly front-end for metadata input. The result is a robust, efficient end-to-end workflow that accounts for data security, central storage and unified metadata information.
Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.
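As a rough illustration of what attaching customized metadata attributes and access rules to a sequencing file might look like, the sketch below builds argument lists for the standard iRODS iCommands `imeta add -d` and `ichmod`. It only constructs the commands and does not run them. The attribute names, zone path and group name are hypothetical placeholders and are not taken from the pipeline described above.

```python
from typing import List, Optional, Tuple

# One metadata entry: (attribute, value, optional unit).
# iRODS stores metadata as attribute-value-unit (AVU) triples.
AVU = Tuple[str, str, Optional[str]]


def imeta_add_commands(data_object: str, avus: List[AVU]) -> List[List[str]]:
    """Build one `imeta add -d` argument list per AVU for a data object."""
    commands = []
    for attribute, value, unit in avus:
        if not attribute or not value:
            raise ValueError("attribute and value must be non-empty")
        cmd = ["imeta", "add", "-d", data_object, attribute, value]
        if unit:
            cmd.append(unit)  # the unit is optional in an AVU
        commands.append(cmd)
    return commands


def ichmod_command(permission: str, user_or_group: str, path: str) -> List[str]:
    """Build an `ichmod` argument list granting a permission on a path."""
    allowed = {"null", "read", "write", "own"}
    if permission not in allowed:
        raise ValueError(f"permission must be one of {sorted(allowed)}")
    return ["ichmod", permission, user_or_group, path]


if __name__ == "__main__":
    # Hypothetical path, attributes and group, for illustration only.
    bam = "/exampleZone/home/ngs/sample1.bam"
    for cmd in imeta_add_commands(
        bam,
        [("sample_id", "S001", None), ("read_length", "150", "bp")],
    ):
        print(" ".join(cmd))
    print(" ".join(ichmod_command("read", "analysts", bam)))
```

On a host with the iCommands installed and an authenticated session, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the metadata mapping easy to check without an iRODS server.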