RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor.

Affiliation

Dipartimento di Elettronica, Informazione e Bioingegneria, Via Ponzio 34/5, 20133, Milan, Italy.

Publication information

BMC Bioinformatics. 2022 Apr 7;23(1):123. doi: 10.1186/s12859-022-04648-4.

Abstract

BACKGROUND

Heterogeneous omics data, increasingly collected through high-throughput technologies, may hold answers to important and still unsolved biomedical questions. Integrating and processing them is crucial mainly for tertiary analysis of Next Generation Sequencing data, yet the available big data strategies still address mainly primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets that ensure scalability and interoperability, possibly relying on high-performance computing infrastructures.

RESULTS

We propose RGMQL, an R/Bioconductor package that provides a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different sources, whether local or remote. RGMQL is built on the GenoMetric Query Language (GMQL) data management and computational engine, and can leverage its open curated repository as well as its cloud-based resources, with the possibility of outsourcing computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax by offering a procedural approach to handling omics data within the R/Bioconductor environment. Above all, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions.

CONCLUSIONS

RGMQL combines the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, as a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that illustrate its flexibility of use and its interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while it combines and analyzes heterogeneous omics data from local or remote datasets, both public and private, in a way that is completely transparent to the user.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c614/8991469/be3185352aef/12859_2022_4648_Fig1_HTML.jpg
