Politecnico di Milano.
Computational Biology at Politecnico di Milano.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa091.
As biological and clinical uses of next-generation sequencing (NGS) data spread, many laboratories and health organizations need to share NGS data resources and to easily access and process comprehensively shared genomic data. In most cases, primary and secondary management of NGS data is done at the sequencing stations, and sharing applies to processed data. Building on the previous single-instance GMQL system architecture, we review the model, language and architectural extensions that open the centralized GMQL system to federated computing.
The result is an extension of the centralized system architecture that supports federated data sharing and query processing. Data are federated through simple data-sharing instructions. Queries are assigned to execution nodes and translated into an intermediate representation, whose computation drives the distribution of data and processing. The approach allows federated applications to be written in any of the classical styles: centralized, distributed or externalized.
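To make the idea of an intermediate representation driving data and processing distribution concrete, the following is a minimal sketch and not the GMQL implementation. It assumes a hypothetical operator tree (`Op`), hypothetical site names ("milan", "boston") and a simple placement rule chosen for illustration: an operator runs where all of its inputs already reside, and otherwise at the node that received the query.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Op:
    """Hypothetical node of a query's intermediate representation."""
    name: str
    inputs: List["Op"] = field(default_factory=list)
    site: Optional[str] = None  # where the data lives or the operator runs


def assign_sites(op: Op, home: str) -> str:
    """Place each operator, bottom-up (assumed rule, for illustration only).

    Source datasets keep their own site. An operator runs where all of its
    inputs reside; if inputs sit on different sites, it runs at `home`.
    """
    if op.inputs:
        sites = {assign_sites(child, home) for child in op.inputs}
        op.site = sites.pop() if len(sites) == 1 else home
    elif op.site is None:
        op.site = home
    return op.site


def transfers(op: Op) -> List[Tuple[str, str, str]]:
    """List (result name, from site, to site) for edges that cross sites."""
    moves = []
    for child in op.inputs:
        moves.extend(transfers(child))
        if child.site != op.site:
            moves.append((child.name, child.site, op.site))
    return moves


# Two datasets held by different federation members (hypothetical names).
genes = Op("GENES", site="milan")
peaks = Op("PEAKS", site="boston")
filtered = Op("SELECT", [peaks])        # filter placed next to its data
joined = Op("JOIN", [genes, filtered])  # inputs on two sites

assign_sites(joined, home="milan")
plan = transfers(joined)
print(plan)
```

Under these assumptions the filter executes at the site that owns the data, and only its result crosses the federation, which is one way an operator-level plan can determine both where processing happens and which data move.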
The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/.
{arif.canakoglu, pietro.pinoli}@polimi.it.