Stanford Healthcare Innovation Lab, Stanford University, California, United States of America.
Stanford Center for Genomics and Personalized Medicine, Stanford University, California, United States of America.
PLoS Comput Biol. 2021 May 12;17(5):e1008977. doi: 10.1371/journal.pcbi.1008977. eCollection 2021 May.
Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on different cloud platforms. We demonstrate its utility via common queries of genomic variants across BigQuery on the Google Cloud Platform (GCP), Athena on Amazon Web Services (AWS), Apache Presto, and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays, and the risks of security breaches and privacy violations.
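To illustrate the idea of minimal data motion, the following is a minimal sketch and not Swarm's actual interface. It assumes two in-memory SQLite databases as stand-ins for two cloud platforms. The table names (`variants`, `annotations`, `moved`), the toy rows, and the function `federated_gene_counts` are hypothetical. The larger variant table is aggregated where it is stored, and only the small summary is transferred to the second platform for the join.

```python
import sqlite3


def make_platform(table, columns, rows):
    """Create an in-memory SQLite database standing in for one cloud platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


# Hypothetical platform A: per-sample variant calls (the "large" dataset).
platform_a = make_platform(
    "variants",
    ["chrom TEXT", "pos INTEGER", "ref TEXT", "alt TEXT", "sample TEXT"],
    [
        ("chr1", 1000, "A", "G", "s1"),
        ("chr1", 1000, "A", "G", "s2"),
        ("chr2", 500, "C", "T", "s1"),
        ("chr3", 42, "G", "A", "s3"),
    ],
)

# Hypothetical platform B: positional gene annotations.
platform_b = make_platform(
    "annotations",
    ["chrom TEXT", "pos INTEGER", "gene TEXT"],
    [("chr1", 1000, "GENE1"), ("chr2", 500, "GENE2"), ("chr9", 7, "GENE9")],
)


def federated_gene_counts(source, target):
    """Count samples per annotated position while moving only a summary."""
    # Step 1: aggregate on the platform that holds the variant calls.
    summary = source.execute(
        "SELECT chrom, pos, COUNT(*) FROM variants GROUP BY chrom, pos"
    ).fetchall()

    # Step 2: transfer only the aggregated rows to the other platform.
    target.execute("CREATE TEMP TABLE moved (chrom TEXT, pos INTEGER, n INTEGER)")
    target.executemany("INSERT INTO moved VALUES (?, ?, ?)", summary)

    # Step 3: join against the annotations where they are stored.
    result = target.execute(
        "SELECT a.gene, m.n FROM annotations a "
        "JOIN moved m ON a.chrom = m.chrom AND a.pos = m.pos "
        "ORDER BY a.gene"
    ).fetchall()
    target.execute("DROP TABLE moved")
    return result, len(summary)


if __name__ == "__main__":
    counts, rows_moved = federated_gene_counts(platform_a, platform_b)
    print(counts, rows_moved)
```

In this toy setup three summary rows cross the platform boundary instead of four raw variant rows. The difference grows with the number of samples per position, which is the situation where limiting data transfer would be expected to matter most.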