Center for Informatics Sciences, Nile University, Giza, Egypt.
Biomed Res Int. 2013;2013:791051. doi: 10.1155/2013/791051. Epub 2013 Apr 24.
Cloud computing offers a promising solution to the genomics data deluge brought about by next-generation sequencing (NGS) technology. Under the "resources-on-demand" and "pay-as-you-go" models, scientists with little or no infrastructure can access scalable and cost-effective computational resources. However, because NGS data sets are large, transferring them from the client's site to the cloud incurs significant latency, which is a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, in which the NGS data is processed while it is being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and reduces both the time and the cost of computation.
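The idea of processing reads while they are still arriving can be illustrated with a minimal Python sketch. It is not the elastream API: the function names (read_fastq, chunked, gc_fraction, process_stream), the chunk size, and the choice of GC content as the per-read analysis are all hypothetical, chosen for illustration only. The sketch assumes the simple four-line FASTQ layout and relies on the property stated in the abstract, that each sequence can be processed independently.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def read_fastq(stream: Iterable[str]) -> Iterator[Record]:
    """Lazily yield FASTQ records, so parsing can begin before the stream ends."""
    lines = (line.rstrip("\n") for line in stream)
    while True:
        block = list(islice(lines, 4))  # assumes four lines per record
        if len(block) < 4:
            return  # end of stream; an incomplete trailing record is dropped
        header, seq, _plus, qual = block
        yield header[1:], seq, qual  # strip the leading "@" from the id


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def gc_fraction(seq: str) -> float:
    """Stand-in per-read analysis: fraction of G/C bases in one read."""
    return sum(base in "GCgc" for base in seq) / len(seq) if seq else 0.0


def process_stream(stream: Iterable[str], chunk_size: int = 2) -> List[Tuple[str, float]]:
    """Process each chunk as soon as it has arrived, not after the full transfer."""
    results = []
    for chunk in chunked(read_fastq(stream), chunk_size):
        # Reads are independent, so each chunk could go to a separate worker.
        results.extend((read_id, gc_fraction(seq)) for read_id, seq, _ in chunk)
    return results


if __name__ == "__main__":
    # io.StringIO stands in for a network stream that is still being received.
    incoming = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nAATT\n+\nIIII\n")
    print(process_stream(incoming))
```

Because the reader is a generator, the first chunk is analyzed as soon as its records have arrived, so computation overlaps with the transfer instead of waiting for it to finish.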