A System Architecture for Efficient Transmission of Massive DNA Sequencing Data.

Author information

Sağiroğlu Mahmut Şamil, Külekci M Oğuzhan

Affiliations

1 ERLAB Technologies, ITU ARI1 Technopark, Istanbul, Turkey.

2 Informatics Institute, Istanbul Technical University, Istanbul, Turkey.

Publication information

J Comput Biol. 2017 Nov;24(11):1081-1088. doi: 10.1089/cmb.2017.0016. Epub 2017 Apr 17.

Abstract

DNA sequencing data analysis pipelines require significant computational resources, which makes cloud computing infrastructures a natural choice for this processing. However, the first practical difficulty in using cloud computing services is transmitting the massive DNA sequencing data from where they are produced to where they will be processed. Current practice is to compress the data in FASTQ file format and then send them via fast data transmission protocols. In this study, we address the weaknesses of that practice and present a new system architecture that incorporates the computational resources available on the client side while dynamically adapting itself to the available bandwidth. Our proposal considers real-life scenarios in which the bandwidth of the connection between the parties may fluctuate and the computing power on the client side may range from moderate personal computers to powerful workstations. The proposed architecture aims to use both the communication bandwidth and the computing resources to reach the results as early as possible. We present a prototype implementation of the proposed architecture and analyze several real-life cases, which provide useful insights for sequencing centers, especially on deciding when, and under what conditions, to use a cloud service.
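
The abstract does not spell out how the architecture adapts to bandwidth and client-side computing power. As a purely illustrative sketch, and not the authors' method, the trade-off can be modeled by picking, for each chunk of data, the compression setting that minimizes the time until the chunk has arrived. The `Codec` class, the function names, the pipelining assumption (compression and sending overlap), and all numbers below are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Codec:
    """Hypothetical compression setting available on the client."""
    name: str
    ratio: float            # compressed size / original size, 0 < ratio <= 1
    throughput_mbs: float   # client compression speed, MB of input per second


def delivery_time(size_mb: float, bandwidth_mbs: float, codec: Codec) -> float:
    """Seconds until a chunk has arrived, assuming compression and
    transmission are pipelined so the slower stage dominates."""
    compress_s = size_mb / codec.throughput_mbs
    send_s = size_mb * codec.ratio / bandwidth_mbs
    return max(compress_s, send_s)


def pick_codec(size_mb: float, bandwidth_mbs: float, codecs: list[Codec]) -> Codec:
    """Choose the codec with the smallest estimated delivery time."""
    return min(codecs, key=lambda c: delivery_time(size_mb, bandwidth_mbs, c))


# Illustrative settings only; real ratios and speeds depend on data and hardware.
CODECS = [
    Codec("none", ratio=1.0, throughput_mbs=math.inf),
    Codec("fast", ratio=0.3, throughput_mbs=200.0),
    Codec("strong", ratio=0.2, throughput_mbs=20.0),
]

if __name__ == "__main__":
    for bandwidth in (1.0, 10.0, 1000.0):  # measured bandwidth in MB/s
        choice = pick_codec(1000.0, bandwidth, CODECS)
        print(f"{bandwidth:>7.1f} MB/s -> {choice.name}")
```

Under these assumed numbers, a slow link favors the strongest compression, a moderate link favors the fast codec, and a very fast link favors sending the data uncompressed. Re-evaluating the choice per chunk with a fresh bandwidth measurement is one simple way to follow a fluctuating connection.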
