Biji Christopher Leela, Madhu Manu K, Vishnu Vineetha, K Satheesh Kumar, Nair Achuthsankar S
Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram.
School of Computer Science, Mahathma Gandhi University, Kottayam.
Bioinformation. 2015 May 28;11(5):267-71. doi: 10.6026/97320630011267. eCollection 2015.
Big data storage is a challenge in the post-genome era, and managing large genomic data calls for high-performance computing solutions. This report describes a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression reduces the on-disk footprint of large volumes of sequence data, which supports more efficient archiving. The approach was applied to 21 eukaryotic genomes selected by stratified sampling. The method achieves an average 6-fold reduction in disk space, with compression three times faster than COMRAD.
The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/.