Compression of Large genomic datasets using COMRAD on Parallel Computing Platform.

Author information

Biji Christopher Leela, Madhu Manu K, Vishnu Vineetha, K Satheesh Kumar, Nair Achuthsankar S

Affiliations

Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram.

School of Computer Science, Mahathma Gandhi University, Kottayam.

Publication information

Bioinformation. 2015 May 28;11(5):267-71. doi: 10.6026/97320630011267. eCollection 2015.

Abstract

Storing big data is a challenge in the post-genome era, so high-performance computing solutions are needed to manage large genomic datasets. This report describes a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression reduces the on-disk "footprint" of large volumes of sequence data, which supports more efficient archiving. The approach was evaluated on 21 eukaryotic genomes selected by stratified sampling. It achieves an average 6-fold reduction in disk space, with a compression time three times better than that of COMRAD.

AVAILABILITY

The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/.
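
The two headline figures in the abstract can be read as simple ratios. The sketch below is illustrative only: it assumes that "6-fold reduction" means original size divided by compressed size and that "three times better compression time" means the COMRAD time divided by the parallel time. The function names `fold_reduction` and `speedup` are hypothetical and are not taken from the COMRADMPI source.

```c
#include <assert.h>

/* Assumed definition: fold reduction = original size / compressed size.
 * Sizes are in bytes; the compressed size must be positive. */
double fold_reduction(double original_bytes, double compressed_bytes)
{
    assert(compressed_bytes > 0.0);
    return original_bytes / compressed_bytes;
}

/* Assumed definition: speedup = baseline time / parallel time.
 * Times are in seconds; the parallel time must be positive. */
double speedup(double baseline_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return baseline_seconds / parallel_seconds;
}
```

Under these assumptions, a hypothetical 600 MB genome stored in 100 MB gives `fold_reduction(600e6, 100e6)` equal to 6, and a run that takes 10 s where the baseline takes 30 s gives `speedup(30.0, 10.0)` equal to 3. These inputs are made-up values chosen to match the reported averages, not measurements from the paper.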

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ad5/4464544/9acc8d7ecc73/97320630011267F1.jpg
