Santana-Quintero Luis, Dingerdissen Hayley, Thierry-Mieg Jean, Mazumder Raja, Simonyan Vahan
Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, Maryland, United States of America.
Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, Maryland, United States of America; Department of Biochemistry and Molecular Biology, George Washington University Medical Center, Washington, DC, United States of America.
PLoS One. 2014 Jun 11;9(6):e99033. doi: 10.1371/journal.pone.0099033. eCollection 2014.
The size of Next-Generation Sequencing data makes sequence alignment a vast computational challenge. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. The High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches that exploit characteristics of both sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations.
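As a rough illustration of the first component, non-redundification and sorting, the Python sketch below collapses identical reads into a single entry with a count and sorts the unique reads so that similar sequences become neighbors. It is not the HIVE-hexagon implementation: the function names (`collapse_and_sort`, `shared_prefix_lengths`) and the toy reads are hypothetical, and the shared-prefix measure is only one assumed way of showing how much adjacent sorted reads overlap.

```python
from collections import Counter


def collapse_and_sort(reads):
    """Collapse identical reads into (sequence, count) pairs, sorted by sequence.

    Hypothetical helper: each unique read would need to be aligned only once,
    with its count carried along to restore per-read results afterwards.
    """
    counts = Counter(reads)
    return sorted(counts.items())


def shared_prefix_lengths(unique_reads):
    """Length of the prefix each sorted read shares with its predecessor.

    Hypothetical helper: a long shared prefix suggests that work done for the
    previous read could, in principle, be reused for the current one.
    """
    if not unique_reads:
        return []
    lengths = [0]  # the first read has no predecessor
    for prev, cur in zip(unique_reads, unique_reads[1:]):
        n = 0
        for a, b in zip(prev, cur):
            if a != b:
                break
            n += 1
        lengths.append(n)
    return lengths


# Toy input (made up for illustration): five reads, three of them identical.
reads = ["ACGT", "ACGA", "ACGT", "TTGA", "ACGT"]
collapsed = collapse_and_sort(reads)
prefixes = shared_prefix_lengths([seq for seq, _ in collapsed])

for (seq, count), shared in zip(collapsed, prefixes):
    print(seq, count, shared)
```

In this toy input, five reads reduce to three unique sequences, so an aligner working on the collapsed set would process three entries instead of five.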