Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany.
Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
BMC Bioinformatics. 2019 Sep 14;20(1):473. doi: 10.1186/s12859-019-3019-7.
HH-suite is a widely used open-source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins.
We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. Relative to the previous version 2.0.16, these changes accelerated HHsearch by a factor of 4 and HHblits by a factor of 2. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. HHsearch and HHblits jobs with many query profile HMMs can be parallelized over cores and cluster servers using OpenMP and the Message Passing Interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite .
The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
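The lane-wise idea behind SIMD vectorization of a Viterbi-style recurrence can be illustrated with a minimal sketch in plain Python. It is not HH-suite code and makes several simplifying assumptions: plain sequences stand in for profile HMMs, the scoring scheme (match, mismatch, linear gap) is made up for illustration, and the choice of one target per lane is an assumption rather than something the abstract states. The function names local_score and local_scores_lanes are hypothetical. The point is only that the same max-recurrence can be evaluated for several targets in lockstep, which is the access pattern that SIMD registers exploit.

```python
PAD = "\0"  # padding symbol; assumed never to occur in a real sequence


def local_score(query, target, match=2, mismatch=-1, gap=-2):
    """Scalar reference: best local alignment score via a max-recurrence."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        cur = [0]
        for j, t in enumerate(target, start=1):
            s = match if q == t else mismatch
            cell = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def local_scores_lanes(query, targets, match=2, mismatch=-1, gap=-2):
    """Same recurrence, but every DP cell holds one value per target ("lane").

    Each inner list plays the role of one SIMD register: all lanes execute
    the identical sequence of add and max operations.
    """
    if not targets:
        return []
    width = len(targets)
    length = max(len(t) for t in targets)
    # Pad shorter targets at the end; padded columns never match the query,
    # so they cannot raise the best score of their lane.
    padded = [t.ljust(length, PAD) for t in targets]
    prev = [[0] * width for _ in range(length + 1)]
    best = [0] * width
    for q in query:
        cur = [[0] * width]
        for j in range(1, length + 1):
            cell = [
                max(
                    0,
                    prev[j - 1][k] + (match if padded[k][j - 1] == q else mismatch),
                    prev[j][k] + gap,
                    cur[j - 1][k] + gap,
                )
                for k in range(width)
            ]
            cur.append(cell)
            best = [max(b, c) for b, c in zip(best, cell)]
        prev = cur
    return best
```

Calling local_scores_lanes("MKVLA", ["MKVLA", "KVL", "GGGG"]) returns one score per target, identical to calling local_score on each target separately. In a real SIMD implementation the per-lane list comprehension would be replaced by a handful of vector instructions, so the cost per cell is roughly independent of the number of lanes.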