Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil.
Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia.
Viruses. 2023 Feb 13;15(2):519. doi: 10.3390/v15020519.
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented in TABAJARA, a program that can be used either to detect all biological sequences of a group or to discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof of principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups presenting different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, and profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented in TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
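To make the idea of position-specific information scores concrete, the following is a minimal Python sketch, not TABAJARA's actual implementation. It assumes a simple Shannon-entropy-based information content per alignment column and a fixed-length sliding window; the function names (`column_information`, `informative_windows`), the window length, and the cutoff are all hypothetical choices made for illustration, and the scoring scheme used by TABAJARA itself may differ.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column; gaps are ignored.

    Assumption: information = log2(alphabet size) - Shannon entropy.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def informative_windows(alignment, window=3, min_mean_bits=4.0, alphabet_size=20):
    """Score every column, then return the (start, end) column ranges
    whose mean information content reaches the cutoff."""
    # zip(*alignment) iterates over the alignment column by column.
    scores = [column_information(col, alphabet_size) for col in zip(*alignment)]
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= min_mean_bits:
            hits.append((start, start + window))
    return scores, hits


if __name__ == "__main__":
    # Toy protein alignment (hypothetical data): columns 0-2 and 5 are conserved.
    toy_alignment = ["MKTAYW", "MKTGHW", "MKTCLW", "MKTDEW"]
    scores, hits = informative_windows(toy_alignment)
    print([round(s, 2) for s in scores])
    print(hits)
```

In a workflow of this kind, each selected column range would be cut out of the alignment and passed to a profile HMM builder, so that the resulting models cover only the most informative motifs.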