Sun Yanni, Buhler Jeremy
Department of Computer Science and Engineering, Washington University, St Louis, MO 63130, USA.
Bioinformatics. 2007 Jan 15;23(2):e36-43. doi: 10.1093/bioinformatics/btl323.
Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome.
We use PROSITE-like patterns as a filter to speed up the comparison between a protein sequence and a profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching at least one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude.
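The filter-then-score idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover their pattern design algorithm. It assumes a simplified PROSITE-style syntax (residues, `x`, `[...]`, `{...}`, and repeat counts, joined by `-`), and the names `prosite_to_regex`, `filter_then_score`, and `full_score` are hypothetical; `full_score` stands in for the expensive full dynamic programming comparison against the HMM.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Assumed syntax (a subset of PROSITE): elements separated by '-',
    'x' for any residue, '[ABC]' for allowed residues, '{ABC}' for
    forbidden residues, and optional repeats '(n)' or '(n,m)'.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the element from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # wildcard position
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        # '[ABC]' and single residues are already valid regex fragments.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


def filter_then_score(sequences, patterns, full_score):
    """Run full_score only on sequences that match at least one pattern.

    sequences: dict mapping sequence name to residue string.
    patterns: iterable of simplified PROSITE-style patterns.
    full_score: hypothetical callable standing in for the full
        dynamic programming comparison against the profile HMM.
    """
    regexes = [re.compile(prosite_to_regex(p)) for p in patterns]
    results = {}
    for name, seq in sequences.items():
        # Cheap pattern scan first; sequences that fail are skipped.
        if any(r.search(seq) for r in regexes):
            results[name] = full_score(seq)
    return results
```

In this sketch the speedup comes from calling `full_score` only on the sequences that pass the pattern scan. Sensitivity is lost whenever a true family member matches none of the patterns, which is the trade-off the pattern design has to control.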
Contact the first author at the affiliation address given above.