Schuster-Böckler Benjamin, Schultz Jörg, Rahmann Sven
Department of Mathematics and Computer Science, Freie Universität Berlin, Germany.
BMC Bioinformatics. 2004 Jan 21;5:7. doi: 10.1186/1471-2105-5-7.
Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research. Up to now, however, there exists no method to visualize all of their central aspects graphically in an intuitively understandable way.
We present a visualization method that incorporates both emission and transition probabilities of the pHMM, thus extending sequence logos introduced by Schneider and Stephens. For each emitting state of the pHMM, we display a stack of letters. The stack height is determined by the deviation of the position's letter emission frequencies from the background frequencies. The stack width visualizes both the probability of reaching the state (the hitting probability) and the expected number of letters the state emits during a pass through the model (the state's expected contribution). A web interface offering online creation of HMM Logos and the corresponding source code can be found at the Logos web server of the Max Planck Institute for Molecular Genetics http://logos.molgen.mpg.de.
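As an illustration of the stack-height idea, the following minimal Python sketch measures the deviation of one state's emission frequencies from the background as relative entropy (Kullback-Leibler divergence) in bits. The choice of relative entropy, the function names `stack_height` and `letter_heights`, and the scaling of each letter by its emission frequency (the classic sequence-logo convention) are assumptions made for this example; the abstract does not specify the exact formula, and the implementation on the Logos server may differ.

```python
import math


def stack_height(emission, background):
    """Relative entropy (in bits) of an emission distribution
    with respect to a background distribution.

    Both arguments are dicts mapping a letter to its probability.
    Assumption: every letter with nonzero emission probability has a
    strictly positive background probability.
    """
    return sum(
        p * math.log2(p / background[letter])
        for letter, p in emission.items()
        if p > 0  # terms with p == 0 contribute nothing
    )


def letter_heights(emission, background):
    """Split the stack height among letters in proportion to their
    emission probability (sequence-logo convention, assumed here)."""
    total = stack_height(emission, background)
    return {letter: p * total for letter, p in emission.items()}


# Hypothetical four-letter alphabet with a uniform background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
emission = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}

height = stack_height(emission, background)
heights = letter_heights(emission, background)
```

A state whose emissions match the background exactly gets a stack height of zero under this measure, so only positions that carry information stand out in the logo. The stack width (hitting probability and expected contribution) depends on the transition probabilities of the model and is not covered by this sketch.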
We demonstrate that HMM Logos can be a useful tool for the biologist: we use them to highlight differences between two homologous subfamilies of GTPases, Rab and Ras, and we show that they are able to indicate structural elements of Ras.