Chu Wei, Ghahramani Zoubin, Podtelezhnikov Alexei, Wild David L
Gatsby Computational Neuroscience Unit, University College London, London, UK.
IEEE/ACM Trans Comput Biol Bioinform. 2006 Apr-Jun;3(2):98-113. doi: 10.1109/TCBB.2006.17.
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction that incorporates multiple sequence alignment profiles to improve predictive performance. The segmental model is a generalization of the hidden Markov model in which a hidden state generates segments of variable length and secondary structure type. We propose a novel parameterized model for the likelihood function that explicitly represents multiple sequence alignment profiles to capture segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles yields substantial improvements and that the generalization performance is promising. By incorporating information from long-range interactions in beta-sheets, the model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server for our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.
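The idea of a hidden state that emits a whole segment, rather than a single residue, can be illustrated with a generic semi-Markov (segmental) Viterbi decoder. This is a minimal sketch, not the authors' SSMM: the per-residue independent segment score `seg_logp`, the flat segment-length term `len_logp`, the toy propensity table `prop`, and the ban on same-label transitions are all hypothetical stand-ins. The paper's likelihood instead uses a parameterized model over multiple sequence alignment profiles.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

LABELS = ("H", "E", "C")  # helix, strand, coil

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def segmental_viterbi(
    n: int,
    seg_logp: Callable[[int, int, str], float],
    len_logp: Callable[[str, int], float],
    trans_logp: Dict[Tuple[str, str], float],
    init_logp: Dict[str, float],
    max_len: int,
) -> List[Segment]:
    """Highest-scoring segmentation of positions 0..n-1 under a semi-Markov model."""
    # best[t][y]: best log score over segmentations of [0, t) whose last segment has label y
    best: List[Dict[str, float]] = [{y: -math.inf for y in LABELS} for _ in range(n + 1)]
    # back[t][y]: (start of that last segment, label of the segment before it)
    back: List[Dict[str, Tuple[int, Optional[str]]]] = [{} for _ in range(n + 1)]
    for t in range(1, n + 1):
        for y in LABELS:
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                # Score of one whole segment [s, t) with label y and length d
                local = seg_logp(s, t, y) + len_logp(y, d)
                if s == 0:
                    candidates = [(init_logp[y] + local, None)]
                else:
                    candidates = [(best[s][p] + trans_logp[(p, y)] + local, p) for p in LABELS]
                for score, p in candidates:
                    if score > best[t][y]:
                        best[t][y] = score
                        back[t][y] = (s, p)
    # Trace back from the best-scoring label of the final segment
    label = max(LABELS, key=lambda lab: best[n][lab])
    t = n
    segments: List[Segment] = []
    while t > 0:
        s, prev = back[t][label]
        segments.append((s, t, label))
        t, label = s, prev
    return segments[::-1]


# Toy usage with hypothetical per-residue propensities (not learned parameters)
seq = "AAAAAVVVVV"
prop = {"A": {"H": 0.7, "E": 0.1, "C": 0.2},
        "V": {"H": 0.1, "E": 0.7, "C": 0.2}}
result = segmental_viterbi(
    n=len(seq),
    seg_logp=lambda s, t, y: sum(math.log(prop[a][y]) for a in seq[s:t]),
    len_logp=lambda y, d: math.log(0.2),  # flat per-segment penalty
    trans_logp={(p, y): (-math.inf if p == y else math.log(0.5))
                for p in LABELS for y in LABELS},
    init_logp={y: math.log(1 / 3) for y in LABELS},
    max_len=len(seq),
)
print(result)  # [(0, 5, 'H'), (5, 10, 'E')]
```

Forbidding same-label transitions makes each decoded segment a maximal run of one secondary structure type. The explicit `len_logp` term is what distinguishes a semi-Markov model from a standard HMM, whose state durations are implicitly geometric.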