Analysis of an optimal hidden Markov model for secondary structure prediction.

Author information

Martin Juliette, Gibrat Jean-François, Rodolphe François

Affiliation

INSERM U726, Equipe de Bioinformatique Génomique et Moléculaire Université Denis Diderot Paris 7, 2 place Jussieu, 75251 Paris Cedex 05, France.

Publication information

BMC Struct Biol. 2006 Dec 13;6:25. doi: 10.1186/1472-6807-6-25.

Abstract

BACKGROUND

Secondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but unfortunately, neural networks are not intuitively interpretable. By contrast, hidden Markov models are interpretable graphical models. Moreover, they have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.

RESULTS

Our HMM is designed without prior knowledge. It is chosen within a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model alpha-helices, 12 that model coil and 9 that model beta-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We start by analyzing the model features and show how it offers a new vision of local structures. We then use it for secondary structure prediction. Our model appears to be very efficient on single sequences, with a Q3 score of 68.8%, more than one percentage point above PSIPRED's single-sequence prediction. A straightforward extension of the method allows the use of multiple sequence alignments, raising the Q3 score to 75.5%.
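The model was chosen among candidates of increasing size using a statistical criterion. As a minimal sketch, assuming that criterion is the Bayesian Information Criterion (BIC) and that emissions range over the 20 amino acids, the code below counts the free parameters of a discrete-emission HMM and keeps the candidate with the highest BIC. The helper names (`hmm_free_parameters`, `bic`, `select_model`) and the `n_allowed_transitions` parameter (for models that are not fully connected) are hypothetical, not taken from the paper, and the authors' exact criterion and parameter count may differ.

```python
import math

# ASSUMPTION: BIC is used as the statistical model-selection criterion;
# helper names are hypothetical illustrations, not the authors' code.


def hmm_free_parameters(n_states, n_symbols=20, n_allowed_transitions=None):
    """Count free parameters of a discrete-emission HMM.

    n_allowed_transitions: number of nonzero transition probabilities;
    defaults to a fully connected model (n_states * n_states).
    """
    if n_allowed_transitions is None:
        n_allowed_transitions = n_states * n_states
    # Each transition row sums to 1, so one entry per row is not free.
    transitions = n_allowed_transitions - n_states
    # Each state's emission distribution over n_symbols sums to 1.
    emissions = n_states * (n_symbols - 1)
    # Initial state distribution sums to 1.
    initial = n_states - 1
    return transitions + emissions + initial


def bic(log_likelihood, n_params, n_observations):
    """BIC in its 'higher is better' form: log L - (k / 2) * ln(n)."""
    return log_likelihood - 0.5 * n_params * math.log(n_observations)


def select_model(candidates, n_observations):
    """Return the name of the candidate with the highest BIC.

    candidates: list of (name, log_likelihood, n_params) tuples.
    """
    return max(candidates, key=lambda c: bic(c[1], c[2], n_observations))[0]
```

For a fully connected 36-state model with 20 emission symbols, `hmm_free_parameters(36)` counts 1,979 free parameters; a sparser topology lowers the transition term, which is one reason the penalty favors models whose connections reflect actual segment organization.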
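Because every hidden state is assigned to one of three classes (helix H, strand E, coil C), a per-residue prediction can be read off the model by posterior decoding: compute the probability of each hidden state at each position with the forward-backward algorithm, sum those probabilities over the states of each class, and keep the most probable class. The sketch below assumes this decoding rule and uses a hypothetical 3-state model over a toy 3-letter alphabet in place of the paper's 36-state model. Q3 is then the percentage of residues whose predicted class matches the observed one.

```python
# Posterior (forward-backward) decoding of a toy secondary-structure HMM.
# ASSUMPTIONS: the 3-state model, the 3-letter alphabet and all probabilities
# below are hypothetical; they stand in for the paper's 36-state model.


def forward_backward(seq, init, trans, emit):
    """Return P(state_t = k | seq) for every position t, via scaled forward-backward."""
    n, K = len(seq), len(init)
    alpha, scale = [], []
    for t, x in enumerate(seq):
        if t == 0:
            row = [init[k] * emit[k][x] for k in range(K)]
        else:
            prev = alpha[-1]
            row = [emit[k][x] * sum(prev[j] * trans[j][k] for j in range(K))
                   for k in range(K)]
        s = sum(row)
        scale.append(s)
        alpha.append([a / s for a in row])  # rescale to avoid underflow
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = seq[t + 1]
        beta[t] = [sum(trans[j][k] * emit[k][nxt] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]
    posteriors = []
    for a_row, b_row in zip(alpha, beta):
        row = [a * b for a, b in zip(a_row, b_row)]
        z = sum(row)
        posteriors.append([r / z for r in row])
    return posteriors


def predict_classes(seq, init, trans, emit, state_class):
    """Sum posteriors over the hidden states of each class; keep the most probable class."""
    classes = sorted(set(state_class))
    prediction = []
    for row in forward_backward(seq, init, trans, emit):
        totals = {c: 0.0 for c in classes}
        for k, p in enumerate(row):
            totals[state_class[k]] += p
        prediction.append(max(classes, key=totals.get))
    return "".join(prediction)


def q3(predicted, observed):
    """Percentage of residues whose predicted class (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy model: state 0 -> helix (H), 1 -> strand (E), 2 -> coil (C).
STATE_CLASS = "HEC"
INIT = [1 / 3, 1 / 3, 1 / 3]
TRANS = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
EMIT = [{"A": 0.8, "V": 0.1, "G": 0.1},
        {"A": 0.1, "V": 0.8, "G": 0.1},
        {"A": 0.1, "V": 0.1, "G": 0.8}]
```

In a real model several hidden states share a class (15 helix, 12 coil and 9 strand states here), which is why the class probability is a sum over states rather than the probability of a single state; `predict_classes("AAAAVVVVGGGG", INIT, TRANS, EMIT, STATE_CLASS)` can then be scored against an observed labelling with `q3`.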

CONCLUSION

The hidden Markov model presented here achieves valuable prediction results using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.
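Because the model is generative, it can emit sequences together with their hidden-state (and hence secondary-structure) labels. The sketch below walks the hidden-state chain of a hypothetical 3-state toy model and emits one residue per state; to target a given secondary-structure content it simply resamples until the class fractions fall within a tolerance. Rejection sampling and the helper names (`sample_sequence`, `class_content`, `sample_with_content`) are illustrative assumptions, not the authors' procedure.

```python
import random

# Sampling residues and secondary-structure labels from a toy HMM.
# ASSUMPTIONS: hypothetical 3-state model and 3-letter alphabet; rejection
# sampling is just one simple way to target a secondary-structure content.

STATE_CLASS = "HEC"
INIT = [1 / 3, 1 / 3, 1 / 3]
TRANS = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
EMIT = [{"A": 0.8, "V": 0.1, "G": 0.1},
        {"A": 0.1, "V": 0.8, "G": 0.1},
        {"A": 0.1, "V": 0.1, "G": 0.8}]


def sample_sequence(length, init, trans, emit, state_class, rng):
    """Walk the hidden-state chain, emitting one residue per visited state."""
    def draw(weights):
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]

    residues, labels = [], []
    state = draw(init)
    for _ in range(length):
        symbols = list(emit[state])
        residues.append(symbols[draw([emit[state][s] for s in symbols])])
        labels.append(state_class[state])
        state = draw(trans[state])
    return "".join(residues), "".join(labels)


def class_content(labels):
    """Fraction of positions assigned to each class."""
    return {c: labels.count(c) / len(labels) for c in sorted(set(labels))}


def sample_with_content(length, target, tol, rng, max_tries=10_000):
    """Resample until every class fraction in `target` is within `tol`; None if no hit."""
    for _ in range(max_tries):
        seq, labels = sample_sequence(length, INIT, TRANS, EMIT, STATE_CLASS, rng)
        content = class_content(labels)
        if all(abs(content.get(c, 0.0) - f) <= tol for c, f in target.items()):
            return seq, labels
    return None
```

For example, `sample_with_content(60, {"H": 0.5, "E": 0.2}, 0.05, random.Random(0))` asks for roughly 50% helix and 20% strand; tight tolerances on long sequences may need many tries, so a constrained sampler would scale better in practice.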
