Camproux A C, Tufféry P
Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France.
Biochim Biophys Acta. 2005 Aug 5;1724(3):394-403. doi: 10.1016/j.bbagen.2005.05.019.
Understanding and predicting protein structures depends on the complexity and accuracy of the models used to represent them. We recently set up a Hidden Markov Model that optimally compresses protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model simultaneously learns the shape of the representative structural letters describing local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we go one step further and report some evidence that this model of protein local architecture also captures specific amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. These results open the way to predicting the series of letters describing the structure of a protein from its amino acid sequence.
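The two quantities named in the abstract, the transition matrix between structural letters and the amino acid preferences of each letter, can be illustrated with a minimal Python sketch. This is not the authors' method. It assumes the proteins have already been encoded as strings of structural letters, and that each letter is aligned with exactly one amino acid, which is a simplification. The function names `transition_matrix` and `letter_propensities` and the toy sequences are hypothetical, and the log-odds score is a simple stand-in for whatever propensity statistic the paper uses.

```python
from collections import Counter
import math


def transition_matrix(letter_strings, alphabet):
    """Estimate first-order transition frequencies between structural letters.

    letter_strings: iterable of strings, one per protein (hypothetical encoding).
    alphabet: iterable of the letters to include as rows and columns.
    """
    counts = {a: Counter() for a in alphabet}
    for s in letter_strings:
        # Count each adjacent pair (previous letter, next letter).
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for a in alphabet:
        total = sum(counts[a].values())
        # Rows with no observed transition are left at zero.
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in alphabet}
    return matrix


def letter_propensities(pairs):
    """Log2 odds of observing an amino acid with a letter versus independence.

    pairs: iterable of (amino_acid_sequence, letter_sequence) strings of equal
    length, assumed to be aligned position by position (a simplification).
    """
    joint, aa_counts, letter_counts = Counter(), Counter(), Counter()
    n = 0
    for aa_seq, letter_seq in pairs:
        if len(aa_seq) != len(letter_seq):
            raise ValueError("sequences must be aligned and of equal length")
        for aa, letter in zip(aa_seq, letter_seq):
            joint[(aa, letter)] += 1
            aa_counts[aa] += 1
            letter_counts[letter] += 1
            n += 1
    # Positive scores mean the amino acid is over-represented for that letter.
    return {
        (aa, letter): math.log2(c * n / (aa_counts[aa] * letter_counts[letter]))
        for (aa, letter), c in joint.items()
    }


if __name__ == "__main__":
    # Toy data only; real structural alphabets and sequences are far larger.
    print(transition_matrix(["AAB", "AB"], "AB"))
    print(letter_propensities([("GG", "ab"), ("AA", "aa")]))
```

A positive score for an (amino acid, letter) pair indicates that the residue occurs with that letter more often than expected by chance. Extending the same counting to short words of amino acids instead of single residues would follow the direction the abstract describes.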