Klingler T M, Brutlag D L
Department of Biochemistry, Stanford University School of Medicine, California 94305-5307.
Protein Sci. 1994 Oct;3(10):1847-57. doi: 10.1002/pro.5560031024.
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
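The idea of discovering correlations between pairs of sequence positions with a standard statistical test can be illustrated with a minimal sketch. It is not the authors' program: the abstract does not say which tests were used, so the Pearson chi-square statistic is an assumption here, as are the two-class hydrophobic/polar alphabet, the function names `reduce_alphabet`, `chi_square`, and `correlated_pairs`, and the input of equal-length sequences. The sketch omits the validation step and the Bayesian network construction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-class chemical alphabet (an assumption, not taken from the paper).
HYDROPHOBIC = set("AVLIMFWC")


def reduce_alphabet(seq):
    """Map each residue to 'h' (hydrophobic) or 'p' (everything else)."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq)


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of positions i and j."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    row = Counter(s[i] for s in seqs)  # symbol counts at position i
    col = Counter(s[j] for s in seqs)  # symbol counts at position j
    stat = 0.0
    for a in row:
        for b in col:
            # Expected count if the two positions were independent.
            expected = row[a] * col[b] / n
            observed = pair_counts.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def correlated_pairs(seqs, threshold):
    """Return (i, j, statistic) for position pairs whose statistic exceeds threshold."""
    length = len(seqs[0])  # assumes all sequences have the same length
    hits = []
    for i, j in combinations(range(length), 2):
        stat = chi_square(seqs, i, j)
        if stat > threshold:
            hits.append((i, j, stat))
    return hits
```

With a two-class alphabet each pairwise table has one degree of freedom, so a threshold near 3.84 corresponds to the conventional 5% critical value. Because every pair of positions is tested, a real analysis would need a correction for multiple comparisons and far more sequences than a toy example provides.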