Fetrow J S, Palumbo M J, Berg G
Department of Biological Sciences, University at Albany, SUNY 12222, USA.
Proteins. 1997 Feb;27(2):249-71.
To study local structures in proteins, we previously developed an autoassociative artificial neural network (autoANN) and clustering tool to discover intrinsic features of macromolecular structures. The hidden unit activations computed by the trained autoANN are a convenient low-dimensional encoding of the local protein backbone structure. Clustering these activation vectors results in a unique classification of protein local structural features called Structural Building Blocks (SBBs). Here we describe application of this method to a larger database of proteins, verification of the applicability of this method to structure classification, and subsequent analysis of amino acid frequencies and several commonly occurring patterns of SBBs. The SBB classification method has several interesting properties: 1) it identifies the regular secondary structures, alpha helix and beta strand; 2) it consistently identifies other local structure features (e.g., helix caps and strand caps); 3) strong amino acid preferences are revealed at some positions in some SBBs; and 4) distinct patterns of SBBs occur in the "random coil" regions of proteins. Analysis of these patterns identifies interesting structural motifs in the protein backbone structure, indicating that SBBs can be used as "building blocks" in the analysis of protein structure. This type of pattern analysis should increase our understanding of the relationship between protein sequence and local structure, especially in the prediction of protein structures.
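The pipeline summarized in the abstract (train an autoassociative network, read out the hidden unit activations as a low-dimensional encoding, then cluster those activations) can be illustrated with a minimal sketch. The abstract does not give the network architecture, the input descriptors, or the clustering algorithm, so everything below is an assumption for illustration only: the synthetic input from the hypothetical `make_windows`, the 12-feature windows, the 4 hidden units, the single tanh hidden layer, and the use of k-means with 3 clusters. It is not the authors' implementation.

```python
import numpy as np


def make_windows(n=300, dim=12, n_modes=3, seed=0):
    """Synthetic stand-in for per-window backbone descriptors (NOT real protein data)."""
    rng = np.random.default_rng(seed)
    modes = rng.normal(0.0, 2.0, (n_modes, dim))  # hypothetical local conformations
    X = modes[rng.integers(0, n_modes, n)] + rng.normal(0.0, 0.3, (n, dim))
    return (X - X.mean(0)) / X.std(0)  # standardize each feature


def train_autoencoder(X, n_hidden=4, epochs=1000, lr=0.05, seed=0):
    """Fit input -> tanh hidden -> linear output so the output reproduces the input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden unit activations
        err = (H @ W2 + b2) - X            # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        gY = 2.0 * err / (n * d)           # gradient of the mean squared error
        gH = (gY @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ gY)
        b2 -= lr * gY.sum(0)
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(0)
    return (W1, b1), losses


def encode(X, encoder):
    """Return the hidden activations, used here as the low-dimensional encoding."""
    W1, b1 = encoder
    return np.tanh(X @ W1 + b1)


def nearest_center(Z, centers):
    """Index of the closest center (squared Euclidean distance) for each row of Z."""
    return ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)


def kmeans(Z, k=3, iters=50, seed=0):
    """Plain k-means, chosen here only as a simple example of a clustering step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_center(Z, centers)
        for j in range(k):
            if np.any(labels == j):        # leave an empty cluster's center unchanged
                centers[j] = Z[labels == j].mean(0)
    return nearest_center(Z, centers), centers


def demo():
    X = make_windows()
    encoder, losses = train_autoencoder(X)
    Z = encode(X, encoder)                 # one activation vector per window
    labels, centers = kmeans(Z, k=3)       # each cluster plays the role of one "SBB"
    return losses, Z, labels, centers
```

Calling `demo()` returns the training losses, the encoded windows, a cluster label per window, and the cluster centers. In this toy setting each label plays the role of an SBB assignment; with real data the input rows would be backbone descriptors computed from protein structures, and the number of hidden units and clusters would need to be chosen and validated empirically.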