Weiss O, Herzel H
Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstr. 43, Berlin, D-10115, Germany.
J Theor Biol. 1998 Feb 21;190(4):341-53. doi: 10.1006/jtbi.1997.0560.
Correlation functions in large sets of non-homologous protein sequences are analysed. Finite-size corrections are applied and fluctuations are estimated. Because symbol sequences must be mapped to sequences of numbers before correlation functions can be calculated, several property codes are tested as such mappings. We found hydrophobicity autocorrelation functions to be strongly oscillating. Another strong signal is the monotonically decaying alpha-helix propensity autocorrelation function. Furthermore, we detected signals corresponding to an alternation of positively and negatively charged residues at a distance of 3-4 amino acids. To look beyond the property codes derived by the methods of physical chemistry, mappings yielding a strong correlation signal are sought using a Monte Carlo simulation. The mappings leading to strong signals are found to be related to hydrophobicity or alpha-helix propensity. A cluster analysis of the top-scoring mappings leads to two novel property codes. These two property codes are derived from sequence data only. They turn out to be similar to known property codes for hydrophobicity or polarity.
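The mapping step described above can be illustrated with a minimal sketch. It assumes a simple charge code (K and R mapped to +1, D and E to -1, all other residues to 0) and the plain pooled estimator C(k) = <x_i x_(i+k)> - <x_i><x_(i+k)>. The names `CHARGE_CODE` and `autocorrelation` are hypothetical, the property codes actually tested in the paper are not reproduced here, and the finite-size corrections and fluctuation estimates mentioned in the abstract are omitted.

```python
from typing import Dict, List

# Illustrative property code (an assumption, not taken from the paper):
# basic residues K, R -> +1; acidic residues D, E -> -1; everything else -> 0.
CHARGE_CODE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def autocorrelation(seqs: List[str], code: Dict[str, float], max_lag: int) -> List[float]:
    """Return C(k) for k = 1..max_lag, pooled over all sequences.

    C(k) = <x_i * x_(i+k)> - <x_i> * <x_(i+k)>, where x is the sequence
    after mapping each residue to a number with `code`.
    No finite-size correction is applied in this sketch.
    """
    # Map every sequence to numbers once; unknown symbols are treated as 0.
    mapped = [[code.get(residue, 0.0) for residue in seq] for seq in seqs]

    result: List[float] = []
    for k in range(1, max_lag + 1):
        n_pairs = 0
        sum_a = sum_b = sum_ab = 0.0
        for x in mapped:
            # Pairs never span two different sequences.
            for i in range(len(x) - k):
                a, b = x[i], x[i + k]
                sum_a += a
                sum_b += b
                sum_ab += a * b
                n_pairs += 1
        if n_pairs == 0:
            result.append(float("nan"))  # no pairs at this distance
        else:
            result.append(sum_ab / n_pairs - (sum_a / n_pairs) * (sum_b / n_pairs))
    return result


if __name__ == "__main__":
    # Toy input: strictly alternating charges, so C(1) < 0 and C(2) > 0.
    print(autocorrelation(["KDKDKDKD"], CHARGE_CODE, max_lag=4))
```

Passing a different dictionary as `code`, for example a hydrophobicity scale, gives the corresponding autocorrelation function without changing the estimator. Any conclusions about real proteins would need a large non-homologous sequence set and the corrections described in the abstract.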