Rahman R S, Rackovsky S
Department of Physics and Astronomy, University of Rochester, NY, USA.
Biophys J. 1995 Apr;68(4):1531-9. doi: 10.1016/S0006-3495(95)80325-5.
We investigated the correlation between protein sequence and structure by constructing a space of protein sequences, using methods developed previously for constructing a space of protein structures. In this space, each amino acid is represented as a vector of 10 property factors that encode almost all of its physical properties, and each sequence is represented by a distribution of overlapping sequence fragments, so that a distance between any two sequences can be calculated. Attaching a weight to each factor allows the intersequence distances to be varied, and we optimized the correlation between corresponding distances in the sequence and structure spaces. The optimal correlation is significantly better than the correlation obtained between the structure space and randomly generated sequences that have the overall composition of the database. However, sets of randomly generated sequences, each of which approximates the composition of the real sequence it replaces, produce correlations with the structure space that are as good as the correlation observed for the actual protein sequences. A connection is proposed with previous studies of the protein folding code. The most important property factors for the correlation of the sequence and structure spaces are shown to be related to helix/bend preference, side chain bulk, and beta-structure preference.
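The pipeline described in the abstract (property-factor vectors, overlapping fragments, weighted intersequence distances, correlation with structure distances) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the `FACTORS` table holds random placeholder numbers because the abstract does not list the actual property factors; each sequence's fragment distribution is collapsed to its mean vector; the distance is taken to be a weighted Euclidean distance; and the fragment length of 4 and all function names are hypothetical.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_FACTORS = 10

# ASSUMPTION: placeholder values only. The real 10 property factors are not
# given in the abstract, so random numbers stand in for them here.
_rng = random.Random(0)
FACTORS = {aa: [_rng.gauss(0.0, 1.0) for _ in range(N_FACTORS)] for aa in AMINO_ACIDS}


def sequence_descriptor(seq, frag_len=4):
    """Mean factor vector over all overlapping fragments of length frag_len.

    ASSUMPTION: the distribution of fragments is reduced to its mean, which
    is a simplification chosen for this sketch.
    """
    if len(seq) < frag_len:
        raise ValueError("sequence is shorter than the fragment length")
    n_frags = len(seq) - frag_len + 1
    total = [0.0] * N_FACTORS
    for start in range(n_frags):
        for aa in seq[start:start + frag_len]:
            for k, value in enumerate(FACTORS[aa]):
                total[k] += value / frag_len  # average within the fragment
    return [t / n_frags for t in total]       # average over fragments


def weighted_distance(desc_a, desc_b, weights):
    """Weighted Euclidean distance; weights are assumed non-negative."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, desc_a, desc_b)))


def pairwise_distances(seqs, weights, frag_len=4):
    """Distances for all unordered sequence pairs, in combinations() order."""
    descs = [sequence_descriptor(s, frag_len) for s in seqs]
    return [weighted_distance(a, b, weights) for a, b in itertools.combinations(descs, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Given a list of structure-space distances in the same pair order, `pearson(pairwise_distances(seqs, weights), structure_distances)` would serve as the objective that a generic optimizer adjusts `weights` to maximize. The abstract does not say which optimization procedure was used, so none is shown here.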