Rackovsky S
Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853.
Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, 10029.
Proteins. 2015 Nov;83(11):1923-8. doi: 10.1002/prot.24916. Epub 2015 Sep 10.
We examine the utility of informatics-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
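The two nonlinear regions described above can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not give the metric functions or any cutoffs, so the `PairDistance` type, the `classify_pair` function, and the threshold values `seq_cut` and `struct_cut` are all hypothetical. The sketch assumes that a distance in sequence space and a distance in structure space have already been computed for a pair of proteins.

```python
from typing import NamedTuple


class PairDistance(NamedTuple):
    """Distances between two proteins in the two independent spaces.

    Both fields are hypothetical placeholders; the actual metric
    functions are not specified in the abstract.
    """
    seq_dist: float     # distance in sequence space
    struct_dist: float  # distance in structure space


def classify_pair(pair: PairDistance,
                  seq_cut: float = 0.5,
                  struct_cut: float = 0.5) -> str:
    """Assign a protein pair to one of three illustrative regions.

    The cutoffs are arbitrary assumptions chosen for illustration only.
    """
    seq_close = pair.seq_dist <= seq_cut
    struct_close = pair.struct_dist <= struct_cut
    if seq_close and not struct_close:
        # Similar sequences, dissimilar structures.
        return "conformational_switch"
    if struct_close and not seq_close:
        # Dissimilar sequences, similar structures.
        return "remote_homology"
    # Sequence and structure distances agree (both close or both far).
    return "correlated"


if __name__ == "__main__":
    examples = [
        PairDistance(0.1, 0.9),
        PairDistance(0.9, 0.1),
        PairDistance(0.2, 0.2),
    ]
    for p in examples:
        print(p, classify_pair(p))
```

In this toy picture, the abstract's claim is that the first two labels together cover almost half of the database rather than a few outliers.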