Gan Hin Hark, Perlow Rebecca A, Roy Sharmili, Ko Joy, Wu Min, Huang Jing, Yan Shixiang, Nicoletta Angelo, Vafai Jonathan, Sun Ding, Wang Lihua, Noah Joyce E, Pasquali Samuela, Schlick Tamar
Department of Chemistry, New York University Medical School, New York, NY 10012, USA.
Biophys J. 2002 Nov;83(5):2781-91. doi: 10.1016/s0006-3495(02)75287-9.
Current analyses of protein sequence/structure relationships have focused on expected similarity relationships for structurally similar proteins. To survey and explore the basis of these relationships, we present a general sequence/structure map that covers all combinations of similarity/dissimilarity relationships and provide novel energetic analyses of these relationships. To aid our analysis, we divide protein relationships into four categories: expected/unexpected similarity (S and S(?)) and expected/unexpected dissimilarity (D and D(?)) relationships. In the expected similarity region S, we show that trends in the sequence/structure relation can be derived based on the requirement of protein stability and the energetics of sequence and structural changes. Specifically, we derive a formula relating sequence and structural deviations to a parameter characterizing protein stiffness; the formula fits the data reasonably well. We suggest that the absence of data in region S(?) (high structural but low sequence similarity) is due to unfavorable energetics. In contrast to region S, region D(?) (high sequence but low structural similarity) is well represented by proteins that can accommodate large structural changes. Our analyses indicate that there are several categories of similarity relationships and that protein energetics provide a basis for understanding these relationships.