Bornberg-Bauer E
Abteilung Theoretische Bioinformatik, Deutsches Krebsforschungszentrum, Heidelberg, Germany.
Biophys J. 1997 Nov;73(5):2393-403. doi: 10.1016/S0006-3495(97)78268-7.
The sequence-to-structure map for all uniquely folding sequences of short hydrophobic-polar (HP) model proteins on a square lattice is analyzed to investigate aspects considered relevant to evolution. Ranking structures by their frequencies reveals a few very frequent structures and many rare ones. The distribution can be described empirically by a generalized Zipf's law. All structures are relatively compact, yet the most compact ones are rare. Most sequences folding to the same structure belong to "neutral nets." These graphs in sequence space are connected by point mutations and centered around prototype sequences, which tolerate the largest number (up to 55%) of neutral mutations. Profiles have been derived from these homologous sequences. Frequent structures conserve only their hydrophobic cores, while rare ones are sensitive to surface mutations as well. Shape space covering, i.e., the ability to transform any structure into most others with few point mutations, is very unlikely. It is concluded that many characteristic features of the sequence-to-structure map of real proteins, such as the dominance of few folds, can be explained by the simple HP model. In analogy to protein families, nets are dense and well separated in sequence space. Potential implications for better understanding the evolution of proteins and applications to improving database searches are discussed.
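The exhaustive enumeration summarized above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes the conventional HP energy function (each non-bonded H-H lattice contact contributes -1, all other contacts 0), which the abstract does not spell out, and it counts a sequence as uniquely folding when exactly one conformation attains the minimum energy. The function names `conformations`, `contacts`, `unique_folds`, and `rank_structures` are hypothetical.

```python
from collections import Counter
from itertools import product

MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n):
    """Yield self-avoiding walks of n monomers (n >= 2), one per symmetry class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied
            if not turned and dy < 0:
                continue  # first turn must go up: removes mirror images
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    # Fixing the first step along +x removes the four rotations.
    yield from extend([(0, 0), (1, 0)], False)


def contacts(conf):
    """Return index pairs (i, j) that are lattice neighbours but not chain neighbours."""
    index = {pos: i for i, pos in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        for dx, dy in MOVES[:2]:  # look right and up only, so each pair is found once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def unique_folds(n):
    """Map each uniquely folding HP sequence of length n to its ground-state conformation."""
    confs = [(c, contacts(c)) for c in conformations(n)]
    folds = {}
    for seq in product("HP", repeat=n):
        best, ground = 0, []
        for conf, pairs in confs:
            # Assumed energy: -1 per H-H contact.
            energy = -sum(1 for i, j in pairs if seq[i] == "H" and seq[j] == "H")
            if energy < best:
                best, ground = energy, [conf]
            elif energy == best:
                ground.append(conf)
        if len(ground) == 1:
            folds["".join(seq)] = ground[0]
    return folds


def rank_structures(folds):
    """Return (structure, number of sequences) pairs, most frequent structure first."""
    return Counter(folds.values()).most_common()


if __name__ == "__main__":
    for rank, (_, count) in enumerate(rank_structures(unique_folds(10)), start=1):
        print(rank, count)
```

Plotting the printed counts against rank on log-log axes gives a rough visual check of the rank-frequency behaviour the abstract describes. The chain length of 10 is an arbitrary choice made to keep the run time short, not a value taken from the paper.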