Koehl Patrice, Levitt Michael
Department of Structural Biology, Fairchild Building, D109, Stanford University, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 2002 Feb 5;99(3):1280-5. doi: 10.1073/pnas.032405199. Epub 2002 Jan 22.
We describe a new approach to explore and quantify the sequence space associated with a given protein structure. A set of sequences is optimized for a given target structure, using all-atom models and a physical energy function. Specificity of the sequence for its target is ensured by using the random energy model, which keeps the amino acid composition of the sequence constant. The designed sequences provide a multiple sequence alignment that describes the sequence space compatible with the structure of interest; here the size of this space is estimated by using an information entropy measure. In parallel, multiple alignments of naturally occurring sequences can be derived by using either sequence or structure alignments. We compared these three independent multiple sequence alignments for 10 different proteins, ranging in size from 56 to 310 residues. We observed that the subset of the sequence space derived by using our design procedure is similar in size to the sequence spaces observed in nature. These results suggest that the volume of sequence space compatible with a given protein fold is defined by the length of the protein as well as by the topology (i.e., geometry of the polypeptide chain) and the stability (i.e., free energy of denaturation) of the fold.
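The abstract does not give the exact form of the information entropy measure, so the following is only a minimal sketch of one common way to estimate the size of a sequence space from a multiple sequence alignment. It assumes that alignment columns are independent, that every symbol (including a gap character) counts as a distinct state, and that natural logarithms are used. The function names `column_entropy`, `alignment_entropy`, and `effective_space_size` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(alignment):
    """Sum of per-column entropies; assumes columns vary independently."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    # zip(*alignment) iterates over the columns of the alignment
    return sum(column_entropy(col) for col in zip(*alignment))


def effective_space_size(alignment):
    """exp(total entropy): an effective number of sequences under the assumptions above."""
    return math.exp(alignment_entropy(alignment))


if __name__ == "__main__":
    # Toy alignment (made up for illustration): the first column is conserved,
    # the second has two equally frequent residues.
    toy = ["AC", "AD"]
    print(alignment_entropy(toy))     # ln 2, about 0.693
    print(effective_space_size(toy))  # about 2
```

Under these assumptions the total entropy grows with the number of variable positions, which fits the abstract's observation that the volume of sequence space depends on the length of the protein. Treating columns as independent ignores correlations between positions, so the value is best read as an upper-bound style estimate rather than the measure used in the study.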