Meyerguz Leonid, Kleinberg Jon, Elber Ron
Department of Computer Science, Cornell University, Ithaca, NY 14853, USA.
Proc Natl Acad Sci U S A. 2007 Jul 10;104(28):11627-32. doi: 10.1073/pnas.0701393104. Epub 2007 Jun 27.
Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in beta-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that match this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
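The graph notions used in the abstract (a directed graph over folds, "sinks", sequence flow, strongly connected components) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the fold labels, the flow counts, the function names, and the simplified definition of a sink as a node with incoming but no outgoing edges. It is not the authors' energy-based procedure, and the numbers are not data from the paper.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): flow[(a, b)] is an assumed
# count of sequences that move from fold a to fold b.
flow = {
    ("A", "B"): 5,
    ("B", "A"): 2,
    ("B", "C"): 7,
    ("A", "C"): 3,
    ("D", "C"): 4,
}


def build_graph(flow):
    """Return (successor sets, set of all nodes) for edges with positive flow."""
    succ = defaultdict(set)
    nodes = set()
    for (src, dst), count in flow.items():
        nodes.update((src, dst))
        if count > 0:
            succ[src].add(dst)
    return succ, nodes


def find_sinks(succ, nodes):
    """Simplified sink: a node that receives edges but has none leaving it."""
    has_incoming = {dst for dsts in succ.values() for dst in dsts}
    return sorted(n for n in nodes if n in has_incoming and not succ.get(n))


def net_inflow(flow, nodes):
    """Sequences gained minus sequences lost, per node."""
    total = {n: 0 for n in nodes}
    for (src, dst), count in flow.items():
        total[dst] += count
        total[src] -= count
    return total


def strongly_connected_components(succ, nodes):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    # Pass 1: record nodes in order of DFS completion.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(sorted(succ.get(start, ()))))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(sorted(succ.get(nxt, ())))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing completion order.
    pred = defaultdict(set)
    for src, dsts in succ.items():
        for dst in dsts:
            pred[dst].add(src)

    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in pred.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


succ, nodes = build_graph(flow)
sinks = find_sinks(succ, nodes)
inflow = net_inflow(flow, nodes)
components = strongly_connected_components(succ, nodes)
```

In this toy graph, folds A and B exchange sequences in both directions and so share one strongly connected component, while C only receives sequences and is the single sink under the simplified definition used here.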