Solis A D, Rackovsky S
Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029, USA.
Proteins. 2000 Feb 1;38(2):149-64.
In an effort to quantify loss of information in the processing of protein bioinformatic data, we examine how representations of amino acid sequence and backbone conformation affect the quantity of accessible structural information from local sequence. We propose a method to extract the maximum amount of peptide backbone structural information available in local sequence fragments, given a finite structural data set. Using methods of information theory, we develop an unbiased measure of local structural information that gauges changes in structural distributions when different representations of secondary structure and local sequence are used. We find that the manner in which backbone structure is represented affects the amount and quality of structural information that may be extracted from local sequence. Representations based on virtual bonds capture more structural information from local sequence than a three-state assignment scheme (helix/strand/loop). Furthermore, we find that amino acids show significant kinship with respect to the backbone structural information they carry, so that a collapse of the amino acid alphabet can be accomplished without severely affecting the amount of extractable information. This strategy is critical in optimizing the utility of a limited database of experimentally solved protein structures. Finally, we discuss the similarities within and differences between groups of amino acids in their roles in the local folding code and recognize specific amino acids critical in the formation of local structure.
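As a rough illustration of the kind of quantity discussed above, the sketch below computes a plain plug-in estimate of the mutual information between a sequence symbol and a structural state, and shows how collapsing the amino acid alphabet can preserve or destroy that information. It is not the authors' measure: the abstract does not give their formula, and the plug-in estimator used here is known to be biased upward for small samples, which is the kind of problem an unbiased measure has to correct. The function names, the groupings, and the toy observations (`H` for helix, `C` for coil) are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(sequence; structure) in bits.

    `pairs` is a list of (sequence_symbol, structure_state) observations.
    Assumption: a simple frequency-based estimate, with no small-sample
    bias correction.
    """
    n = len(pairs)
    joint = Counter(pairs)
    seq_counts = Counter(s for s, _ in pairs)
    struct_counts = Counter(x for _, x in pairs)
    # Sum of p(s, x) * log2( p(s, x) / (p(s) * p(x)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (seq_counts[s] * struct_counts[x]))
        for (s, x), c in joint.items()
    )


def collapse(pairs, groups):
    """Relabel each residue with its group; unlisted residues are kept as-is."""
    return [(groups.get(s, s), x) for s, x in pairs]


# Toy data (invented for illustration, not taken from any structure database).
toy = [("A", "H")] * 4 + [("L", "H")] * 4 + [("G", "C")] * 4 + [("P", "C")] * 4

# Hypothetical groupings: one keeps structurally similar residues together,
# the other mixes residues with opposite preferences.
good_groups = {"A": "h", "L": "h", "G": "c", "P": "c"}
bad_groups = {"A": "u", "G": "u", "L": "v", "P": "v"}

full_info = mutual_information(toy)
good_info = mutual_information(collapse(toy, good_groups))
bad_info = mutual_information(collapse(toy, bad_groups))

print(f"full alphabet:   {full_info:.3f} bits")
print(f"good collapse:   {good_info:.3f} bits")
print(f"bad collapse:    {bad_info:.3f} bits")
```

In this toy case the four-letter alphabet and the well-chosen two-group alphabet both carry 1 bit about the structural state, while the poorly chosen grouping carries none. With real data the counts per fragment are small, so a reduced alphabet also means more observations per symbol and a less noisy estimate.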