Madej T, Gibrat J F, Bryant S H
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Proteins. 1995 Nov;23(3):356-69. doi: 10.1002/prot.340230309.
We present an analysis of 10 blind predictions prepared for a recent conference, "Critical Assessment of Techniques for Protein Structure Prediction." The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model, the mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small "register shifts" of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the "cores" of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the "fold" of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution.
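The mean alignment error quoted above can be illustrated with a minimal sketch. The abstract does not give the exact definition used in the paper, so this assumes one plausible reading: for every template residue aligned in both the threading alignment and the structural alignment, take the absolute difference between the query residue positions assigned to it, then average. The function name `mean_alignment_error`, the dictionary representation, and the toy data are all hypothetical.

```python
from statistics import mean


def mean_alignment_error(threading, structural):
    """Mean absolute shift, in residues, between two alignments.

    Each alignment is a dict mapping a template residue index to the
    query residue index aligned to it. Only template positions present
    in both alignments are compared (an assumption, not the paper's
    stated definition).
    """
    shared = sorted(set(threading) & set(structural))
    if not shared:
        raise ValueError("alignments share no template positions")
    return mean(abs(threading[t] - structural[t]) for t in shared)


# Hypothetical toy data: two core elements of four residues each.
# The first is threaded in the correct register; the second is
# shifted by two residues (a "register shift").
structural = {10: 3, 11: 4, 12: 5, 13: 6, 30: 20, 31: 21, 32: 22, 33: 23}
threading = {10: 3, 11: 4, 12: 5, 13: 6, 30: 22, 31: 23, 32: 24, 33: 25}

error = mean_alignment_error(threading, structural)
print(error)
```

Under this reading, an error of a few residues with no larger discrepancies corresponds to secondary-structure elements placed on the right part of the fold but slid slightly along it.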