Oldfield T J
Accelrys Inc., 10188 Telesis Court, Suite 100, San Diego, CA 92121, USA.
Acta Crystallogr D Biol Crystallogr. 2007 Apr;63(Pt 4):514-25. doi: 10.1107/S0907444907000844. Epub 2007 Mar 16.
Coordinate superposition of proteins provides a structural basis for assessing protein similarity and therefore complements sequence alignment. Structure-alignment methods must contend with the large number of trials needed to determine the optimal alignment. This article presents a method for rapid (subsecond) pairwise protein-structure alignment based on maximal Cα-atom superposition. The algorithm can return alignments of 12 or more residues in length as multiple non-overlapping solutions for a pair of proteins, independent of fold connectivity and secondary-structure content. It is equally effective for all protein fold types and can align proteins that contain no secondary-structure elements, as is the case when searching for common turn structures in proteins. It has high sensitivity and returns the set of true positive results before any false positives, as judged by SCOP classification. It can find alignments between topologically different folds and returns sequence-alignment information derived from the structure alignment. The algorithm has also been extended to multiple structure alignment in order to determine common structures within groups of proteins, including the nondegenerate set of proteins in the PDB. The algorithm is implemented in the program CAALIGN, and this article presents results from pairwise structure alignment, multiple structure alignment and the generation of common structure fragments found within the PDB using multiple structure alignment.
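The abstract does not describe how CAALIGN computes its superposition, so the following is only a minimal sketch of the general idea of least-squares Cα superposition. It uses the standard Kabsch procedure (SVD of the coordinate covariance matrix) with NumPy, which is a substituted textbook technique and not the article's algorithm. The function name `kabsch_superpose` and the assumption that the two coordinate sets are already paired residue by residue are illustrative choices, not taken from the source.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares superposition of paired C-alpha coordinates.

    Hypothetical helper for illustration; not the CAALIGN method.
    `mobile` and `target` are (N, 3) arrays whose rows are already
    paired (row i of one corresponds to row i of the other).
    Returns (rotation, translation, rmsd) such that
    mobile @ rotation.T + translation best fits target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")

    # Remove the translational component by centring both sets.
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid
    q = target - target_centroid

    # 3x3 covariance matrix and its singular value decomposition.
    covariance = p.T @ q
    u, _, vt = np.linalg.svd(covariance)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_centroid - rotation @ mobile_centroid

    # Root-mean-square deviation after applying the fitted transform.
    fitted = mobile @ rotation.T + translation
    rmsd = np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1)))
    return rotation, translation, rmsd


# Usage: a synthetic 12-residue fragment moved by a known rigid transform.
rng = np.random.default_rng(0)
fragment = rng.normal(size=(12, 3))
angle = 0.7
known_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = fragment @ known_rotation.T + np.array([1.0, -2.0, 3.0])
rotation, translation, rmsd = kabsch_superpose(fragment, moved)
```

A superposition like this only scores one fixed residue pairing. The hard part highlighted in the abstract, searching the very large space of possible pairings quickly, is not addressed by this sketch.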