Qu Y, Guo J T, Olman V, Xu Y
Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA.
Pac Symp Biocomput. 2004:459-70. doi: 10.1142/9789812704856_0043.
Residual dipolar coupling (RDC) is one of the most exciting emerging NMR techniques for studying protein structures. However, solving a protein structure from RDC data alone is highly challenging, because the structure calculation procedure is often effective only when the starting structural model is already close to the actual structure of the protein. We report a computer program, RDC-PROSPECT, that identifies the structural homolog or analog of a target protein in the PDB that best matches the 15N-1H RDC data of the protein recorded in a single ordering medium. The identified structural homolog/analog can then be used as a starting model for RDC-based structure calculation. Because RDC-PROSPECT uses only RDC data and predicted secondary structure information, its performance is virtually independent of the sequence similarity between a target protein and its structural homolog/analog, which makes it applicable to protein targets beyond the scope of current protein threading techniques. We tested RDC-PROSPECT on all 15N-1H RDC data (representing 33 proteins) available in the BMRB database and the literature. The program correctly identified the structural folds of approximately 80% of the target proteins, significantly better than previously reported results, and achieved an average alignment accuracy of 97.9% of residues within a 4-residue shift. Through careful algorithmic design, RDC-PROSPECT is at least one order of magnitude faster than previously reported algorithms for principal alignment frame search, which makes it fast enough for large-scale applications.
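To make concrete what it means for a candidate structure to "match" 15N-1H RDC data, the sketch below shows the standard linear least-squares fit of an alignment tensor to a set of N-H bond vectors, followed by a Q-factor that measures agreement. This is not the RDC-PROSPECT algorithm, whose scoring and frame search are not described in this abstract. All function names, the synthetic bond vectors, and the tensor values are hypothetical and chosen only for illustration.

```python
import math
import random

def design_row(v):
    """Linear-system row for one unit bond vector v = (x, y, z).

    For a traceless symmetric tensor A (Azz = -Axx - Ayy), the coupling is
    D = v^T A v = Axx(x^2 - z^2) + Ayy(y^2 - z^2) + 2Axy*xy + 2Axz*xz + 2Ayz*yz.
    """
    x, y, z = v
    return [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]

def solve(M, b):
    """Solve M p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * p[k] for k in range(r + 1, n))
        p[r] = (aug[r][n] - s) / aug[r][r]
    return p

def fit_tensor(vectors, rdcs):
    """Least-squares fit of the 5 tensor parameters via the normal equations."""
    rows = [design_row(v) for v in vectors]
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    mtd = [sum(r[i] * d for r, d in zip(rows, rdcs)) for i in range(5)]
    return solve(mtm, mtd)

def back_calculate(vectors, params):
    """Predict RDCs from bond vectors and fitted tensor parameters."""
    return [sum(c * p for c, p in zip(design_row(v), params)) for v in vectors]

def q_factor(observed, calculated):
    """rms(observed - calculated) / rms(observed); lower means a better match."""
    num = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    den = sum(o * o for o in observed)
    return math.sqrt(num / den)

def random_unit_vector(rng):
    x, y, z = (rng.gauss(0, 1) for _ in range(3))
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)

# Synthetic demonstration (assumed data, not from the paper).
rng = random.Random(0)
true_vectors = [random_unit_vector(rng) for _ in range(30)]
true_params = [8.0, -3.0, 1.5, -2.0, 0.5]  # hypothetical tensor, in Hz
observed = back_calculate(true_vectors, true_params)

# Correct structure: the fit reproduces the data almost exactly.
q_good = q_factor(observed, back_calculate(
    true_vectors, fit_tensor(true_vectors, observed)))

# Wrong structure: the same data assigned to scrambled bond orientations.
wrong_vectors = true_vectors[:]
rng.shuffle(wrong_vectors)
q_bad = q_factor(observed, back_calculate(
    wrong_vectors, fit_tensor(wrong_vectors, observed)))

print(f"Q (correct structure) = {q_good:.3g}")
print(f"Q (scrambled structure) = {q_bad:.3g}")
```

In this toy setting, a structure whose bond orientations are consistent with the data yields a Q-factor near zero, while a mismatched one does not. A fold-recognition program would have to evaluate some such agreement measure over many candidate templates and alignments, which is why the speed of the alignment frame search matters.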