Wan Xiang, Lin Guohui
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
J Bioinform Comput Biol. 2007 Apr;5(2a):313-33. doi: 10.1142/s021972000700262x.
Successful backbone resonance sequential assignment is fundamental to three-dimensional protein structure determination via Nuclear Magnetic Resonance (NMR) spectroscopy. Sequential assignment can be roughly partitioned into three steps: grouping resonance peaks from multiple spectra into spin systems, chaining the resulting spin systems into strings, and assigning these strings to non-overlapping runs of consecutive amino acid residues in the target protein. Many existing assignment programs handle these three steps separately. This works well on protein NMR data of close-to-ideal quality, but only moderately or even poorly on most real protein datasets, where noise and data degeneracy occur frequently. In this work we propose to partition sequential assignment into virtual rather than physical steps, and to use their outputs to cross-validate one another. The novelty is that ambiguities at the grouping step are resolved while finding highly confident strings at the chaining step, and ambiguities at the chaining step are resolved by examining the mappings of strings at the assignment step. In this way, all ambiguities in the sequential assignment are resolved globally and optimally. The resulting assignment program, Graph-based Approach for Sequential Assignment (GASA), has been compared with several recent similar programs, including PACES, RANDOM, MARS, and RIBRA. These performance comparisons demonstrated that GASA is more promising for practical use.
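To make the chaining step concrete, the following is a minimal sketch and not GASA's actual algorithm, which the abstract does not specify. It assumes each spin system carries intra-residue and preceding-residue Cα/Cβ chemical shifts, and links two spin systems when those shifts agree within a tolerance. The `SpinSystem` class, the function names, the 0.2 ppm tolerance, and the shift values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    """Hypothetical spin system: carbon shifts in ppm."""
    name: str
    ca: float        # intra-residue C-alpha shift
    cb: float        # intra-residue C-beta shift
    ca_prev: float   # C-alpha shift observed for the preceding residue
    cb_prev: float   # C-beta shift observed for the preceding residue


def can_follow(a, b, tol=0.2):
    """True if b could directly follow a: b's preceding-residue shifts
    match a's intra-residue shifts within tol (an assumed tolerance)."""
    return abs(b.ca_prev - a.ca) <= tol and abs(b.cb_prev - a.cb) <= tol


def build_graph(systems, tol=0.2):
    """Directed graph: name -> names of spin systems that could follow it."""
    return {
        a.name: [b.name for b in systems if b is not a and can_follow(a, b, tol)]
        for a in systems
    }


def unambiguous_chains(graph):
    """Extract maximal strings that use only unambiguous links, i.e. edges
    u -> v where v is u's only successor and u is v's only predecessor.
    Closed cycles of such links are ignored in this sketch."""
    indeg = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1

    def unique_link(u):
        return len(graph[u]) == 1 and indeg[graph[u][0]] == 1

    has_unique_pred = {graph[u][0] for u in graph if unique_link(u)}
    chains = []
    for start in graph:
        if start in has_unique_pred:
            continue  # this node lies in the interior of some chain
        chain = [start]
        while unique_link(chain[-1]):
            chain.append(graph[chain[-1]][0])
        chains.append(chain)
    return chains


# Made-up shifts: s2 and s4 both match s1, so that link is ambiguous.
systems = [
    SpinSystem("s1", 58.1, 30.2, 45.0, 20.0),
    SpinSystem("s2", 55.4, 41.0, 58.15, 30.1),
    SpinSystem("s3", 62.0, 69.5, 55.3, 41.1),
    SpinSystem("s4", 52.0, 19.0, 58.0, 30.3),
]
graph = build_graph(systems)
chains = unambiguous_chains(graph)
```

In this toy input the link out of `s1` is ambiguous, so the sketch leaves `s1` and `s4` as singletons and chains only `s2` and `s3`. In the approach the abstract describes, such an ambiguity would instead be resolved by examining how candidate strings map onto the target protein at the assignment step.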