Zeng Jianyang, Tripathy Chittaranjan, Zhou Pei, Donald Bruce R
Department of Computer Science, Duke University, Durham, NC 27708, USA.
Comput Syst Bioinformatics Conf. 2008;7:169-81.
High-throughput structure determination based on solution Nuclear Magnetic Resonance (NMR) spectroscopy plays an important role in structural genomics. One of the main bottlenecks in NMR structure determination is the interpretation of NMR data to obtain a sufficient number of accurate distance restraints by assigning nuclear Overhauser effect (NOE) spectral peaks to pairs of protons. The difficulty in automated NOE assignment lies mainly in the ambiguities arising both from the resonance degeneracy of chemical shifts and from the uncertainty due to experimental errors in NOE peak positions. In this paper we present a novel NOE assignment algorithm, called HAusdorff-based NOE Assignment (HANA). It starts with a high-resolution protein backbone computed using only two residual dipolar couplings (RDCs) per residue, employs a Hausdorff-based pattern matching technique to measure the similarity between experimental and back-computed NOE spectra for each rotamer from a statistically diverse library, and uses these similarities to select optimal position-specific rotamers for filtering ambiguous NOE assignments. Our algorithm runs in time O(tn^3 + tn log t), where t is the maximum number of rotamers per residue and n is the size of the protein. Application of our algorithm to biological NMR data for three proteins, namely human ubiquitin, the zinc finger domain of the human DNA Y-polymerase Eta (pol eta), and the human Set2-Rpb1 interacting domain (hSRI), demonstrates that our algorithm overcomes spectral noise to achieve more than 90% assignment accuracy. Additionally, the final structures calculated using our automated NOE assignments have backbone RMSD < 1.7 Å and all-heavy-atom RMSD < 2.5 Å from reference structures that were determined either by X-ray crystallography or traditional NMR approaches. These results show that our NOE assignment algorithm can be successfully applied to protein NMR spectra to obtain high-quality structures.
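The Hausdorff-based matching step can be illustrated with a minimal Python sketch. This is not the HANA implementation: the function names (`directed_hausdorff`, `best_rotamer`), the `fraction` parameter, the use of a plain Euclidean metric on unscaled peak coordinates, and the toy peak lists are all assumptions made for illustration. The sketch scores each candidate rotamer by a partial (rank-based) directed Hausdorff distance from its back-computed peaks to the experimental peaks, and picks the rotamer with the smallest score.

```python
import math


def directed_hausdorff(a, b, fraction=1.0):
    """Partial directed Hausdorff distance from point set a to point set b.

    With fraction=1.0 this is the classical max-min distance. A smaller
    fraction returns the corresponding quantile of the nearest-neighbour
    distances, which tolerates a share of unmatched (noisy or missing) peaks.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    # Distance from each point in a to its nearest point in b, ascending.
    nearest = sorted(min(math.dist(p, q) for q in b) for p in a)
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def best_rotamer(experimental, back_computed_by_rotamer, fraction=0.8):
    """Return the rotamer whose back-computed peaks best match the spectrum."""
    scores = {
        name: directed_hausdorff(peaks, experimental, fraction)
        for name, peaks in back_computed_by_rotamer.items()
    }
    return min(scores, key=scores.get), scores


# Toy data (hypothetical): 2D peak positions in ppm.
experimental = [(1.0, 4.2), (2.1, 8.3), (0.9, 7.5), (3.3, 1.2)]
rotamers = {
    "rot_a": [(1.02, 4.18), (2.08, 8.31), (0.88, 7.52)],
    "rot_b": [(1.50, 4.90), (2.60, 7.70), (0.30, 6.90)],
}
best, scores = best_rotamer(experimental, rotamers)
print(best, scores)
```

In practice the spectral dimensions have different error tolerances, so each axis would likely be scaled before distances are computed; the sketch omits this for brevity.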