An automated assignment-free Bayesian approach for accurately identifying proton contacts from NOESY data.

Author information

Hung Ling-Hong, Samudrala Ram

Affiliation

Department of Microbiology, University of Washington, Rosen Building, 960 Republican, Seattle, WA 98109, USA.

Publication information

J Biomol NMR. 2006 Nov;36(3):189-98. doi: 10.1007/s10858-006-9082-1. Epub 2006 Oct 3.

PMID:17016668

Abstract

The identification of proton contacts from NOE spectra remains the major bottleneck in NMR protein structure calculations. We describe an automated, assignment-free system for deriving proton contact probabilities from NOESY peak lists that can be viewed as a quantitative extension of manual assignment techniques. Rather than assigning contacts to NOESY crosspeaks, a rigorous Bayesian methodology is used to transform initial proton contact probabilities derived from a set of 2992 protein structures into posterior probabilities using the observed crosspeaks as evidence. Given a target protein, the Bayesian approach is used to derive probabilities for all possible proton contacts. We evaluated the accuracy of this approach at predicting proton contacts on 60 (15)N-separated NOESY and (13)C-separated NOESY datasets simulated from experimentally determined NMR structures and compared it to CYANA, an established method for proton constraint assignment. On average, at the highest confidence level, our method accurately identifies 3.16/3.17 long-range contacts per residue and 12.11/12.18 interresidue proton contacts per residue. These accuracies represent a significant increase over the performance of CYANA on the same datasets. On a difficult real dataset that is publicly available, the coverage is lower but our method retains its advantage in accuracy over CANDID/CYANA. The algorithm is publicly available via the Protinfo NMR webserver (http://protinfo.compbio.washington.edu/protinfo_nmr).
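To make the prior-to-posterior step concrete, the following is a minimal sketch of Bayes' rule applied to a single candidate proton pair. It is not the authors' implementation: the function name, the two likelihood parameters, and the numerical values are all hypothetical, chosen only to illustrate how an observed crosspeak can raise a low structure-derived prior. The abstract does not describe how the method models likelihoods or chemical shift ambiguity, so those details are assumptions here.

```python
def posterior_contact_probability(prior, p_peak_given_contact,
                                  p_peak_given_no_contact, peak_observed):
    """Bayes' rule for one binary hypothesis: 'these two protons are in contact'.

    All parameters are illustrative assumptions, not values from the paper:
      prior                    -- initial contact probability (e.g. from a structure database)
      p_peak_given_contact     -- chance of seeing a matching crosspeak if the contact exists
      p_peak_given_no_contact  -- chance of seeing a matching crosspeak anyway
                                  (e.g. from another pair with overlapping chemical shifts)
      peak_observed            -- whether a matching crosspeak is in the peak list
    """
    for name, value in (("prior", prior),
                        ("p_peak_given_contact", p_peak_given_contact),
                        ("p_peak_given_no_contact", p_peak_given_no_contact)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Likelihood of the evidence under each hypothesis.
    if peak_observed:
        like_contact = p_peak_given_contact
        like_no_contact = p_peak_given_no_contact
    else:
        like_contact = 1.0 - p_peak_given_contact
        like_no_contact = 1.0 - p_peak_given_no_contact

    numerator = like_contact * prior
    evidence = numerator + like_no_contact * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical numbers: a 5% prior, a fairly reliable crosspeak signal.
    with_peak = posterior_contact_probability(0.05, 0.9, 0.1, peak_observed=True)
    without_peak = posterior_contact_probability(0.05, 0.9, 0.1, peak_observed=False)
    print(f"posterior with crosspeak:    {with_peak:.3f}")
    print(f"posterior without crosspeak: {without_peak:.3f}")
```

With these assumed numbers, a matching crosspeak lifts the contact probability well above the prior, while its absence pushes the probability below the prior. No crosspeak is ever assigned to a single contact; each candidate pair simply receives a probability, which is the sense in which such an approach is assignment-free.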
