Kamisetty Hetunandan, Bailey-Kellogg Chris, Pandurangan Gopal
Department of Computer Science, Purdue University West Lafayette, IN 47907, USA.
Bioinformatics. 2006 Jan 15;22(2):172-80. doi: 10.1093/bioinformatics/bti786. Epub 2005 Nov 15.
Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise.
This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions.
Our algorithm has been implemented in the platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
在通过核磁共振(NMR)光谱研究蛋白质结构、动力学和相互作用的过程中,主链共振归属是一个关键瓶颈。一种极简主义的归属方法,我们称之为“基于接触的”方法,旨在通过用空间效应实验(核欧沃豪斯效应光谱,NOESY)取代标准的一系列通过键的实验,大幅减少实验时间和费用。在基于接触的方法中,光谱数据以一种图形表示,其中顶点代表假定的残基(与一级序列关系未知),边代表假设的NOESY相互作用,这样如果残基“足够接近”,就能解释观察到的光谱峰。由于实验的模糊性,每个光谱峰可能会假设出几条错误的边。通过识别图中边的一致模式(例如对于α螺旋和β折叠)并将顶点映射到一级序列来得出归属。关键的算法挑战在于,即使这些模式被大量噪声掩盖,也要能够发现它们。
本文开发、分析并应用了一种新颖的算法,用于识别在被破坏的NOESY图中表示边的一致模式的多面体。我们的随机算法将单纯形聚合成多面体,并通过称为旋转的简单局部修改来修正不一致之处,这些修改会保留大部分已发现的结构。在表征实验噪声的影响时,我们采用了一种特定于NMR的随机图模型,以证明我们的算法在预期的多项式时间内具有最优性能,即使输入图被严重破坏。我们在噪声高达500%的图的模拟研究中证实了这一分析。最后,我们展示了该算法在几个实验性β折叠数据集上的实际应用。我们的方法能够消除绝大多数噪声边,并发现大量一致的相互作用集。
我们的算法已用与平台无关的Python代码实现。该软件可应作者要求免费获取以供学术使用。