Suppr超能文献

一种基于核磁共振数据确定蛋白质结构的代数几何方法。

An algebraic geometry approach to protein structure determination from NMR data.

作者信息

Wang Lincong, Mettu Ramgopal R, Donald Bruce Randall

机构信息

Dartmouth Computer Science Department, Hanover, NH 03755, USA.

出版信息

Proc IEEE Comput Syst Bioinform Conf. 2005:235-46. doi: 10.1109/csb.2005.11.

Abstract

Our paper describes the first provably-efficient algorithm for determining protein structures de novo, solely from experimental data. We show how the global nature of a certain kind of NMR data provides quantifiable complexity-theoretic benefits, allowing us to classify our algorithm as running in polynomial time. While our algorithm uses NMR data as input, it is the first polynomial-time algorithm to compute high-resolution structures de novo using any experimentally-recorded data, from either NMR spectroscopy or X-Ray crystallography. Improved algorithms for protein structure determination are needed, because currently, the process is expensive and time-consuming. For example, an area of intense research in NMR methodology is automated assignment of nuclear Overhauser effect (NOE) restraints, in which structure determination sits in a tight inner-loop (cycle) of assignment/refinement. These algorithms are very time-consuming, and typically require a large cluster. Thus, algorithms for protein structure determination that are known to run in polynomial time and provide guarantees on solution accuracy are likely to have great impact in the long-term. Methods stemming from a technique called "distance geometry embedding" do come with provable guarantees, but the NP-hardness of these problem formulations implies that in the worst case these techniques cannot run in polynomial time. We are able to avoid the NP-hardness by (a) some mild assumptions about the protein being studied, (b) the use of residual dipolar couplings (RDCs) instead of a dense network of NOEs, and (c) novel algorithms and proofs that exploit the biophysical geometry of (a) and (b), drawing on a variety of computer science, computational geometry, and computational algebra techniques. In our algorithm, RDC data, which gives global restraints on the orientation of internuclear bond vectors, is used in conjunction with very sparse NOE data to obtain a polynomial-time algorithm for protein structure determination. An implementation of our algorithm has been applied to 6 different real biological NMR data sets recorded for 3 proteins. Our algorithm is combinatorially precise, polynomial-time, and uses much less NMR data to produce results that are as good or better than previous approaches in terms of accuracy of the computed structure as well as running time. In practice approaches such as restrained molecular dynamics and simulated annealing, which lack both combinatorial precision and guarantees on running time and solution quality, are commonly used. Our results show that by using a different "slice" of the data, an algorithm that is polynomial time and that has guarantees about solution quality can be obtained. We believe that our techniques can be extended and generalized for other structure-determination problems such as computing side-chain conformations and the structure of nucleic acids from experimental data.

摘要

我们的论文描述了首个可证明高效的从头确定蛋白质结构的算法,该算法仅依据实验数据。我们展示了某种核磁共振(NMR)数据的全局性质如何带来可量化的复杂性理论优势,使我们能够将我们的算法归类为在多项式时间内运行。虽然我们的算法将NMR数据用作输入,但它是首个使用来自NMR光谱或X射线晶体学的任何实验记录数据从头计算高分辨率结构的多项式时间算法。由于目前蛋白质结构确定过程既昂贵又耗时,因此需要改进的算法。例如,NMR方法学中一个深入研究的领域是核Overhauser效应(NOE)约束的自动分配,其中结构确定处于分配/精修的紧密内循环(周期)中。这些算法非常耗时,并且通常需要一个大型集群。因此,已知在多项式时间内运行并能保证解的准确性的蛋白质结构确定算法从长远来看可能会产生重大影响。源自一种称为“距离几何嵌入”技术的方法确实有可证明的保证,但这些问题表述的NP难性意味着在最坏情况下这些技术不能在多项式时间内运行。我们能够通过以下方式避免NP难性:(a)对所研究蛋白质的一些适度假设;(b)使用剩余偶极耦合(RDC)而非密集的NOE网络;(c)利用(a)和(b)的生物物理几何结构的新颖算法和证明,借鉴各种计算机科学、计算几何和计算代数技术。在我们的算法中,给出核间键向量方向全局约束的RDC数据与非常稀疏的NOE数据结合使用,以获得用于蛋白质结构确定的多项式时间算法。我们算法的一个实现已应用于为3种蛋白质记录的6个不同的真实生物NMR数据集。我们的算法在组合上精确、运行于多项式时间,并且使用少得多的NMR数据来产生在计算结构的准确性以及运行时间方面与先前方法一样好或更好的结果。在实践中,诸如受限分子动力学和模拟退火等方法,既缺乏组合精度,又无法保证运行时间和解的质量,却被普遍使用。我们的结果表明,通过使用数据的不同“切片”,可以获得一个运行于多项式时间且能保证解质量的算法。我们相信,我们的技术可以扩展和推广到其他结构确定问题,例如根据实验数据计算侧链构象和核酸结构。

相似文献

1
An algebraic geometry approach to protein structure determination from NMR data.
Proc IEEE Comput Syst Bioinform Conf. 2005:235-46. doi: 10.1109/csb.2005.11.
6
REDCRAFT: A computational platform using residual dipolar coupling NMR data for determining structures of perdeuterated proteins in solution.
PLoS Comput Biol. 2021 Feb 1;17(2):e1008060. doi: 10.1371/journal.pcbi.1008060. eCollection 2021 Feb.
7
A geometric arrangement algorithm for structure determination of symmetric protein homo-oligomers from NOEs and RDCs.
J Comput Biol. 2011 Nov;18(11):1507-23. doi: 10.1089/cmb.2011.0173. Epub 2011 Oct 28.
8
Protein side-chain resonance assignment and NOE assignment using RDC-defined backbones without TOCSY data.
J Biomol NMR. 2011 Aug;50(4):371-95. doi: 10.1007/s10858-011-9522-4. Epub 2011 Jun 25.
10
A Bayesian approach for determining protein side-chain rotamer conformations using unassigned NOE data.
J Comput Biol. 2011 Nov;18(11):1661-79. doi: 10.1089/cmb.2011.0172. Epub 2011 Oct 4.

引用本文的文献

1
Automated NMR Assignment and Protein Structure Determination using Sparse Dipolar Coupling Constraints.
Prog Nucl Magn Reson Spectrosc. 2009 Aug 1;55(2):101-127. doi: 10.1016/j.pnmrs.2008.12.001.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验