MacCallum Justin L, Perez Alberto, Dill Ken A
Department of Chemistry, University of Calgary, Calgary, AB, Canada T2N 1N4;
Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794;
Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):6985-90. doi: 10.1073/pnas.1506788112. Epub 2015 May 18.
More than 100,000 protein structures are now known at atomic detail. However, far more are not yet known, particularly among large or complex proteins. Often, experimental information is only semireliable because it is uncertain, limited, or confusing in important ways. Some experiments give sparse information, some give ambiguous or nonspecific information, and others give uncertain information, where some is right, some is wrong, but we don't know which. We describe a method called Modeling Employing Limited Data (MELD) that can harness such problematic information in a physics-based, Bayesian framework for improved structure determination. We apply MELD to eight proteins of known structure for which such problematic structural data are available, including a sparse NMR dataset, two ambiguous EPR datasets, and four uncertain datasets taken from sequence evolution data. MELD gives excellent structures, indicating its promise for experimental biomolecule structure determination where only semireliable data are available.