†Department of Chemistry, ‡Department of Bioengineering, §Department of Chemical and Biomolecular Engineering, ∥Chemical Sciences Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, California 94720, United States.
J Am Chem Soc. 2016 Apr 6;138(13):4530-8. doi: 10.1021/jacs.6b00351. Epub 2016 Mar 25.
We develop a Bayesian approach to determine the most probable structural ensemble model from candidate structures for intrinsically disordered proteins (IDPs) that takes full advantage of NMR chemical shifts and J-coupling data, their known errors and variances, and the quality of the theoretical back-calculation from structure to experimental observables. Our approach differs from previous formulations in the optimization of experimental and back-calculation nuisance parameters that are treated as random variables with known distributions, as opposed to structural or ensemble weight optimization or use of a reference ensemble. The resulting experimental inferential structure determination (EISD) method is size extensive with O(N) scaling, with N = number of structures, that allows for the rapid ranking of large ensemble data comprising tens of thousands of conformations. We apply the EISD approach on singular folded proteins and a corresponding set of ∼25 000 misfolded states to illustrate the problems that can arise using Boltzmann weighted priors. We then apply the EISD method to rank IDP ensembles most consistent with the NMR data and show that the primary error for ranking or creating good IDP ensembles resides in the poor back-calculation from structure to simulated experimental observable. We show that a reduction by a factor of 3 in the uncertainty of the back-calculation error can improve the discrimination among qualitatively different IDP ensembles for the amyloid-beta peptide.
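As a rough illustration of why the back-calculation uncertainty matters for ranking ensembles, the sketch below scores an ensemble with a simple Gaussian log-likelihood. It is not the EISD method itself: it assumes Gaussian experimental and back-calculation errors that are folded into a single variance, whereas the abstract describes optimizing the nuisance parameters. The function name `ensemble_log_score`, the toy observables, and all numerical values are hypothetical.

```python
import math

def ensemble_log_score(back_calc, observed, sigma_exp, sigma_bc):
    """Gaussian log-likelihood of observed data given an ensemble (illustrative only).

    back_calc: one list of back-calculated observables per structure.
    observed:  experimental values, same length as each inner list.
    sigma_exp, sigma_bc: assumed experimental and back-calculation standard deviations.
    """
    n_structures = len(back_calc)
    n_obs = len(observed)

    # Single pass over the structures: cost grows linearly with their number.
    means = [0.0] * n_obs
    for structure in back_calc:
        for j, value in enumerate(structure):
            means[j] += value / n_structures

    # Assumption: independent Gaussian errors, so the variances add.
    var = sigma_exp ** 2 + sigma_bc ** 2

    score = 0.0
    for j in range(n_obs):
        resid = observed[j] - means[j]
        score += -0.5 * resid ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
    return score

# Hypothetical data: two observables (e.g. J-couplings in Hz), two toy ensembles.
observed = [7.0, 5.0]
ensemble_a = [[6.8, 5.1], [7.2, 4.9]]   # averages match the data closely
ensemble_b = [[7.5, 5.6], [7.7, 5.4]]   # averages are systematically off

sigma_exp = 0.1
# Score gap between ensembles with a loose back-calculation error ...
gap_loose = (ensemble_log_score(ensemble_a, observed, sigma_exp, 0.9)
             - ensemble_log_score(ensemble_b, observed, sigma_exp, 0.9))
# ... and with that error reduced by a factor of 3.
gap_tight = (ensemble_log_score(ensemble_a, observed, sigma_exp, 0.3)
             - ensemble_log_score(ensemble_b, observed, sigma_exp, 0.3))

print(f"gap with sigma_bc = 0.9: {gap_loose:.3f}")
print(f"gap with sigma_bc = 0.3: {gap_tight:.3f}")
```

In this toy setting the log-score gap between the two ensembles widens when the assumed back-calculation error shrinks, which is the qualitative effect the abstract reports for the amyloid-beta ensembles.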