Data Management and Software Centre, European Spallation Source ERIC, Copenhagen, Denmark.
Biochemistry and Structural Biology, University of Lund, Lund, Sweden.
PLoS Comput Biol. 2018 Dec 17;14(12):e1006641. doi: 10.1371/journal.pcbi.1006641. eCollection 2018 Dec.
Many proteins consist of folded domains connected by regions with higher flexibility. The details of the resulting conformational ensemble play a central role in controlling interactions between domains and with binding partners. Small-Angle Scattering (SAS) is well-suited to studying the conformational states adopted by proteins in solution. However, analysis is complicated by the limited information content of SAS data, and care must be taken to avoid constructing overly complex ensemble models and fitting to noise in the experimental data. To address these challenges, we developed a method based on Bayesian statistics that infers conformational ensembles from a structural library generated by all-atom Monte Carlo simulations. The first stage of the method is a fast model selection based on variational Bayesian inference that maximizes the model evidence of the selected ensemble. This is followed by a complete Bayesian inference of the population weights in the selected ensemble. Experiments with simulated ensembles demonstrate that the model evidence is capable of identifying the correct ensemble and that the correct number of ensemble members can be recovered even at high noise levels. Using experimental data, we demonstrate how the method can be extended to include data from Nuclear Magnetic Resonance (NMR) and structural energies of conformers computed from all-atom energy functions. We show that data from SAXS, NMR chemical shifts and energies calculated from conformers can work synergistically to improve the definition of the conformational ensemble.
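As a rough illustration of the idea of inferring population weights and a model evidence from a structural library, the following minimal sketch may help. It is not the variational or full Bayesian procedure described in the abstract: it swaps in a plain Monte Carlo estimate that draws weights from a Dirichlet prior and reweights them by a Gaussian likelihood. The function names, the Guinier-like toy curves, the prior parameter `alpha` and the noise level are all assumptions made for this example.

```python
import numpy as np

def log_likelihood(weights, library, i_exp, sigma):
    """Gaussian log-likelihood of the data for each row of `weights`.

    weights: (n_samples, n_conformers), each row sums to 1
    library: (n_conformers, n_q) scattering curve of each conformer
    """
    sigma = np.broadcast_to(sigma, i_exp.shape)
    model = weights @ library                  # population-weighted curves
    resid = (i_exp - model) / sigma
    norm = np.sum(np.log(sigma * np.sqrt(2.0 * np.pi)))
    return -0.5 * np.sum(resid ** 2, axis=1) - norm

def infer_weights(library, i_exp, sigma, alpha=1.0, n_samples=20000, seed=0):
    """Posterior-mean weights and log evidence by sampling the prior.

    Assumption: a symmetric Dirichlet(alpha) prior on the weights.
    """
    rng = np.random.default_rng(seed)
    n_conf = library.shape[0]
    w = rng.dirichlet(np.full(n_conf, alpha), size=n_samples)
    ll = log_likelihood(w, library, i_exp, sigma)
    ll_max = ll.max()
    rel = np.exp(ll - ll_max)                  # likelihoods scaled to avoid underflow
    log_evidence = ll_max + np.log(rel.mean()) # log of the prior-averaged likelihood
    post_mean = (rel[:, None] * w).sum(axis=0) / rel.sum()
    return post_mean, log_evidence

# Toy data (hypothetical): two conformers with Guinier-like curves.
q = np.linspace(0.01, 0.3, 50)
radii = np.array([15.0, 35.0])                 # assumed radii of gyration
library = np.exp(-(q[None, :] * radii[:, None]) ** 2 / 3.0)
true_w = np.array([0.7, 0.3])
sigma = 0.01
noise = np.random.default_rng(1).normal(0.0, sigma, q.size)
i_exp = true_w @ library + noise

post_mean, log_evidence = infer_weights(library, i_exp, sigma)
print("posterior mean weights:", post_mean)
print("log evidence estimate:", log_evidence)
```

Comparing `log_evidence` across candidate ensembles of different size would give a crude form of model selection, since the prior-averaged likelihood penalizes ensembles with more free weights than the data support. Sampling from the prior becomes inefficient as the number of conformers grows, which is one reason a faster scheme is needed in practice.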