Centre de Biologie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France.
European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany.
J Chem Theory Comput. 2021 Apr 13;17(4):2014-2021. doi: 10.1021/acs.jctc.1c00014. Epub 2021 Mar 16.
The Ensemble Optimization Method (EOM) is a popular approach to describe small-angle X-ray scattering (SAXS) data from highly disordered proteins. The EOM algorithm selects subensembles of coexisting states from large pools of randomized conformations to fit the SAXS data. Based on the unphysical bimodal radius of gyration (Rg) distribution of conformations resulting from the EOM analysis, a recent article (Fagerberg et al. 2019, 15 (12), 6968-6983) concluded that this approach inadequately described the SAXS data measured for human Histatin 5 (Hst5), a peptide with antifungal properties. Using extensive experimental and synthetic data, we explored the origin of this observation. We found that the one-bead-per-residue coarse-grained representation with averaged scattering form factors (provided in EOM as an add-on to represent disordered missing loops or domains) may not be appropriate for EOM analyses of scattering data from short proteins/peptides (fewer than 50 residues). For these proteins, the method of choice is to sample the conformational landscape with atomistic models (e.g., from molecular dynamics simulations). As a convenient alternative, we have also improved the coarse-grained approach by introducing amino-acid-specific form factors into the calculations. We also found that, for small proteins, searching for relatively large subensembles of 20-50 conformers (as implemented in the original EOM version) describes the conformational space sampled in solution more adequately than procedures that optimize the ensemble size. Our observations have been added as recommendations to the information for EOM users, to promote proper use of the program for ensemble-based modeling of SAXS data from all types of disordered systems.
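The abstract contrasts averaged and amino-acid-specific form factors in a one-bead-per-residue model. The following is a minimal sketch, not the EOM code, of how those bead form factors enter the Debye equation, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij). It assumes constant (q-independent) bead form factors and ignores solvent and hydration-shell contributions. The values in `BEAD_FF`, the test sequence, and the helper `random_chain` (a plain random walk with 3.8 Å Cα-Cα virtual bonds and no excluded volume) are all hypothetical. Coordinates are in Å and q is in Å⁻¹.

```python
import math
import random

# HYPOTHETICAL q-independent bead weights (arbitrary units). Real bead form
# factors depend on q; these placeholders only illustrate the weighting.
BEAD_FF = {"K": 1.4, "H": 1.6, "G": 0.6, "Y": 1.9, "R": 1.7, "S": 0.9}


def random_chain(n_res, bond=3.8, seed=None):
    """Random-walk C-alpha trace (Angstrom) with a fixed virtual bond length.

    No excluded volume or backbone-angle restraints (hypothetical pool generator).
    """
    rng = random.Random(seed)
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_res - 1):
        # Uniformly distributed direction on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + bond * s * math.cos(phi),
                       y0 + bond * s * math.sin(phi),
                       z0 + bond * z))
    return coords


def radius_of_gyration(coords):
    """Unweighted Rg of the bead positions."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)


def debye_intensity(coords, weights, q):
    """Debye equation: I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij)."""
    total = 0.0
    for ci, fi in zip(coords, weights):
        for cj, fj in zip(coords, weights):
            qr = q * math.dist(ci, cj)
            sinc = 1.0 if qr < 1e-12 else math.sin(qr) / qr  # limit at r = 0
            total += fi * fj * sinc
    return total


if __name__ == "__main__":
    seq = "KHGYRS" * 4  # hypothetical 24-residue sequence
    chain = random_chain(len(seq), seed=1)
    specific = [BEAD_FF[aa] for aa in seq]            # residue-specific weights
    averaged = [sum(specific) / len(seq)] * len(seq)  # one averaged weight
    print("Rg (A):", round(radius_of_gyration(chain), 2))
    for q in (0.05, 0.15, 0.30, 0.50):
        i_spec = debye_intensity(chain, specific, q)
        i_avg = debye_intensity(chain, averaged, q)
        print(f"q={q:.2f}  I_specific/I_averaged = {i_spec / i_avg:.3f}")
```

The averaged weights sum to the same total as the residue-specific ones, so both curves share I(0) = (Σ f_i)². Any difference appears only at q > 0, where the spatial arrangement of strong and weak scatterers along a short chain matters.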
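EOM fits SAXS data by selecting subensembles from a large pool of conformers, and the original version does this with a genetic algorithm. The sketch below illustrates only the fixed-size idea discussed above. It keeps the subensemble size constant (`size=20`, the lower end of the 20-50 range) and replaces the genetic algorithm with greedy single-member swaps, so it is not the EOM implementation. The scattering curve of each pool conformer is assumed to be precomputed. `chi2`, `ensemble_curve` and `select_subensemble` are hypothetical helper names. The χ² is normalized by K − 1 (K data points) after fitting one scale factor.

```python
import math
import random


def chi2(model, exp, sigma):
    """chi^2 normalized by (K - 1) after fitting one scale factor c."""
    # Least-squares scale c minimizing sum(((exp - c*model) / sigma)^2).
    num = sum(m * e / s ** 2 for m, e, s in zip(model, exp, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(model, sigma))
    c = num / den
    resid = sum(((e - c * m) / s) ** 2 for m, e, s in zip(model, exp, sigma))
    return resid / (len(exp) - 1)


def ensemble_curve(pool_curves, members):
    """Equal-weight average of the members' curves (repeats allowed)."""
    n_q = len(pool_curves[0])
    return [sum(pool_curves[k][i] for k in members) / len(members)
            for i in range(n_q)]


def select_subensemble(pool_curves, exp, sigma, size=20, n_steps=2000, seed=0):
    """Fixed-size subensemble search by greedy single-member swaps.

    Simplified stand-in for EOM's genetic algorithm, not its implementation.
    """
    rng = random.Random(seed)
    n_pool = len(pool_curves)
    members = [rng.randrange(n_pool) for _ in range(size)]
    best = chi2(ensemble_curve(pool_curves, members), exp, sigma)
    for _ in range(n_steps):
        trial = list(members)
        trial[rng.randrange(size)] = rng.randrange(n_pool)  # swap one member
        score = chi2(ensemble_curve(pool_curves, trial), exp, sigma)
        if score < best:  # keep only improvements
            members, best = trial, score
    return members, best


if __name__ == "__main__":
    # Synthetic demo with Guinier-like curves exp(-(q*Rg)^2 / 3) for a pool
    # of hypothetical Rg values (Angstrom); no real SAXS data involved.
    qs = [0.005 * k for k in range(1, 31)]
    pool_rg = [8.0 + 0.5 * k for k in range(40)]
    pool = [[math.exp(-(q * rg) ** 2 / 3.0) for q in qs] for rg in pool_rg]
    target = ensemble_curve(pool, [5, 5, 30, 30])  # hidden two-state mixture
    sigma = [0.01 * t for t in target]
    members, score = select_subensemble(pool, target, sigma, size=20)
    print("chi2 =", round(score, 3))
    print("selected Rg:", sorted(pool_rg[k] for k in members))
```

The Rg values of the selected members (counting repeats) form the kind of distribution EOM reports. Several different selections can fit a smooth curve almost equally well, which is one reason the shape of that distribution should be interpreted with care.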