LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, Toulouse, France.
Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, Montpellier, France.
J Mol Biol. 2023 Jul 15;435(14):168053. doi: 10.1016/j.jmb.2023.168053. Epub 2023 Mar 18.
The structural investigation of intrinsically disordered proteins (IDPs) requires ensemble models that describe the diversity of the molecule's conformational states. Because IDPs are inherently probabilistic, new paradigms are needed that treat them from a purely statistical point of view, considering their conformational ensembles as well-defined probability distributions. In this work, we define a conformational ensemble as an ordered set of probability distributions and provide a suitable metric to detect differences between two given ensembles at the residue level, both locally and globally. The underlying geometry of the conformational space is taken into account: each ensemble is characterized by a set of probability distributions supported on three-dimensional Euclidean space (for global-scale comparisons) and on the two-dimensional flat torus (for local-scale comparisons). The inherent uncertainty of the data is also taken into account to provide finer estimates of the differences between ensembles. Additionally, an overall distance between ensembles is defined from the residue-level differences. We illustrate the potential of the approach with several example comparisons of conformational ensembles: (i) ensembles produced from molecular dynamics (MD) simulations using different force fields, and (ii) ensembles before and after refinement with experimental data. We also show that the method is useful for assessing the convergence of MD simulations, and discuss other potential applications, such as in machine-learning-based approaches. The numerical tool has been implemented in Python through easy-to-use Jupyter Notebooks available at https://gitlab.laas.fr/moma/WASCO.
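To make the local-scale comparison concrete, the following is a minimal sketch and not the WASCO implementation. The abstract does not name the metric, so the sketch assumes a 2-Wasserstein (optimal transport) distance between per-residue empirical distributions of backbone dihedral angles (phi, psi) on the flat torus. It also assumes both ensembles hold the same number of equally weighted conformations, in which case optimal transport reduces to an assignment problem. All function names are hypothetical, the uncertainty handling mentioned in the abstract is not modeled, and the code is kept dependency-free for illustration.

```python
import math

TWO_PI = 2.0 * math.pi


def torus_sq_dist(a, b):
    """Squared geodesic distance between two (phi, psi) pairs, in radians."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % TWO_PI
        d = min(d, TWO_PI - d)  # shortest way around the circle
        total += d * d
    return total


def min_cost_assignment(cost):
    """Minimal total cost of a one-to-one assignment (Hungarian algorithm).

    `cost` must be a square list of lists; runs in O(n^3).
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))


def wasserstein2_torus(sample_a, sample_b):
    """2-Wasserstein distance between two equal-size (phi, psi) samples."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    cost = [[torus_sq_dist(a, b) for b in sample_b] for a in sample_a]
    return math.sqrt(min_cost_assignment(cost) / len(sample_a))


def residue_profile(ensemble_a, ensemble_b):
    """Per-residue distances; ensemble[r] lists the (phi, psi) samples of residue r."""
    return [wasserstein2_torus(a, b) for a, b in zip(ensemble_a, ensemble_b)]
```

Under the same assumptions, the global-scale comparison could replace `torus_sq_dist` with the squared Euclidean distance between three-dimensional positions. For realistic ensemble sizes, a dedicated optimal-transport solver would be preferable to this cubic-time routine.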