LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France.
Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, 31400 Toulouse, France.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae627.
Characterizing the structure of flexible proteins, particularly intrinsically disordered ones, is a formidable challenge because of their high conformational variability. Currently, their structural representation relies on (possibly large) conformational ensembles derived from a combination of experimental and computational methods. Detailed structural analysis of these ensembles is difficult, and existing tools have limited effectiveness.
This study proposes an innovative extension of the concept of contact maps to the ensemble framework, incorporating the intrinsic probabilistic nature of disordered proteins. Within this framework, a conformational ensemble is characterized through a weighted family of contact maps. To achieve this, conformations are first described using a refined definition of contact that appropriately accounts for the geometry of the inter-residue interactions and the sequence context. Representative structural features of the ensemble naturally emerge from the subsequent clustering of the resulting contact-based descriptors. Importantly, transiently populated structural features are readily identified within large ensembles. The performance of the method is illustrated by several use cases and compared with other existing approaches, highlighting its superiority in capturing relevant structural features of highly flexible proteins.
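The pipeline described above (per-conformation contact descriptors, clustering, then a weighted family of contact maps) can be illustrated with a minimal sketch. It is not the WARIO implementation: the plain distance cutoff stands in for the refined, geometry- and sequence-aware contact definition, the greedy leader clustering stands in for whatever clustering the authors use, and every function name and parameter value (`cutoff`, `min_sep`, `radius`) is a hypothetical choice made for illustration only.

```python
from itertools import combinations
from math import dist


def contact_set(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j) in contact in one conformation.

    ASSUMPTION: a contact is a simple distance cutoff between one point per
    residue, ignoring pairs closer than min_sep along the sequence.
    """
    return frozenset(
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep and dist(coords[i], coords[j]) <= cutoff
    )


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def leader_clustering(contact_sets, radius=0.5):
    """Greedy clustering (ASSUMPTION: a stand-in, not the paper's method).

    Each conformation joins the first cluster whose leader lies within
    `radius`; otherwise it becomes the leader of a new cluster.
    """
    leaders, members = [], []
    for idx, cs in enumerate(contact_sets):
        for k, leader in enumerate(leaders):
            if jaccard_distance(cs, leader) <= radius:
                members[k].append(idx)
                break
        else:
            leaders.append(cs)
            members.append([idx])
    return members


def weighted_contact_maps(ensemble, cutoff=8.0, min_sep=3, radius=0.5):
    """Summarize an ensemble as a list of (weight, contact-frequency map).

    weight is the fraction of conformations in the cluster; the map gives,
    for each residue pair, the fraction of cluster members with that contact.
    """
    sets = [contact_set(c, cutoff, min_sep) for c in ensemble]
    family = []
    for idxs in leader_clustering(sets, radius):
        counts = {}
        for i in idxs:
            for pair in sets[i]:
                counts[pair] = counts.get(pair, 0) + 1
        freq = {pair: n / len(idxs) for pair, n in counts.items()}
        family.append((len(idxs) / len(ensemble), freq))
    return family


# Toy 5-residue ensemble: three extended chains and one compact turn.
extended = [(4.0 * i, 0.0, 0.0) for i in range(5)]
turn = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 4, 0), (4, 4, 0)]
ensemble = [extended, turn, extended, extended]

family = weighted_contact_maps(ensemble)
for weight, freq in family:
    print(f"weight={weight:.2f} contacts={sorted(freq.items())}")
```

In this toy case the compact turn forms its own low-weight cluster rather than being averaged away, which is the kind of transiently populated feature a single ensemble-averaged contact map would blur.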
An open-source implementation of the method is provided together with an easy-to-use Jupyter notebook, available at https://gitlab.laas.fr/moma/WARIO.