Chivot Lucas, Mathieux Noé, Cosson Anna, Bridier-Nahmias Antoine, Favennec Loïc, Gelly Jean-Christophe, Clain Jérôme, Coppée Romain
Université de Rouen Normandie, Laboratoire de Parasitologie-Mycologie, ESCAPE, F-76000 Rouen, France.
Université Paris Cité et Sorbonne Paris Nord, Inserm, IAME, F-75018 Paris, France.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf166.
Evolutionary rates in protein-coding genes vary widely, reflecting functional and/or structural constraints. Essential or highly expressed proteins tend to evolve more slowly, and within a protein, different amino acid sites experience distinct selective pressures. Accurately modeling this variation is critical for identifying functionally and/or structurally important amino acid sites. Standard methods assume that substitution rates are independent across sites, and the most conserved sites they detect are widely scattered across the protein tertiary structure. This is biologically unrealistic, as functional sites tend to cluster in 3D space.
Here, we developed CONSTRUCT, an improved strategy for detecting functionally and structurally important regions in protein tertiary structure. Given a set of orthologous sequences, CONSTRUCT first estimates site-specific substitution rates using the Rate4Site model. These rates are then weighted by the rates of neighboring amino acid sites within an optimal window size, defined as the one yielding the strongest spatial correlation. To refine cluster detection, CONSTRUCT can represent each amino acid site by either its Cα atom or its center of mass, the latter accounting for side-chain orientation. Extensive simulations and validation on 14 functionally characterized proteins of diverse sizes, interspecies conservation levels, and taxonomic origins demonstrated the robustness of CONSTRUCT. These results highlight CONSTRUCT as a powerful tool for guiding site-directed mutagenesis experiments aimed at elucidating protein function.
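To make the neighbor-weighting idea concrete, the following is a minimal, hypothetical Python sketch; it is not CONSTRUCT's code or API. It assumes that the "window" is a distance radius (in Å) around each site, that weighting is a plain unweighted mean of the rates inside that radius, and that the "strongest spatial correlation" is measured as the Pearson correlation between each site's own rate and the mean rate of its neighbors. All function names (`center_of_mass`, `smoothed_rates`, `neighbour_correlation`, `best_radius`) are illustrative. Each site can be placed either at its Cα coordinate or at the mass-weighted center of its heavy atoms.

```python
from __future__ import annotations

import math
from statistics import mean

# Approximate atomic masses (Da) of heavy atoms in standard residues (illustrative values).
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms: list[tuple[str, tuple[float, float, float]]]) -> tuple[float, ...]:
    """Mass-weighted mean position of one residue's atoms, given as (element, xyz) pairs."""
    total = sum(ATOMIC_MASS[el] for el, _ in atoms)
    return tuple(sum(ATOMIC_MASS[el] * xyz[i] for el, xyz in atoms) / total for i in range(3))


def smoothed_rates(points: list, rates: list[float], radius: float) -> list[float]:
    """Assumption: replace each site's rate by the unweighted mean of all rates within `radius` (itself included)."""
    return [
        mean(r for q, r in zip(points, rates) if math.dist(p, q) <= radius)
        for p in points
    ]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns -inf when either series has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("-inf")
    return sxy / math.sqrt(sxx * syy)


def neighbour_correlation(points: list, rates: list[float], radius: float) -> float:
    """Correlation between each site's rate and the mean rate of its neighbors (itself excluded)."""
    own, neigh = [], []
    for i, p in enumerate(points):
        others = [rates[j] for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius]
        if others:  # sites with no neighbor in the window are skipped
            own.append(rates[i])
            neigh.append(mean(others))
    if len(own) < 3:
        return float("-inf")  # too few sites to estimate a correlation
    return pearson(own, neigh)


def best_radius(points: list, rates: list[float], radii: list[float]) -> float:
    """Hypothetical window selection: the radius giving the strongest neighbor correlation."""
    return max(radii, key=lambda r: neighbour_correlation(points, rates, r))
```

A very large radius makes every site a neighbor of every other, so the neighbor mean becomes a decreasing function of each site's own rate and the correlation turns negative; this is why, under these assumptions, an intermediate radius that captures genuine 3D clusters tends to win.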
The CONSTRUCT program and documentation are freely available at https://github.com/Rcoppee/CONSTRUCT.