Unité de Bioinformatique Structurale, UMR 3528, CNRS, and Département de Bioinformatique, Biostatistique et Biologie Intégrative, USR 3756, CNRS, Institut Pasteur, 75015 Paris, France.
IRISA, 35042 Rennes, France.
J Chem Inf Model. 2019 Oct 28;59(10):4486-4503. doi: 10.1021/acs.jcim.9b00215. Epub 2019 Sep 6.
The optimization approaches classically used to determine protein structures encounter various difficulties, especially when the conformational space is large. In such cases, algorithmic convergence criteria are harder to set up, and the size of the search space makes a complete exploration difficult to achieve. The interval branch-and-prune (iBP) approach, based on a reformulation of the distance geometry problem (DGP), provides a theoretical framework for generating protein conformations by systematically sampling the conformational space. When an appropriate subset of interatomic distances is known exactly, this worst-case exponential-time algorithm is provably complete and fixed-parameter tractable. These guarantees, however, disappear as soon as distance measurement errors are introduced. Here we propose an improvement of this approach: threading-augmented interval branch-and-prune (TAiBP), in which the combinatorial explosion of the original iBP approach, arising from its exponential complexity, is alleviated by partitioning the input instances into consecutive peptide fragments and by using self-organizing maps (SOMs) to obtain clusters of similar solutions. A validation of the TAiBP approach is presented on a set of proteins of various sizes and structures. The calculation inputs are a uniform covalent geometry extracted from force field covalent terms, the backbone dihedral angles with error intervals, and a few long-range distances. For most of the proteins smaller than 50 residues and for interval widths of 20°, the TAiBP approach yielded solutions with RMSD values smaller than 3 Å with respect to the initial protein conformation. Applying the TAiBP approach efficiently to proteins larger than 50 residues will require the use of nonuniform covalent geometry and may benefit from the recent development of residue-specific force fields.
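The branching and pruning idea summarized above can be illustrated with a toy sketch. It is not the authors' iBP or TAiBP implementation: it assumes a single chain of point atoms with one uniform bond length and bond angle (illustrative values, not force-field terms), samples each torsion interval at three points, and prunes a branch as soon as a long-range distance restraint is violated. All function names, the torsion sign convention, and the numerical values are assumptions made for this example; fragment partitioning and SOM clustering are not shown.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)

def place(a, b, c, bond, angle, torsion):
    """Place atom d with |cd| = bond, angle(b, c, d) = angle, and a torsion
    about the b-c axis (torsion 0 puts d on the same side as a)."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal to the a-b-c plane
    m = _cross(n, bc)                   # completes the local orthonormal frame
    x = -bond * math.cos(angle)
    y = bond * math.sin(angle) * math.cos(torsion)
    z = bond * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))

def start(bond, angle):
    """First three atoms, fixed in the xy-plane to remove rigid-body freedom."""
    return [(0.0, 0.0, 0.0), (bond, 0.0, 0.0),
            (bond - bond * math.cos(angle), bond * math.sin(angle), 0.0)]

def build(torsions, bond, angle):
    pts = start(bond, angle)
    for t in torsions:
        pts.append(place(pts[-3], pts[-2], pts[-1], bond, angle, t))
    return pts

def branch_and_prune(intervals, restraints, bond, angle, samples=3):
    """Depth-first search over sampled torsion intervals (radians).
    restraints: (i, j, dmin, dmax) with i < j, checked when atom j is placed."""
    solutions = []

    def recurse(pts, k):
        if k == len(intervals):
            solutions.append(list(pts))
            return
        lo, hi = intervals[k]
        for s in range(samples):                      # branch
            t = lo + s * (hi - lo) / (samples - 1)
            new = place(pts[-3], pts[-2], pts[-1], bond, angle, t)
            idx = len(pts)
            if all(dmin <= math.dist(pts[i], new) <= dmax
                   for i, j, dmin, dmax in restraints if j == idx):
                pts.append(new)
                recurse(pts, k + 1)
                pts.pop()
            # otherwise: prune this branch

    recurse(start(bond, angle), 0)
    return solutions

def rmsd(p, q):
    """Coordinate RMSD without superposition (both chains share the same start frame)."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(p, q)) / len(p))

# Hypothetical 8-atom test chain: illustrative geometry and torsions.
BOND, ANGLE = 1.5, math.radians(110.0)
true_torsions = [math.radians(d) for d in (-60.0, -45.0, 180.0, 60.0, -120.0)]
target = build(true_torsions, BOND, ANGLE)
half = math.radians(10.0)                             # 20 degree interval width
intervals = [(t - half, t + half) for t in true_torsions]
restraints = [(i, j, math.dist(target[i], target[j]) - 0.1,
               math.dist(target[i], target[j]) + 0.1) for i, j in ((0, 7), (2, 6))]
solutions = branch_and_prune(intervals, restraints, BOND, ANGLE)
print(len(solutions), "solutions kept; best RMSD to target:",
      min(rmsd(s, target) for s in solutions))
```

In this sketch the unpruned tree has 3^5 leaves. Longer chains or wider intervals make that number grow exponentially, which is the combinatorial explosion that the fragment partitioning and clustering described in the abstract are meant to contain.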