Inria/Univ. Grenoble Alpes/LJK-CNRS, Grenoble, France.
Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland.
Bioinformatics. 2018 Aug 15;34(16):2757-2765. doi: 10.1093/bioinformatics/bty160.
The root mean square deviation (RMSD) is one of the most widely used similarity criteria in structural biology and bioinformatics. The standard computation of the RMSD has linear complexity with respect to the number of atoms in a molecule. This makes RMSD calculations time-consuming in large-scale modeling applications that require many pairwise comparisons, such as the assessment of molecular docking predictions or the clustering of spatially proximate molecular conformations. Previously, we introduced the RigidRMSD algorithm to compute the RMSD corresponding to the rigid-body motion of a molecule. In this study, we go beyond the limits of the rigid-body approximation by taking into account the conformational flexibility of the molecule. We model the flexibility with a reduced set of collective motions computed with, e.g., normal modes or principal component analysis.
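To make the complexity argument concrete, here is a minimal C++ sketch (not the RigidRMSD or RapidRMSD code) contrasting the standard O(N) RMSD with an O(1) evaluation for rigid-body placements of the same molecule. The O(1) evaluation relies on an O(N) precomputation of the centroid c and the scatter matrix S = sum_i r_i r_i^T, with r_i = x_i - c. The closed form RMSD^2 = |Q c + (t1 - t2)|^2 + tr(Q S Q^T) / N, with Q = R1 - R2, is our own derivation of the idea and may differ from the published formulation. All type and function names are hypothetical, and no optimal superposition is performed: both placements are compared in a common frame.

```cpp
// Hypothetical sketch, not the RigidRMSD/RapidRMSD API.
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major 3x3 matrix

// Standard RMSD between two conformations with identical atom ordering: O(N).
double standardRmsd(const std::vector<Vec3>& a, const std::vector<Vec3>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (int d = 0; d < 3; ++d) {
            const double diff = a[i][d] - b[i][d];
            sum += diff * diff;
        }
    return std::sqrt(sum / static_cast<double>(a.size()));
}

// Applies the rigid-body motion x -> R x + t to every atom.
std::vector<Vec3> applyRigid(const Mat3& R, const Vec3& t, const std::vector<Vec3>& x) {
    std::vector<Vec3> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        for (int r = 0; r < 3; ++r)
            out[i][r] = R[r][0] * x[i][0] + R[r][1] * x[i][1] + R[r][2] * x[i][2] + t[r];
    return out;
}

// O(N) precomputation, done once per molecule: centroid c and S = sum_i r_i r_i^T.
struct RigidPrecomp {
    Vec3 centroid{};
    Mat3 scatter{};
    std::size_t n = 0;
};

RigidPrecomp precomputeRigid(const std::vector<Vec3>& x) {
    assert(!x.empty());
    RigidPrecomp p;
    p.n = x.size();
    for (const Vec3& xi : x)
        for (int d = 0; d < 3; ++d) p.centroid[d] += xi[d] / static_cast<double>(p.n);
    for (const Vec3& xi : x)
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c)
                p.scatter[r][c] += (xi[r] - p.centroid[r]) * (xi[c] - p.centroid[c]);
    return p;
}

// O(1) RMSD between placements (R1, t1) and (R2, t2) of the same rigid molecule.
// With Q = R1 - R2 and D = Q c + (t1 - t2): RMSD^2 = |D|^2 + tr(Q S Q^T) / N;
// the cross term vanishes because the centered vectors r_i sum to zero.
double rigidRmsd(const RigidPrecomp& p, const Mat3& R1, const Vec3& t1,
                 const Mat3& R2, const Vec3& t2) {
    Mat3 Q{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) Q[r][c] = R1[r][c] - R2[r][c];
    double dNorm2 = 0.0;
    for (int r = 0; r < 3; ++r) {
        double D = t1[r] - t2[r];
        for (int c = 0; c < 3; ++c) D += Q[r][c] * p.centroid[c];
        dNorm2 += D * D;
    }
    double trace = 0.0;  // tr(Q S Q^T) = sum_i |Q r_i|^2
    for (int r = 0; r < 3; ++r)
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b) trace += Q[r][a] * p.scatter[a][b] * Q[r][b];
    return std::sqrt(std::max(0.0, dNorm2 + trace / static_cast<double>(p.n)));
}
```

In use, `precomputeRigid` would be called once per molecule, and `rigidRmsd` once per pair of candidate poses (e.g., docking poses given as rotation and translation), so atom coordinates are never touched after initialization.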
The initialization of our algorithm is linear in the number of atoms, and all subsequent evaluations of RMSD values between flexible molecular conformations depend only on the number of collective motions selected to model the flexibility. Therefore, our algorithm is much faster than the standard RMSD computation in large-scale modeling applications. We demonstrate the efficiency of our method on several clustering examples, including the clustering of flexible docking results and of molecular dynamics (MD) trajectories. We also demonstrate how to use the presented formalism to generate pseudo-random constant-RMSD structural molecular ensembles and how to use these ensembles in cross-docking.
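Why evaluations can be independent of the number of atoms can be illustrated with a simplified sketch. It assumes each conformation is written as x(a) = x0 + sum_k a_k m_k for K fixed collective motions m_k (flattened 3N vectors), and it deliberately ignores the rigid-body part and any superposition, which the full formalism also has to handle. Precomputing the K x K Gram matrix G_kl = m_k . m_l costs O(N K^2); afterwards RMSD^2(a, b) = (a - b)^T G (a - b) / N costs O(K^2) regardless of N. The names `precomputeModes`, `modeRmsd`, `deform`, and `directRmsd` are hypothetical and do not reflect the RapidRMSD API.

```cpp
// Hypothetical sketch, not the RapidRMSD API; rigid-body motion is ignored.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A collective motion (e.g., a normal mode or PCA vector) stored as a flattened
// 3N vector (x1, y1, z1, x2, ...). Conformation: x(a) = x0 + sum_k a_k m_k.
using Mode = std::vector<double>;

struct ModeRmsdPrecomp {
    std::size_t nAtoms = 0;
    std::vector<std::vector<double>> gram;  // K x K, gram[k][l] = m_k . m_l
};

// O(N K^2) initialization: linear in the number of atoms N.
ModeRmsdPrecomp precomputeModes(const std::vector<Mode>& modes, std::size_t nAtoms) {
    for (const Mode& m : modes) assert(m.size() == 3 * nAtoms);
    const std::size_t K = modes.size();
    ModeRmsdPrecomp p;
    p.nAtoms = nAtoms;
    p.gram.assign(K, std::vector<double>(K, 0.0));
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t l = k; l < K; ++l) {
            double dot = 0.0;
            for (std::size_t j = 0; j < 3 * nAtoms; ++j) dot += modes[k][j] * modes[l][j];
            p.gram[k][l] = p.gram[l][k] = dot;
        }
    return p;
}

// O(K^2) evaluation, independent of N: RMSD^2 = (a - b)^T G (a - b) / N.
double modeRmsd(const ModeRmsdPrecomp& p, const std::vector<double>& a,
                const std::vector<double>& b) {
    const std::size_t K = p.gram.size();
    assert(a.size() == K && b.size() == K && p.nAtoms > 0);
    double sum = 0.0;
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t l = 0; l < K; ++l)
            sum += (a[k] - b[k]) * p.gram[k][l] * (a[l] - b[l]);
    return std::sqrt(std::max(0.0, sum / static_cast<double>(p.nAtoms)));
}

// Reference path used only for checking: build the conformation explicitly, O(N K).
std::vector<double> deform(const std::vector<double>& x0, const std::vector<Mode>& modes,
                           const std::vector<double>& amps) {
    std::vector<double> x = x0;
    for (std::size_t k = 0; k < modes.size(); ++k)
        for (std::size_t j = 0; j < x.size(); ++j) x[j] += amps[k] * modes[k][j];
    return x;
}

// Standard O(N) RMSD on flattened coordinates.
double directRmsd(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty() && x.size() % 3 == 0);
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) sum += (x[j] - y[j]) * (x[j] - y[j]);
    return std::sqrt(sum / static_cast<double>(x.size() / 3));
}
```

The Gram matrix is kept general rather than assumed to be the identity, so the sketch also covers non-orthonormal motion sets; with orthonormal modes it reduces to the Euclidean distance between amplitude vectors divided by sqrt(N).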
We provide the algorithm, written in C++, as the open-source RapidRMSD library, distributed under a BSD-compatible license and available at http://team.inria.fr/nano-d/software/RapidRMSD/. The constant-RMSD structural ensemble application and the clustering of MD trajectories are available at http://team.inria.fr/nano-d/software/nolb-normal-modes/.
Supplementary data are available at Bioinformatics online.