Sargsyan Karen, Grauffel Cédric, Lim Carmay
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan.
Department of Chemistry, National Tsing Hua University, Hsinchu 300, Taiwan.
J Chem Theory Comput. 2017 Apr 11;13(4):1518-1524. doi: 10.1021/acs.jctc.7b00028. Epub 2017 Mar 16.
The root-mean-square deviation (RMSD) is a similarity measure widely used in the analysis of macromolecular structures and dynamics. As increasingly large macromolecular systems are studied, dimensionality effects such as the "curse of dimensionality" (a diminishing ability to discriminate pairwise differences between conformations as system size increases) may arise and significantly affect RMSD-based analyses. Whether the RMSD or alternative similarity measures suffer from this "curse" in such large biomolecular systems, and thereby lose the ability to discriminate between different macromolecular structures, had not been explicitly addressed. Here, we show such dimensionality effects for both weighted and nonweighted RMSD schemes. We also provide a mechanism for the emergence of the "curse of dimensionality" for RMSD from the law of large numbers, by showing that the conformational distributions from which RMSDs are calculated become increasingly similar as the system size increases. Our findings suggest using weighted RMSD schemes for small proteins (fewer than 200 residues) and nonweighted RMSD for larger proteins when analyzing molecular dynamics trajectories.
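To make the weighted/nonweighted distinction concrete, here is a minimal sketch of RMSD after optimal rigid-body superposition (the Kabsch algorithm). It uses optional per-atom weights w_i, so that RMSD_w = sqrt(sum_i w_i |x_i - y_i|^2 / sum_i w_i). The abstract does not say which weights the authors use. The `weights` argument and the function name `superposed_rmsd` are hypothetical illustrations, not the paper's scheme or any library's API. With equal weights the function reduces to the ordinary nonweighted RMSD.

```python
import numpy as np

def superposed_rmsd(x, y, weights=None):
    """RMSD between two conformations x and y (each an N x 3 array) after
    optimal rigid-body superposition of x onto y (weighted Kabsch).

    `weights` is a hypothetical per-atom weight vector; None means all
    atoms count equally (nonweighted RMSD).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    # Remove translation: center each structure on its weighted centroid.
    xc = x - w @ x
    yc = y - w @ y

    # Weighted 3x3 covariance matrix H = X^T W Y and its SVD.
    h = (xc * w[:, None]).T @ yc
    u, _, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate x onto y, then take the weighted mean squared deviation.
    diff = xc @ rot.T - yc
    return float(np.sqrt(w @ np.sum(diff ** 2, axis=1)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, whatever the weights. Changing the weights changes both the fit and the deviation: the superposition favors heavily weighted atoms.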
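The law-of-large-numbers mechanism can be illustrated with a toy model. The model is an assumption and does not use the paper's systems or data: each conformation is a common reference plus independent Gaussian displacements whose amplitude varies from atom to atom. A squared RMSD is a mean over N atoms, so its fluctuations shrink roughly as 1/sqrt(N). The spread of pairwise RMSDs relative to their mean therefore collapses as the system grows. That collapse is the loss of discrimination the abstract calls the "curse of dimensionality". For brevity the sketch skips superposition, and `pairwise_rmsd_spread` is a hypothetical helper.

```python
import itertools
import numpy as np

def pairwise_rmsd_spread(n_atoms, n_confs=30, seed=0):
    """Coefficient of variation (std / mean) of all pairwise RMSDs among
    n_confs conformations of a toy n_atoms system.

    Toy model (assumption): conformations are independent Gaussian
    displacements from a common reference, which cancels in pairwise
    differences; per-atom amplitudes mimic atoms of differing mobility.
    """
    rng = np.random.default_rng(seed)
    amp = rng.uniform(0.5, 2.0, size=(n_atoms, 1))  # per-atom mobility
    confs = amp * rng.normal(size=(n_confs, n_atoms, 3))
    # RMSD for every pair: sqrt of the mean over atoms of squared distances.
    rmsds = np.array([
        np.sqrt(np.mean(np.sum((confs[i] - confs[j]) ** 2, axis=1)))
        for i, j in itertools.combinations(range(n_confs), 2)
    ])
    return float(rmsds.std() / rmsds.mean())

for n in (10, 100, 1000, 10000):
    print(n, round(pairwise_rmsd_spread(n), 4))
```

The printed coefficient of variation should fall roughly tenfold for each hundredfold increase in atom count. Pairs of conformations become harder to tell apart by RMSD alone as N grows.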