Shibuya Tetsuo
Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan.
J Comput Biol. 2007 Nov;14(9):1201-7. doi: 10.1089/cmb.2007.0079.
Protein structure analysis is a very important research topic in the molecular biology of the post-genomic era. The root mean square deviation (RMSD) is the most frequently used measure for comparing two protein three-dimensional (3-D) structures. In this paper, we deal with two fundamental problems related to the RMSD. We first deal with a problem called the "range RMSD query" problem. Given an aligned pair of structures, the problem is to compute the RMSD between two aligned substructures of them without gaps. This problem has many applications in protein structure analysis. We propose a linear-time preprocessing algorithm that enables constant-time RMSD computation. Next, we consider a problem called the "substructure RMSD query" problem, which is a generalization of the above range RMSD query problem. It is a problem to compute the RMSD between any substructures of two unaligned structures without gaps. Based on the algorithm for the range RMSD problem, we propose an O(nm) preprocessing algorithm that enables constant-time RMSD computation, where n and m are the lengths of the given structures. Moreover, we propose O(nm log r/r)-time and O(nm/r)-space preprocessing algorithm that enables O(r) query, where r is an arbitrary integer such that 1 < or = r < or = min(n, m). We also show that our strategy also works for another measure called the unit-vector root mean square deviation (URMSD), which is a variant of the RMSD.
蛋白质结构分析是后基因组时代分子生物学中一个非常重要的研究课题。均方根偏差(RMSD)是比较两个蛋白质三维(3-D)结构时最常用的度量。在本文中,我们处理与RMSD相关的两个基本问题。我们首先处理一个称为“范围RMSD查询”的问题。给定一对对齐的结构,该问题是计算它们两个无间隙对齐子结构之间的RMSD。这个问题在蛋白质结构分析中有许多应用。我们提出了一种线性时间预处理算法,该算法能够进行常数时间的RMSD计算。接下来,我们考虑一个称为“子结构RMSD查询”的问题,它是上述范围RMSD查询问题的推广。它是计算两个未对齐结构的任意子结构之间无间隙的RMSD的问题。基于范围RMSD问题的算法,我们提出了一种O(nm)预处理算法,该算法能够进行常数时间的RMSD计算,其中n和m分别是给定结构的长度。此外,我们提出了O(nm log r/r)时间和O(nm/r)空间的预处理算法,该算法能够进行O(r)查询,其中r是一个任意整数,满足1 <= r <= min(n, m)。我们还表明,我们的策略也适用于另一种称为单位向量均方根偏差(URMSD)的度量,它是RMSD的一种变体。