Akutsu T, Onizuka K, Ishikawa M
Human Genome Center, University of Tokyo, Japan.
Comput Appl Biosci. 1997 Aug;13(4):357-64. doi: 10.1093/bioinformatics/13.4.357.
Since the protein structure database has been growing very rapidly in recent years, the development of efficient methods for searching for similar structures is very important.
This paper presents a novel method for searching for similar fragments of proteins. In this method, a hash vector (a vector of real numbers) is associated with each fixed-length fragment of three-dimensional protein structure. Each vector consists of low-frequency components of the Fourier-like spectrum for the distances between Cα atoms and the centroid. Then, we can analyze the similarity between fragments by evaluating the difference between hash vectors. The novel aspect of the method is that the following property is proved theoretically: if the root mean square distance between two fragments is small, then the distance between the hash vectors is small. Using PDB data, several variants of this method were compared with a naive method and a previous method. The results show that the fastest variant is 18-80 times faster than the naive method, and 3-10 times faster than the previous method.
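The hash-vector idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact transform, normalization, or number of components, so the cosine-type transform, the `1/n` scaling, and the names `hash_vector` and `hash_distance` below are assumptions made for illustration only, not the authors' implementation.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def hash_vector(fragment, n_components=3):
    """Low-frequency cosine coefficients of the C-alpha-to-centroid distances.

    `fragment` is a list of (x, y, z) C-alpha coordinates of fixed length.
    The cosine transform is an assumed stand-in for the paper's
    "Fourier-like spectrum".
    """
    c = centroid(fragment)
    d = [math.dist(p, c) for p in fragment]  # distance profile along the chain
    n = len(d)
    vec = []
    for k in range(n_components):
        coeff = sum(d[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        vec.append(coeff / n)  # k = 0 gives the mean distance
    return vec

def hash_distance(u, v):
    """Euclidean distance between two hash vectors."""
    return math.dist(u, v)
```

Because the distances to the centroid do not change under rotation or translation, two copies of the same fragment in different orientations receive the same hash vector in this sketch. A search could then compare the cheap hash vectors first and compute the root mean square distance only for fragment pairs whose `hash_distance` falls below a chosen threshold.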