Terashi Genki, Shibuya Tetsuo, Takeda-Shitaka Mayuko
School of Pharmacy, Kitasato University, Tokyo, Japan.
J Comput Biol. 2012 May;19(5):493-503. doi: 10.1089/cmb.2011.0230. Epub 2012 Apr 17.
Searching for protein structure-function relationships using three-dimensional (3D) structural coordinates is a fundamental approach to determining the functions of proteins of unknown function. Because protein structure databases are growing rapidly, a fast method for finding similar protein substructures by comparing 3D structures is essential. In this article, we present a novel protein 3D structure search method that finds all substructures whose root mean square deviation (RMSD) from the query structure is below a given threshold. Our new algorithm runs in O(m + N/√m) time after O(N log N) preprocessing, where N is the database size and m is the query length. In computational experiments on a very large database (more than 20,000,000 C-alpha coordinates), the new method was 1.8 to 41.6 times faster than the best-known practical O(N) algorithm.
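The abstract does not describe the internals of the new algorithm, so the following is only a reference sketch of the problem it solves, not the authors' method: a naive exhaustive scan that reports every substructure whose RMSD to the query falls below the threshold. It assumes that substructures are contiguous runs of m consecutive C-alpha atoms within a single chain, that the database is given as a list of per-chain (n, 3) coordinate arrays, and that RMSD is measured after optimal rigid superposition, computed here with the Kabsch (SVD) method. The function names `min_rmsd` and `search_substructures` are hypothetical.

```python
import numpy as np


def min_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (m, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)          # remove translation by centering both sets
    q = q - q.mean(axis=0)
    h = p.T @ q                     # 3x3 correlation matrix, sum of p_i q_i^T
    u, s, vt = np.linalg.svd(h)     # singular values come back in descending order
    # If the best orthogonal map V U^T is a reflection (det = -1),
    # flip the sign of the smallest singular value to force a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    best_trace = s[0] + s[1] + d * s[2]
    e0 = (p ** 2).sum() + (q ** 2).sum()
    # Minimal sum of squared deviations is e0 - 2 * best_trace; clamp round-off below zero.
    msd = max(0.0, (e0 - 2.0 * best_trace) / len(p))
    return float(np.sqrt(msd))


def search_substructures(chains, query, threshold):
    """Return (chain index, start, rmsd) for every length-m window below `threshold`.

    Windows never span two chains. Each window costs O(m) work, so the
    whole scan is O(N * m) for a database of N coordinates.
    """
    m = len(query)
    hits = []
    for c, coords in enumerate(chains):
        for start in range(len(coords) - m + 1):
            r = min_rmsd(coords[start:start + m], query)
            if r < threshold:
                hits.append((c, start, r))
    return hits
```

For example, using a rotated and translated copy of residues 5 to 11 of chain 0 as the query should yield a hit at (0, 5) with an RMSD close to zero. Because every window is superposed from scratch, this baseline is slower than both the O(N) algorithm and the method described above; it is meant only to make the search criterion concrete.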