Yang Jinn-Moon, Tung Chi-Hua
Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 30050, Taiwan.
Nucleic Acids Res. 2006 Aug 2;34(13):3646-59. doi: 10.1093/nar/gkl395. Print 2006.
As more protein structures become available and structural genomics efforts provide structural models on a genome-wide scale, there is a growing need for fast and accurate methods for discovering homologous proteins and assigning evolutionary classifications to newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. To build this method, we first identified 23 states of a structural alphabet that represent pattern profiles of backbone fragments and then used them to encode protein structure databases as structural alphabet sequence databases (SADB). Our method enhances BLAST as a search tool by using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs in an SADB. On personal computers with Intel Pentium 4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
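The core idea in the abstract, searching structures as strings over a structural alphabet and scoring segment pairs with a substitution matrix, can be illustrated with a minimal sketch. This is not the 3D-BLAST implementation: the letters, the `toy_score` function (a hypothetical stand-in for an SASM entry), and the exhaustive diagonal scan are all assumptions made for illustration, and the real method relies on BLAST's heuristics and E-value statistics instead.

```python
from typing import Callable, Dict, List, Tuple


def toy_score(x: str, y: str) -> int:
    """Hypothetical stand-in for an SASM entry: +2 if letters match, -1 otherwise."""
    return 2 if x == y else -1


def best_segment_pair(
    query: str, target: str, score: Callable[[str, str], int] = toy_score
) -> Tuple[int, int, int, int]:
    """Find the best ungapped segment pair between two structural alphabet strings.

    Returns (best_score, query_start, target_start, length).
    Every diagonal is scanned with a running maximum (Kadane's algorithm).
    """
    best = (0, 0, 0, 0)
    # A diagonal is identified by offset = target_index - query_index.
    for offset in range(-(len(query) - 1), len(target)):
        q = max(0, -offset)
        t = q + offset
        running = 0
        start_q = q
        while q < len(query) and t < len(target):
            if running <= 0:
                # A non-positive prefix cannot help; restart the segment here.
                running = 0
                start_q = q
            running += score(query[q], target[t])
            if running > best[0]:
                best = (running, start_q, start_q + offset, q - start_q + 1)
            q += 1
            t += 1
    return best


def search(query: str, database: Dict[str, str]) -> List[Tuple[int, str]]:
    """Rank database entries by their best segment-pair score, highest first."""
    hits = [(best_segment_pair(query, seq)[0], name) for name, seq in database.items()]
    return sorted(hits, reverse=True)
```

Calling `search("ABCDE", {"d1": "XXBCDYY", "d2": "QQQQ"})` ranks `d1` first because it shares the segment `BCD` with the query. A real search would replace the exhaustive scan with BLAST's seed-and-extend strategy and report an E-value for each hit.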