Zhang Z, Schäffer A A, Miller W, Madden T L, Lipman D J, Koonin E V, Altschul S F
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA.
Nucleic Acids Res. 1998 Sep 1;26(17):3986-90. doi: 10.1093/nar/26.17.3986.
Protein families are often characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HSP90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
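The seeding step described in the abstract (locate the pattern in the query, then locate other instances of it in the database) can be illustrated with a minimal sketch. This is not the PHI-BLAST implementation: it assumes a simplified PROSITE-style pattern syntax, uses Python regular expressions for matching, and omits the alignment extension and the score statistics entirely. The function names `prosite_to_regex` and `find_seeds`, and the toy sequences, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical helper: translate a simplified PROSITE-style pattern
# (elements separated by "-") into a Python regular expression.
# Supported elements: a residue (A), any residue (x), a set ([ST]),
# an excluded set ({P}), each optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            atom = "."                          # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            atom = core                         # single residue or [set]
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        parts.append(atom)
    return "".join(parts)

def find_seeds(query: str, pattern: str, database: dict) -> list:
    """Return (name, start, end) for each pattern occurrence in the database.

    Coordinates are 0-based with an exclusive end. The query must itself
    contain the pattern, mirroring the input requirement in the abstract.
    """
    # A lookahead wrapper lets matches that start at overlapping positions
    # all be reported; each start yields its greedy match only.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    if regex.search(query) is None:
        raise ValueError("the query does not contain the pattern")
    seeds = []
    for name, sequence in database.items():
        for m in regex.finditer(sequence):
            seeds.append((name, m.start(1), m.end(1)))
    return seeds

# Toy data (invented for illustration, not real database entries).
PATTERN = "[AG]-x(4)-G-K-[ST]"
QUERY = "MKTAGPSGSGKSTLAR"
DATABASE = {
    "seq1": "MSLLGAPGVGKTTLVQ",
    "seq2": "MEEPQSDPSVEPPLSQ",
    "seq3": "AVTRRGKSWWAAIDLGKT",
}

if __name__ == "__main__":
    for seed in find_seeds(QUERY, PATTERN, DATABASE):
        print(seed)
```

In this sketch each reported seed marks where a local alignment to the query would be anchored; constructing and scoring that alignment is the part of the method this example leaves out.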