Li W, Pio F, Pawłowski K, Godzik A
San Diego Supercomputer Center, La Jolla, CA 92093, USA.
Bioinformatics. 2000 Dec;16(12):1105-10. doi: 10.1093/bioinformatics/16.12.1105.
Two proteins can share a similar three-dimensional structure and biological function yet have sequences so different that traditional protein sequence comparison algorithms fail to identify their relationship. The desire to identify such relationships has led to the development of more sensitive sequence alignment strategies. One such strategy is the Intermediate Sequence Search (ISS), which connects two proteins through one or more intermediate sequences. In its brute-force implementation, ISS repeatedly uses the results of the previous query as new search seeds, which makes it time-consuming and its results difficult to analyze.
Saturated BLAST is a package that performs ISS in an efficient and automated manner. It was developed in Perl and Perl/Tk and implemented on the Linux operating system. Starting with a protein sequence, Saturated BLAST runs a BLAST search and identifies representative sequences to seed the next generation of searches. The procedure is repeated until convergence or until predefined criteria are met. Saturated BLAST has a user-friendly graphical interface, a built-in BLAST result parser, several multiple alignment tools, clustering algorithms and various filters for the elimination of false positives, thereby providing an easy way to edit, visualize, analyze, monitor and control the search. Besides detecting remote homologies, Saturated BLAST can be used to maintain protein family databases and to search for new genes in genomic databases.
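The iterative procedure described above (search, pick representatives, search again until nothing new turns up) can be illustrated with a minimal Python sketch. This is not the Saturated BLAST code, which is written in Perl. The names `saturated_search`, `pick_representatives`, `toy_search` and `TOY_HITS` are hypothetical; the toy dictionary stands in for real BLAST runs; and taking one sequence per cluster as the representative is an assumed simplification of the package's clustering and filtering steps.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical toy "database": each sequence ID maps to the hits that a single
# search seeded with it would return. A real run would call BLAST here instead.
TOY_HITS: Dict[str, List[str]] = {
    "query": ["A1", "A2"],
    "A1": ["query", "A2", "B1"],
    "A2": ["query", "A1"],
    "B1": ["A1", "B2"],
    "B2": ["B1", "target"],
    "target": ["B2"],
}


def toy_search(seq_id: str) -> List[str]:
    """Stand-in for one BLAST search: return the hits for a seed sequence."""
    return TOY_HITS.get(seq_id, [])


def pick_representatives(
    new_hits: Iterable[str], cluster_of: Callable[[str], Hashable]
) -> List[str]:
    """Keep one sequence per cluster (assumed rule: first ID in sorted order)."""
    chosen: Dict[Hashable, str] = {}
    for seq_id in sorted(new_hits):
        chosen.setdefault(cluster_of(seq_id), seq_id)
    return sorted(chosen.values())


def saturated_search(
    query: str,
    search: Callable[[str], Iterable[str]],
    cluster_of: Callable[[str], Hashable],
    max_generations: int = 10,
) -> Tuple[Set[str], int]:
    """Iterate searches until no new sequences appear or the cap is reached."""
    found: Set[str] = {query}
    seeds: List[str] = [query]
    generations = 0
    while seeds and generations < max_generations:
        new_hits: Set[str] = set()
        for seed in seeds:
            # Only sequences not seen in earlier generations count as new.
            new_hits.update(h for h in search(seed) if h not in found)
        found |= new_hits
        # Representatives of the new hits seed the next generation;
        # an empty list means the search has converged.
        seeds = pick_representatives(new_hits, cluster_of)
        generations += 1
    return found, generations


# Toy clustering rule (an assumption): the first character is the family label.
found, generations = saturated_search("query", toy_search, lambda s: s[0])
print(sorted(found), generations)
```

In this toy data set, "query" and "target" never hit each other directly and are linked only through the intermediates A1, B1 and B2. Seeding each generation with one representative per cluster, rather than with every new hit, is what keeps the number of searches below that of the brute-force approach; `max_generations` plays the role of a predefined stopping criterion.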