Plewniak F, Thompson J D, Poch O
Institut de Génétique et de Biologie Moléculaire et Cellulaire, Laboratoire de Biologie Structurale, (CNRS/INSERM/ULP), BP 163, 67404 Illkirch Cedex, France.
Bioinformatics. 2000 Sep;16(9):750-9. doi: 10.1093/bioinformatics/16.9.750.
Blast programs are very efficient at finding relatively strong similarities, but some very distantly related sequences are assigned a very high Expect value and are ranked very low in Blast results. We have developed Ballast, a program that predicts local maximum segments (LMSs, i.e. sequence segments conserved relative to their flanking regions) from a single Blast database search and highlights these divergent homologues. TBlastN database searches can also be processed with the help of information from a joint BlastP search.
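To make the notion of a segment "conserved relative to its flanking regions" concrete, the following is a minimal sketch in Python. It is not the Ballast algorithm, whose details are not given here. It assumes a per-position conservation profile over the query is already available, and it uses a deliberately simple rule: a segment is a contiguous run of positions whose value exceeds the profile mean. The function name `local_maximum_segments` and the `min_length` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_maximum_segments(
    profile: Sequence[float], min_length: int = 2
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of runs above the profile mean.

    Assumption: "conserved relative to the flanks" is approximated by
    "strictly above the mean of the whole profile". The real Ballast
    criterion may differ.
    """
    if not profile:
        return []
    baseline = sum(profile) / len(profile)
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the run currently being extended
    for i, value in enumerate(profile):
        if value > baseline:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at position i; keep it if it is long enough.
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_length:
        segments.append((start, len(profile)))
    return segments


if __name__ == "__main__":
    toy_profile = [0, 0, 1, 3, 4, 3, 1, 0, 0, 0, 2, 5, 5, 2, 0, 0]
    print(local_maximum_segments(toy_profile))
```

On the toy profile, the two peaks are reported as separate segments because the positions between them fall below the mean.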
We have applied the Ballast algorithm to BlastP searches performed against the SwissProt 38 database with sequences belonging to well-described dispersed families (aminoacyl-tRNA synthetases; helicases). We show that Ballast is able to build an appropriate conservation profile and that the predicted LMSs are consistent with the signatures and motifs described in the literature. Furthermore, by comparing the Blast, PsiBlast and Ballast results obtained on a well-defined database of structurally related sequences, we show that the LMSs provide a scoring scheme that concentrates distant homologues among the top-ranking hits better than Blast does. Using the graphical user interface available on the Web, specific LMSs may be selected to detect divergent homologues that share the corresponding properties with the query sequence, without requiring any additional database search.
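The idea of re-ranking hits by how well they cover selected conserved segments, without running a new database search, can be illustrated with a short Python sketch. This is an assumption-laden toy, not Ballast's scoring scheme. It supposes that each hit is reduced to the query range it aligns to, that segments are given as half-open query ranges, and that a hit's score is the sum of profile values over the segment positions it covers. The names `segment_score` and `rerank_hits` are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]   # half-open (start, end) range on the query
Hit = Tuple[str, Range]   # (hit identifier, aligned query range)


def segment_score(
    hit_range: Range, segments: Sequence[Range], profile: Sequence[float]
) -> float:
    """Sum the profile over the segment positions covered by the hit."""
    start, end = hit_range
    total = 0.0
    for seg_start, seg_end in segments:
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo < hi:  # the hit overlaps this segment
            total += sum(profile[lo:hi])
    return total


def rerank_hits(
    hits: Sequence[Hit], segments: Sequence[Range], profile: Sequence[float]
) -> List[Hit]:
    """Order hits by decreasing segment score (ties keep input order)."""
    return sorted(
        hits,
        key=lambda hit: segment_score(hit[1], segments, profile),
        reverse=True,
    )


if __name__ == "__main__":
    profile = [0, 0, 1, 3, 4, 3, 1, 0, 0, 0, 2, 5, 5, 2, 0, 0]
    segments = [(3, 6), (10, 14)]
    hits = [("A", (0, 3)), ("B", (2, 7)), ("C", (9, 16))]
    for name, _ in rerank_hits(hits, segments, profile):
        print(name)
```

Restricting `segments` to a user-selected subset mimics the selection of specific LMSs described above: only hits that overlap the chosen segments receive a non-zero score.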