BLAST+: architecture and applications.
Affiliation
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Publication
BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421.
BACKGROUND
Sequence similarity searching is a very important bioinformatics task. While the Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user interface of the current command-line applications.
RESULTS
We describe features and improvements of the rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.
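The idea of breaking a long query into chunks can be illustrated with a minimal sketch. The abstract does not state the chunk size, the overlap between chunks, or how results from neighbouring chunks are merged, so this example simply assumes fixed-size chunks with a fixed overlap. The function names `split_query` and `to_query_coords` and all parameter values are hypothetical and are not part of the BLAST+ API. Merging of duplicate hits found in overlapping regions is omitted.

```python
from typing import List, Tuple


def split_query(length: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Split a query of `length` residues into half-open (start, end) chunks.

    Hypothetical sketch: consecutive chunks share `overlap` residues, so any
    alignment no longer than `overlap` lies entirely inside at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    if length <= chunk_size:
        # Short queries are searched as a single chunk.
        return [(0, length)]
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, length)
        chunks.append((start, end))
        if end == length:
            break
        start += step
    return chunks


def to_query_coords(chunk_start: int, hit_start: int, hit_end: int) -> Tuple[int, int]:
    """Map a hit reported in chunk-local coordinates back to full-query coordinates."""
    return chunk_start + hit_start, chunk_start + hit_end


if __name__ == "__main__":
    print(split_query(25, 10, 3))     # [(0, 10), (7, 17), (14, 24), (21, 25)]
    print(to_query_coords(7, 2, 5))   # (9, 12)
```

Each chunk can then be searched independently, and because chunks overlap, a hit near a chunk boundary is not lost; the cost is that such hits may be reported twice and must be reconciled afterwards.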
CONCLUSION
The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as for chromosome-length database sequences. We have also improved the user interface of the command-line applications.