Suzuki Shuji, Kakuta Masanori, Ishida Takashi, Akiyama Yutaka
Graduate School of Information Science and Engineering, Tokyo Institute of Technology and Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Tokyo 152-8550, Japan.
Bioinformatics. 2015 Apr 15;31(8):1183-90. doi: 10.1093/bioinformatics/btu780. Epub 2014 Nov 27.
Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis.
We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. The method clusters similar subsequences from a database so that seed search and ungapped extension can be performed efficiently, using the triangle inequality to reduce the number of alignment candidates. Database subsequence clustering yielded an approximately 2-fold increase in speed without a large decrease in search sensitivity. On metagenomic data, GHOSTZ was approximately 2.2-2.8 times faster than RAPSearch and approximately 185-261 times faster than BLASTX.
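The candidate-reduction idea can be illustrated with a minimal sketch. It is not the GHOSTZ implementation: it assumes Hamming distance between equal-length subsequences as a stand-in metric, a simple greedy clustering, and hypothetical names (`build_clusters`, `search`, `radius`, `threshold`) chosen only for illustration. The point it shows is that, once the distance from each cluster member to its representative is stored, one distance computation against the representative gives a lower bound on the distance to every member, so many members can be discarded without being compared to the query.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(subseqs, radius):
    """Greedy clustering (illustrative assumption, not the paper's procedure).

    Each subsequence joins the first representative within `radius`;
    otherwise it becomes a new representative. The distance from each
    member to its representative is stored for later pruning.
    """
    clusters = []  # list of (representative, [(member, dist_to_rep), ...])
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))
                break
        else:
            clusters.append((s, []))
    return clusters


def search(query, clusters, radius, threshold):
    """Return (hits, computed): subsequences within `threshold` of `query`
    and the number of distance computations that were actually performed."""
    hits, computed = [], 0
    for rep, members in clusters:
        d_qr = hamming(query, rep)
        computed += 1
        if d_qr <= threshold:
            hits.append(rep)
        # Every member m satisfies d(rep, m) <= radius, so by the triangle
        # inequality d(query, m) >= d_qr - radius: skip the whole cluster.
        if d_qr - radius > threshold:
            continue
        for m, d_rm in members:
            # Per-member lower bound: d(query, m) >= |d_qr - d_rm|.
            if abs(d_qr - d_rm) > threshold:
                continue
            computed += 1
            if hamming(query, m) <= threshold:
                hits.append(m)
    return hits, computed


db = ["AAAAAA", "AAAAAT", "AAAATT", "CCCCCC", "CCCCCG", "GGGGGG"]
clusters = build_clusters(db, radius=2)
hits, computed = search("AAAAAT", clusters, radius=2, threshold=1)
print(hits, computed, "of", len(db), "distances computed")
```

Because Hamming distance is a true metric, the pruning in this sketch never discards a valid hit; it only avoids comparisons whose outcome is already determined. A score-based distance, as a protein search tool would need, requires its own argument that the triangle inequality holds.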
The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/
Supplementary data are available at Bioinformatics online.