Faster sequence homology searches by clustering subsequences.

Author information

Suzuki Shuji, Kakuta Masanori, Ishida Takashi, Akiyama Yutaka

Affiliation

Graduate School of Information Science and Engineering, Tokyo Institute of Technology and Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Tokyo 152-8550, Japan.

Publication information

Bioinformatics. 2015 Apr 15;31(8):1183-90. doi: 10.1093/bioinformatics/btu780. Epub 2014 Nov 27.

Abstract

MOTIVATION

Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis.

RESULTS

We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. The method clusters similar subsequences from a database so that seed search and ungapped extension run efficiently, with alignment candidates reduced using the triangle inequality. Database subsequence clustering achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. On metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX.
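The candidate reduction by triangle inequality can be illustrated with a small sketch. This is not the GHOSTZ implementation: it assumes fixed-length subsequences, uses Hamming distance as a stand-in metric, and clusters greedily. The function names (`hamming`, `build_clusters`, `search`) and the `radius` and `threshold` parameters are hypothetical and chosen only for illustration. The idea shown is that, once the distance from each cluster member to its representative is stored, a query needs only its distance to the representative to rule out members whose lower bound already exceeds the threshold.

```python
from typing import Dict, List, Tuple

Clusters = Dict[str, List[Tuple[str, int]]]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_clusters(subseqs: List[str], radius: int) -> Clusters:
    """Greedy clustering (an assumption, not the paper's procedure).

    Each subsequence joins the first representative within `radius`;
    otherwise it becomes a new representative. Every member is stored
    together with its precomputed distance to the representative.
    """
    clusters: Clusters = {}
    for s in subseqs:
        for rep in clusters:
            d = hamming(rep, s)
            if d <= radius:
                clusters[rep].append((s, d))
                break
        else:
            clusters[s] = [(s, 0)]
    return clusters


def search(query: str, clusters: Clusters, threshold: int) -> Tuple[List[str], int]:
    """Return members within `threshold` of the query, plus the number
    of distance computations actually performed."""
    hits: List[str] = []
    computed = 0
    for rep, members in clusters.items():
        d_qr = hamming(query, rep)
        computed += 1
        for member, d_rm in members:
            # Triangle inequality: d(q, m) >= |d(q, r) - d(r, m)|.
            # If the lower bound exceeds the threshold, skip the member
            # without computing its distance to the query.
            if abs(d_qr - d_rm) > threshold:
                continue
            if member == rep:
                d_qm = d_qr  # already computed above
            else:
                d_qm = hamming(query, member)
                computed += 1
            if d_qm <= threshold:
                hits.append(member)
    return hits, computed


if __name__ == "__main__":
    db = ["AAAAAA", "AAAAAC", "AAAACC", "GGGGGG", "GGGGGT", "TTTTTT"]
    clusters = build_clusters(db, radius=2)
    hits, computed = search("AAAAAC", clusters, threshold=1)
    print(hits, computed)
```

In this toy database the members of distant clusters are discarded after a single comparison with their representative, so the search performs fewer distance computations than a scan over all six subsequences while returning the same hits.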

AVAILABILITY AND IMPLEMENTATION

The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/

CONTACT

akiyama@cs.titech.ac.jp

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d33/4393512/11b933f77075/btu780f1p.jpg
