Compression of nucleotide databases for fast searching.

Author information

Williams H, Zobel J

Affiliation

Department of Computer Science, RMIT, Melbourne, Australia.

Publication information

Comput Appl Biosci. 1997 Oct;13(5):549-54. doi: 10.1093/bioinformatics/13.5.549.

DOI:10.1093/bioinformatics/13.5.549

PMID:9367128

Abstract

MOTIVATION

International sequencing efforts are creating huge nucleotide databases, which are used in searching applications to locate sequences homologous to a query sequence. In such applications, it is desirable that databases are stored compactly, that sequences can be accessed independently of the order in which they were stored, and that data can be rapidly retrieved from secondary storage, since disk costs are often the bottleneck in searching.

RESULTS

We present a purpose-built direct coding scheme for fast retrieval and compression of genomic nucleotide data. The scheme is lossless, readily integrated with sequence search tools, and does not require a model. Direct coding gives good compression and allows faster retrieval than with either uncompressed data or data compressed by other methods, thus yielding significant improvements in search times for high-speed homology search tools.
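The abstract does not give the encoding itself, so the following is only a minimal sketch of one common form of direct coding, assuming a fixed 2-bit code per base with four bases packed into each byte. The code table and the names `CODES`, `BASES`, `pack`, and `unpack` are illustrative assumptions, not taken from the paper. Wildcard characters such as N, which real nucleotide databases contain, are not handled here and would need separate treatment.

```python
# Hypothetical 2-bit direct coding sketch. The code table and function
# names are illustrative assumptions, not the authors' implementation.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte.

    Returns (packed_bytes, base_count). The count is needed so that
    padding bits in the final byte can be ignored when unpacking.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Place each 2-bit code in its slot, most significant bits first.
        out[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data, length):
    """Recover the original string from packed bytes and the base count."""
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 3] for i in range(length)
    )


packed, n = pack("GATTACA")
assert unpack(packed, n) == "GATTACA"  # lossless round trip
```

Because every base occupies a fixed position, no statistical model has to be built or consulted, and decoding reduces to shifts and masks, which is consistent with the retrieval speed the abstract emphasizes.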

Similar articles

Protein structural similarity search by Ramachandran codes.

BMC Bioinformatics. 2007 Aug 23;8:307. doi: 10.1186/1471-2105-8-307.

Comparing compressed sequences for faster nucleotide BLAST searches.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):349-64. doi: 10.1109/TCBB.2007.1029.

SSAHA: a fast search method for large DNA databases.

Genome Res. 2001 Oct;11(10):1725-9. doi: 10.1101/gr.194201.

SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

BMC Bioinformatics. 2004 Oct 28;5:171. doi: 10.1186/1471-2105-5-171.

General-purpose search techniques for genomic text.

Genome Inform. 2004;15(2):42-51.

Issues in searching molecular sequence databases.

Nat Genet. 1994 Feb;6(2):119-29. doi: 10.1038/ng0294-119.

Query-dependent banding (QDB) for faster RNA similarity searches.

PLoS Comput Biol. 2007 Mar 30;3(3):e56. doi: 10.1371/journal.pcbi.0030056. Epub 2007 Feb 7.

Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

Curr Protoc Bioinformatics. 2004 Oct;Chapter 9:Unit 9.4. doi: 10.1002/0471250953.bi0904s7.

Flexible sequence similarity searching with the FASTA3 program package.

Methods Mol Biol. 2000;132:185-219. doi: 10.1385/1-59259-192-2:185.

Cited by

Bitpacking techniques for indexing genomes: I. Hash tables.

Algorithms Mol Biol. 2016 Apr 18;11:5. doi: 10.1186/s13015-016-0069-5. eCollection 2016.

Data structures and compression algorithms for genomic sequence data.

Bioinformatics. 2009 Jul 15;25(14):1731-8. doi: 10.1093/bioinformatics/btp319. Epub 2009 May 15.
