ntHash: recursive nucleotide hashing.

Authors

Mohamadi Hamid, Chu Justin, Vandervalk Benjamin P, Birol Inanc

Affiliation

Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada.

Publication

Bioinformatics. 2016 Nov 15;32(22):3492-3494. doi: 10.1093/bioinformatics/btw397. Epub 2016 Jul 16.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5181554/

Abstract

MOTIVATION

Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact in the field, making bioinformatics applications faster and more efficient.

RESULTS

We present ntHash, a hashing algorithm tuned for processing DNA/RNA sequences. It performs the best when calculating hash values for adjacent k-mers in an input sequence, operating an order of magnitude faster than the best performing alternatives in typical use cases.
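Why hashing adjacent k-mers can be so much cheaper than hashing each one independently can be illustrated with a rotate-and-XOR (cyclic polynomial) rolling hash, the recursive hash family that ntHash builds on. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each base maps to a fixed 64-bit seed; the constants in `SEED` are illustrative assumptions. A k-mer's hash is the XOR of its base seeds, each rotated according to its position. The next k-mer's hash is then derived from the previous one in constant time: rotate once, cancel the outgoing base, and XOR in the incoming base. The names `SEED`, `rol`, `direct_hash` and `rolling_hashes` are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Illustrative 64-bit seed per base (assumption for this sketch; any fixed,
# well-mixed random constants would serve the same purpose).
SEED = {
    "A": 0x3C8BFBB395C60474,
    "C": 0x3193C18562A02B4C,
    "G": 0x20323ED082572324,
    "T": 0x295549F54BE24456,
}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit integer left by r bits (r taken modulo 64)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def direct_hash(kmer: str) -> int:
    """Hash one k-mer from scratch in O(k): XOR of position-rotated base seeds."""
    k = len(kmer)
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


def rolling_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of seq, deriving each hash from the previous one in O(1)."""
    if k <= 0 or len(seq) < k:
        return []
    h = direct_hash(seq[:k])
    hashes = [h]
    for i in range(1, len(seq) - k + 1):
        out_base = seq[i - 1]      # base leaving the window
        in_base = seq[i + k - 1]   # base entering the window
        # Rotating by 1 shifts every term one position; the outgoing base's
        # term is now rotated by k, so XOR cancels it, then add the new base.
        h = rol(h, 1) ^ rol(SEED[out_base], k) ^ SEED[in_base]
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    seq = "ACGTACGTTGCA"
    k = 5
    rolled = rolling_hashes(seq, k)
    direct = [direct_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    assert rolled == direct
    print(f"{len(rolled)} k-mer hashes match")  # prints "8 k-mer hashes match"
```

Recomputing each k-mer from scratch costs O(k) per position, while the recursive update costs O(1), which is where the advantage for adjacent k-mers comes from. The sketch assumes uppercase A/C/G/T input only (other characters raise `KeyError`), requires Python 3.9+ for the `list[int]` annotation, and computes only a forward-strand hash.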

AVAILABILITY AND IMPLEMENTATION

ntHash is available online at http://www.bcgsc.ca/platform/bioinfo/software/nthash and is free for academic use.

CONTACTS

hmohamadi@bcgsc.ca or ibirol@bcgsc.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

