• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

基于块索引的高效间距种子哈希计算。

Efficient computation of spaced seed hashing with block indexing.

机构信息

Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.

出版信息

BMC Bioinformatics. 2018 Nov 30;19(Suppl 15):441. doi: 10.1186/s12859-018-2415-8.

DOI:10.1186/s12859-018-2415-8
PMID:30497364
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC6266934/
Abstract

BACKGROUND

Spaced-seeds, i.e. patterns in which some fixed positions are allowed to be wild-cards, play a crucial role in several bioinformatics applications involving substrings counting and indexing, by often providing better sensitivity with respect to k-mers based approaches. K-mers based approaches are usually fast, being based on efficient hashing and indexing that exploits the large overlap between consecutive k-mers. Spaced-seeds hashing is not as straightforward, and it is usually computed from scratch for each position in the input sequence. Recently, the FSH (Fast Spaced seed Hashing) approach was proposed to improve the time required for computation of the spaced seed hashing of DNA sequences with a speed-up of about 1.5 with respect to standard hashing computation.

RESULTS

In this work we propose a novel algorithm, Fast Indexing for Spaced seed Hashing (FISH), based on the indexing of small blocks that can be combined to obtain the hashing of spaced-seeds of any length. The method exploits the fast computation of the hashing of runs of consecutive 1 in the spaced seeds, that basically correspond to k-mer of the length of the run.

CONCLUSIONS

We run several experiments, on NGS data from simulated and synthetic metagenomic experiments, to assess the time required for the computation of the hashing for each position in each read with respect to several spaced seeds. In our experiments, FISH can compute the hashing values of spaced seeds with a speedup, with respect to the traditional approach, between 1.9x to 6.03x, depending on the structure of the spaced seeds.

摘要

背景

间隔种子(即某些固定位置允许使用通配符的模式)在涉及子串计数和索引的几个生物信息学应用中起着至关重要的作用,通常相对于基于 k-mer 的方法提供更好的敏感性。基于 k-mer 的方法通常很快,因为它们基于利用连续 k-mer 之间的大量重叠的高效哈希和索引。间隔种子哈希并不那么直接,通常需要为输入序列中的每个位置从头开始计算。最近,提出了 FSH(Fast Spaced seed Hashing)方法来提高计算 DNA 序列间隔种子哈希的时间,相对于标准哈希计算,速度提高了约 1.5 倍。

结果

在这项工作中,我们提出了一种新的算法,即间隔种子哈希的快速索引(Fast Indexing for Spaced seed Hashing,FISH),该算法基于可以组合以获得任意长度间隔种子哈希的小块索引。该方法利用间隔种子中连续 1 位的哈希快速计算,这些位基本上对应于长度为该位的 k-mer。

结论

我们在模拟和合成宏基因组实验的 NGS 数据上进行了多次实验,以评估相对于几种间隔种子计算每个读取中每个位置的哈希计算所需的时间。在我们的实验中,FISH 可以相对于传统方法提供 1.9x 到 6.03x 的加速,具体取决于间隔种子的结构。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/ebbac4692ec3/12859_2018_2415_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/dc1b9ffa4df7/12859_2018_2415_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/9bf65cecd314/12859_2018_2415_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/c218872bb7c7/12859_2018_2415_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/f3b594fdf3b1/12859_2018_2415_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/7412fd9ec5d2/12859_2018_2415_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/dad01bda6c04/12859_2018_2415_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/6a479a6897c8/12859_2018_2415_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/ebbac4692ec3/12859_2018_2415_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/dc1b9ffa4df7/12859_2018_2415_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/9bf65cecd314/12859_2018_2415_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/c218872bb7c7/12859_2018_2415_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/f3b594fdf3b1/12859_2018_2415_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/7412fd9ec5d2/12859_2018_2415_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/dad01bda6c04/12859_2018_2415_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/6a479a6897c8/12859_2018_2415_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c13d/6266934/ebbac4692ec3/12859_2018_2415_Fig8_HTML.jpg

相似文献

1
Efficient computation of spaced seed hashing with block indexing.基于块索引的高效间距种子哈希计算。
BMC Bioinformatics. 2018 Nov 30;19(Suppl 15):441. doi: 10.1186/s12859-018-2415-8.
2
FSH: fast spaced seed hashing exploiting adjacent hashes.FSH:利用相邻哈希的快速间隔种子哈希
Algorithms Mol Biol. 2018 Mar 22;13:8. doi: 10.1186/s13015-018-0125-4. eCollection 2018.
3
Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and -mer Hashing.迭代间隔种子哈希:缩小间隔种子哈希与k-mer哈希之间的差距。
J Comput Biol. 2020 Feb;27(2):223-233. doi: 10.1089/cmb.2019.0298. Epub 2019 Dec 4.
4
MISSH: Fast Hashing of Multiple Spaced Seeds.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):2330-2339. doi: 10.1109/TCBB.2024.3467368. Epub 2024 Dec 10.
5
ntHash2: recursive spaced seed hashing for nucleotide sequences.ntHash2:核苷酸序列的递归间隔种子哈希。
Bioinformatics. 2022 Oct 14;38(20):4812-4813. doi: 10.1093/bioinformatics/btac564.
6
Efficient computation of spaced seeds.间隔种子的高效计算。
BMC Res Notes. 2012 Feb 28;5:123. doi: 10.1186/1756-0500-5-123.
7
Global, highly specific and fast filtering of alignment seeds.全局、高度特异且快速的比对种子过滤。
BMC Bioinformatics. 2022 Jun 10;23(1):225. doi: 10.1186/s12859-022-04745-4.
8
Effects of spaced k-mers on alignment-free genotyping.间隔 k-mer 对无比对基因分型的影响。
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i213-i221. doi: 10.1093/bioinformatics/btad202.
9
Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.优化间隔k-mer邻居以实现蛋白质相似性搜索中的高效筛选
IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):398-406. doi: 10.1109/TCBB.2014.2306831.
10
PerFSeeB: designing long high-weight single spaced seeds for full sensitivity alignment with a given number of mismatches.PerFSeeB:设计长的高权重单间隔种子,以在给定数量的错配下实现全灵敏度比对。
BMC Bioinformatics. 2023 Oct 24;24(1):396. doi: 10.1186/s12859-023-05517-4.

引用本文的文献

1
OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier.章鱼:基于磁盘的、多平台、对移动设备友好的宏基因组分类器。
AMIA Annu Symp Proc. 2025 May 22;2024:798-807. eCollection 2024.
2
Taming large-scale genomic analyses via sparsified genomics.通过稀疏化基因组学实现大规模基因组分析的优化
Nat Commun. 2025 Jan 21;16(1):876. doi: 10.1038/s41467-024-55762-1.
3
OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier.章鱼:基于磁盘的、多平台、对移动设备友好的宏基因组学分类器。

本文引用的文献

1
FSH: fast spaced seed hashing exploiting adjacent hashes.FSH:利用相邻哈希的快速间隔种子哈希
Algorithms Mol Biol. 2018 Mar 22;13:8. doi: 10.1186/s13015-018-0125-4. eCollection 2018.
2
Alignment-free sequence comparison: benefits, applications, and tools.无比对信息的序列比对:优势、应用和工具。
Genome Biol. 2017 Oct 3;18(1):186. doi: 10.1186/s13059-017-1319-7.
3
Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.11110110111的最佳命中结果:间隔种子的无模型选择和无参数敏感性计算
bioRxiv. 2024 Aug 10:2024.03.15.585215. doi: 10.1101/2024.03.15.585215.
4
Effects of spaced k-mers on alignment-free genotyping.间隔 k-mer 对无比对基因分型的影响。
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i213-i221. doi: 10.1093/bioinformatics/btad202.
5
From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures.从分子到基因组变异:通过智能算法和架构加速基因组分析
Comput Struct Biotechnol J. 2022 Aug 18;20:4579-4599. doi: 10.1016/j.csbj.2022.08.019. eCollection 2022.
6
'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees.“多间隔词匹配法”:一种使用多个间隔词匹配和四重树进行系统发育重建的最大似然法。
NAR Genom Bioinform. 2019 Oct 30;2(1):lqz013. doi: 10.1093/nargab/lqz013. eCollection 2020 Mar.
Algorithms Mol Biol. 2017 Feb 14;12:1. doi: 10.1186/s13015-017-0092-1. eCollection 2017.
4
Efficient Algorithms for Sequence Analysis with Entropic Profiles.基于熵轮廓的序列分析高效算法。
IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan-Feb;15(1):117-128. doi: 10.1109/TCBB.2016.2620143. Epub 2016 Oct 21.
5
Fast and accurate phylogeny reconstruction using filtered spaced-word matches.使用过滤后的间隔词匹配进行快速准确的系统发育重建。
Bioinformatics. 2017 Apr 1;33(7):971-979. doi: 10.1093/bioinformatics/btw776.
6
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.拉斯巴里:优化间隔种子用于数据库搜索、读段映射和无比对序列比较
PLoS Comput Biol. 2016 Oct 19;12(10):e1005107. doi: 10.1371/journal.pcbi.1005107. eCollection 2016 Oct.
7
MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures.MetaProb:基于概率序列特征的准确宏基因组 reads 分箱
Bioinformatics. 2016 Sep 1;32(17):i567-i575. doi: 10.1093/bioinformatics/btw466.
8
Fast genotyping of known SNPs through approximate k-mer matching.通过近似k-mer匹配对已知单核苷酸多态性进行快速基因分型。
Bioinformatics. 2016 Sep 1;32(17):i538-i544. doi: 10.1093/bioinformatics/btw460.
9
Higher classification sensitivity of short metagenomic reads with CLARK-S.使用CLARK-S时短宏基因组读数具有更高的分类敏感性。
Bioinformatics. 2016 Dec 15;32(24):3823-3825. doi: 10.1093/bioinformatics/btw542. Epub 2016 Aug 18.
10
ntHash: recursive nucleotide hashing.ntHash:递归核苷酸哈希
Bioinformatics. 2016 Nov 15;32(22):3492-3494. doi: 10.1093/bioinformatics/btw397. Epub 2016 Jul 16.