PerFSeeB: designing long high-weight single spaced seeds for full sensitivity alignment with a given number of mismatches.

Affiliations

School of Biological Sciences, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.

School of Mathematics, University of Leeds, Woodhouse, Leeds, LS2 9JT, UK.

Publication information

BMC Bioinformatics. 2023 Oct 24;24(1):396. doi: 10.1186/s12859-023-05517-4.

DOI:10.1186/s12859-023-05517-4
PMID:37875804
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10594774/
Abstract

BACKGROUND

Technical progress in computational hardware allows researchers to use new approaches for sequence alignment problems. For a given sequence, we usually use smaller subsequences (anchors) to find possible candidate positions within a reference sequence. We may create pairs ("position", "subsequence") for the reference sequence and keep all such records without compression, even on a budget computer. Because sequences from new and reference genomes differ, the goal is to find anchors that tolerate differences while keeping the number of candidate positions that share the same anchor to a minimum. Spaced seeds (masks ignoring symbols at specific locations) are one way to approach the task. An ideal (full sensitivity) spaced seed should enable us to find all such positions, subject to a given maximum number of permitted mismatches.
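
The anchor lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`apply_seed`, `build_index`, `candidate_positions`), the toy reference string and the seed `1101` are all assumptions made for illustration. A seed is written as a string of `1` (symbol is compared) and `0` (symbol is ignored).

```python
from collections import defaultdict


def apply_seed(seq, pos, seed):
    """Return the symbols of seq under the '1' positions of seed, starting at pos."""
    return "".join(seq[pos + i] for i, bit in enumerate(seed) if bit == "1")


def build_index(reference, seed):
    """Map each masked subsequence (anchor) to every reference position where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(seed) + 1):
        index[apply_seed(reference, pos, seed)].append(pos)
    return index


def candidate_positions(read, index, seed):
    """Collect reference start positions for the read suggested by any seed placement."""
    candidates = set()
    for shift in range(len(read) - len(seed) + 1):
        anchor = apply_seed(read, shift, seed)
        for pos in index.get(anchor, []):
            start = pos - shift  # where the read would begin in the reference
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


# Toy data (hypothetical): the read is reference[5:15] with one substitution
# at read offset 3, so the true start position is 5.
reference = "ACGTTGCAAGCTTAGGCATCGA"
seed = "1101"
index = build_index(reference, seed)
read = "GCATGCTTAG"
hits = candidate_positions(read, index, seed)
print(hits)
```

The candidate list may also contain false positions that share an anchor by chance. A seed with more `1` symbols (higher weight) makes each anchor more specific, which is why the number of candidates tends to drop as weight grows.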

RESULTS

Several algorithms to assist seed generation are presented. The first one finds all permitted spaced seeds iteratively. We observe specific patterns for the seeds of the highest weight. There are often periodic seeds with a simple relation between the block size, the seed length and the read length. The second algorithm produces blocks for periodic seeds, for blocks of up to 50 symbols and up to nine mismatches. The third algorithm uses those lists to find spaced seeds for reads of an arbitrary length. Finally, we apply the seeds to a real dataset and compare the results with those for other popular seeds.
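
To make the notions of full sensitivity and periodic seeds concrete, the following Python sketch checks a seed by brute force and builds a seed by repeating a block. It is an illustration under stated assumptions, not one of the paper's three algorithms: `is_full_sensitivity` simply tries every placement of mismatches, and the block `110` is a hypothetical example rather than an entry from the published lists.

```python
from itertools import combinations


def weight(seed):
    """Weight of a seed: the number of '1' (compared) positions."""
    return seed.count("1")


def is_full_sensitivity(seed, read_length, mismatches):
    """Brute-force test: for every set of mismatch positions in the read,
    at least one shift of the seed must avoid all of them.

    Checking exactly `mismatches` positions is enough, because a shift that
    avoids a set of positions also avoids every subset of it.
    """
    ones = [i for i, bit in enumerate(seed) if bit == "1"]
    shifts = range(read_length - len(seed) + 1)
    if len(shifts) == 0:
        return False  # the seed does not fit into the read
    for bad in combinations(range(read_length), mismatches):
        bad_set = set(bad)
        if not any(all(s + i not in bad_set for i in ones) for s in shifts):
            return False
    return True


def periodic_seed(block, length):
    """Repeat `block` cyclically up to `length` symbols and drop trailing zeros."""
    seed = (block * (length // len(block) + 1))[:length]
    return seed.rstrip("0")


# Reads of length 10 with one mismatch (toy parameters).
spaced = periodic_seed("110", 8)  # "11011011", weight 6
print(spaced, weight(spaced), is_full_sensitivity(spaced, 10, 1))
print(is_full_sensitivity("111111", 10, 1))  # contiguous seed of weight 6
print(is_full_sensitivity("11111", 10, 1))   # contiguous seed of weight 5
```

In this toy setting the periodic seed built from `110` keeps full sensitivity at weight 6, while a contiguous seed loses it at weight 6 and keeps it only up to weight 5. The brute-force check grows combinatorially with read length and mismatch count, so it is only practical for small cases.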

CONCLUSIONS

The PerFSeeB approach helps to significantly reduce the number of possible alignment positions of reads for a known number of mismatches. Lists of long, high-weight spaced seeds are available in Additional file 1. The seeds have the best weight compared with seeds from other papers and can usually be applied to shorter reads. Code for all algorithms and the periodic blocks can be found at https://github.com/vtman/PerFSeeB.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/1995f594d263/12859_2023_5517_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/b0bd728e7d95/12859_2023_5517_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/76586a0cc8d8/12859_2023_5517_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/8466e02b331f/12859_2023_5517_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/66bd76fa7b1e/12859_2023_5517_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/d4976a1785f3/12859_2023_5517_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/9a4b383e8268/12859_2023_5517_Fig7_HTML.jpg
Fig. 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/68d096d161e4/12859_2023_5517_Fig8_HTML.jpg
Fig. 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eacc/10594774/732f301e7336/12859_2023_5517_Fig9_HTML.jpg
