S-conLSH: alignment-free gapped mapping of noisy long reads.

Affiliations

Department of Computer Science, West Bengal Education Service, Kolkata, India.

Department of Bioinformatics (IMG), University of Göttingen, 37077, Göttingen, Germany.

Publication information

BMC Bioinformatics. 2021 Feb 11;22(1):64. doi: 10.1186/s12859-020-03918-3.

DOI:10.1186/s12859-020-03918-3
PMID:33573603
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7879691/
Abstract

BACKGROUND

The advancement of SMRT technology has unfolded new opportunities of genome analysis with its longer read length and low GC bias. Alignment of the reads to their appropriate positions in the respective reference genome is the first but costliest step of any analysis pipeline based on SMRT sequencing. However, the state-of-the-art aligners often fail to identify distant homologies due to lack of conserved regions, caused by frequent genetic duplication and recombination. Therefore, we developed a novel alignment-free method of sequence mapping that is fast and accurate.

RESULTS

We present a new mapper called S-conLSH that uses Spaced context based Locality Sensitive Hashing. With multiple spaced patterns, S-conLSH facilitates a gapped mapping of noisy long reads to the corresponding target locations of a reference genome. We have examined the performance of the proposed method on 5 different real and simulated datasets. S-conLSH is at least 2 times faster than the recently developed method lordFAST. It achieves a sensitivity of 99%, without using any traditional base-to-base alignment, on human simulated sequence data. By default, S-conLSH provides an alignment-free mapping in PAF format. However, it has an option of generating aligned output as SAM-file, if it is required for any downstream processing.
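Since the default output is PAF, downstream scripts usually start by reading its tab-separated records. The following is a minimal sketch, assuming the output follows the standard PAF layout of 12 mandatory columns; the `PafRecord` and `parse_paf_line` names and the sample line are invented for illustration and are not taken from the S-conLSH distribution.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (hypothetical helper type)."""
    query_name: str
    query_length: int
    query_start: int      # 0-based, inclusive
    query_end: int        # 0-based, exclusive
    strand: str           # "+" or "-"
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    residue_matches: int
    block_length: int
    mapping_quality: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line, ignoring any optional tags after column 12."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4], fields[5], int(fields[6]), int(fields[7]),
        int(fields[8]), int(fields[9]), int(fields[10]), int(fields[11]),
    )


# Invented sample record, not real S-conLSH output.
sample = "read_1\t1000\t10\t990\t+\tchr1\t50000\t2010\t2990\t900\t980\t60"
record = parse_paf_line(sample)
print(record.target_name, record.target_start, record.target_end)
```

A record of this kind gives the mapped interval on the reference without per-base alignment details, which is why a SAM file is only needed when a downstream tool requires them.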

CONCLUSIONS

S-conLSH is one of the first alignment-free reference genome mapping tools achieving a high level of sensitivity. The spaced-context is especially suitable for extracting distant similarities. The variable-length spaced-seeds or patterns add flexibility to the proposed algorithm by introducing gapped mapping of the noisy long reads. Therefore, S-conLSH may be considered as a prominent direction towards alignment-free sequence analysis.
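The idea of spaced patterns can be illustrated with a toy example. The sketch below is a generic spaced-seed voting scheme, not the S-conLSH implementation: it assumes a pattern is a string of `1` (compare this position) and `0` (ignore this position), and all function names, patterns, and sequences are hypothetical. It shows how a window still matches when a substitution falls on an ignored position, and how several patterns of different lengths vote for the same reference offset.

```python
from collections import defaultdict


def spaced_key(seq: str, start: int, pattern: str) -> str:
    """Characters of seq under the '1' positions of pattern, from start."""
    return "".join(seq[start + i] for i, p in enumerate(pattern) if p == "1")


def build_index(reference: str, patterns: list[str]) -> dict:
    """Map (pattern, spaced key) to every reference position that yields it."""
    index = defaultdict(list)
    for pattern in patterns:
        for pos in range(len(reference) - len(pattern) + 1):
            index[(pattern, spaced_key(reference, pos, pattern))].append(pos)
    return index


def candidate_offsets(read: str, index: dict, patterns: list[str]) -> dict:
    """Count, for each reference offset, how many read windows agree with it."""
    votes = defaultdict(int)
    for pattern in patterns:
        for pos in range(len(read) - len(pattern) + 1):
            key = (pattern, spaced_key(read, pos, pattern))
            for ref_pos in index.get(key, ()):
                votes[ref_pos - pos] += 1
    return dict(votes)


# Toy data: the read is reference[4:14] with one substitution (C -> T).
reference = "ACGTTGCAAGCTTAGGCTA"
original = reference[4:14]
read = original[:2] + "T" + original[3:]

patterns = ["1101", "11011"]  # two spaced patterns of different length
index = build_index(reference, patterns)
votes = candidate_offsets(read, index, patterns)
best_offset = max(votes, key=votes.get)
print(best_offset, votes)
```

In this toy case the read window starting at position 0 still hits the reference under both patterns, because the substituted base sits on a `0` position; a contiguous seed of the same span would miss it.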

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c1d/7879691/36ef92b305e3/12859_2020_3918_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c1d/7879691/0e31fb5edc6c/12859_2020_3918_Fig2_HTML.jpg
