kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the k-Mer Neighborhood Graph

Authors

Wei Ze-Gang, Fan Xing-Guo, Zhang Hao, Zhang Xiao-Dan, Liu Fei, Qian Yu, Zhang Shao-Wu

Affiliations

Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China.

Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China.

Publication information

Front Genet. 2022 May 5;13:890651. doi: 10.3389/fgene.2022.890651. eCollection 2022.

DOI:10.3389/fgene.2022.890651
PMID:35601495
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9117619/
Abstract

With the rapid development of single-molecule sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, output read lengths are continuously increasing, which offers great potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computationally intensive step of downstream analysis. However, these long reads have higher sequencing error rates and span the breakpoints of structural variants (SVs) more frequently than shorter reads, so most state-of-the-art mappers leave many reads unaligned or only partially aligned. As a result, these methods usually focus on producing local mapping results for the query read rather than a whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper that is specifically designed to align long noisy SMS reads to a reference sequence. In extensive benchmarks on both simulated and real SMS datasets comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrate that kngMap has higher sensitivity and aligns more reads and bases to the reference genome; meanwhile, kngMap can produce consecutive alignments for the whole read and span different categories of SVs in the reads. kngMap is implemented in C++ and supports multi-threading; the source code is freely available for academic use at https://github.com/zhang134/kngMap.
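To make the idea of k-mer-based read mapping more concrete, the following is a minimal sketch of generic seed-and-vote mapping. It is not kngMap's neighborhood-graph algorithm, which the abstract does not describe in detail. The function names (`build_kmer_index`, `seed_hits`, `best_diagonal`), the toy sequences, and the choice of k = 4 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_pos, ref_pos) pairs for every exact k-mer match."""
    hits = []
    for read_pos in range(len(read) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for ref_pos in index.get(read[read_pos:read_pos + k], ()):
            hits.append((read_pos, ref_pos))
    return hits


def best_diagonal(hits: list[tuple[int, int]]):
    """Vote on the offset ref_pos - read_pos.

    Seeds from a true alignment without indels share one offset, so the
    most common offset approximates where the read starts on the reference.
    Returns (offset, number_of_supporting_seeds), or None if there are no hits.
    """
    if not hits:
        return None
    votes = Counter(ref_pos - read_pos for read_pos, ref_pos in hits)
    return votes.most_common(1)[0]


if __name__ == "__main__":
    k = 4  # illustrative value only
    reference = "ACGTTGCAAGGCTTACGGATCCTAGGCATTGCA"
    # Toy "noisy read": reference[10:26] with one substitution (G -> T).
    read = reference[10:17] + "T" + reference[18:26]
    index = build_kmer_index(reference, k)
    print(best_diagonal(seed_hits(read, index, k)))  # prints (10, 9)
```

In this toy case the substitution destroys the seeds that overlap it and creates one spurious hit elsewhere, yet the vote still recovers offset 10. Real long reads also contain insertions and deletions, which shift the offset along the read; that is one reason practical mappers chain or graph-connect seeds rather than relying on a single diagonal.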




