Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis.

Authors

Schbath Sophie, Martin Véronique, Zytnicki Matthias, Fayolle Julien, Loux Valentin, Gibrat Jean-François

Affiliation

INRA, UR1077 Unité Mathématique Informatique et Génome, Jouy-en-Josas, France.

Publication information

J Comput Biol. 2012 Jun;19(6):796-813. doi: 10.1089/cmb.2012.0022. Epub 2012 Apr 16.

DOI: 10.1089/cmb.2012.0022
PMID: 22506536
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3375638/

Abstract

Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developed in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well-controlled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. As reference genomes, we considered both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits.
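
The benchmark idea described in the abstract can be illustrated on a toy scale. The sketch below is not the authors' pipeline: the tiny synthetic genome, the brute-force scanner, and the helper names (`hamming`, `find_occurrences`, `mutate`, `recall`) are all assumptions made for illustration. It builds a reference in which one read occurs in two copies, derives a read with exactly three mismatches, establishes the true occurrences by exhaustive search, and scores a hypothetical tool by the fraction of true occurrences it reports.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_occurrences(read, genome, max_mismatches=0):
    """Brute-force ground truth: every start position where `read` matches
    `genome` with at most `max_mismatches` substitutions (no indels)."""
    n = len(read)
    return [i for i in range(len(genome) - n + 1)
            if hamming(read, genome[i:i + n]) <= max_mismatches]


def mutate(read, k, rng):
    """Return a copy of `read` with exactly k substituted bases."""
    bases = list(read)
    for pos in rng.sample(range(len(bases)), k):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def recall(reported, truth):
    """Fraction of the true occurrences that a tool reported."""
    return len(set(reported) & set(truth)) / len(truth) if truth else 1.0


# Toy reference (an assumption for illustration): 200 random bases with the
# same 20-base read planted twice, at positions 100 and 220.
rng = random.Random(0)
unique = "".join(rng.choice("ACGT") for _ in range(200))
repeat = "ACGTTGCAAGGCTTAACCGT"
genome = unique[:100] + repeat + unique[100:] + repeat

# A multi-copy read with no mismatch, and a derived read with three mismatches.
truth_exact = find_occurrences(repeat, genome)
mutated = mutate(repeat, 3, rng)
truth_mutated = find_occurrences(mutated, genome, max_mismatches=3)

# A hypothetical tool that reports only one of the two copies.
print(recall([100], truth_exact))
```

Exhaustive scanning is only practical on a toy reference like this one; it serves here solely to define the ground truth against which reported hits are counted.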

