On the accuracy of short read mapping.

Authors

Menzel Peter, Frellsen Jes, Plass Mireya, Rasmussen Simon H, Krogh Anders

Affiliation

Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

Publication

Methods Mol Biol. 2013;1038:39-59. doi: 10.1007/978-1-62703-514-9_3.

PMID:23872968
Abstract

The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. In a single experiment, millions of reads are produced. To gain knowledge from these experiments the first thing to be done is finding the genomic origin of the reads, i.e., mapping the reads to a reference genome. In this new situation, conventional alignment tools are obsolete, as they cannot handle this huge amount of data in a reasonable amount of time. Thus, new mapping algorithms have been developed, which are fast at the expense of a small decrease in accuracy. In this chapter we discuss the current problems in short read mapping and show that mapping reads correctly is a nontrivial task. Through simple experiments with both real and synthetic data, we demonstrate that different mappers can give different results depending on the type of data, and that a considerable fraction of uniquely mapped reads is potentially mapped to an incorrect location. Furthermore, we provide simple statistical results on the expected number of random matches in a genome (E-value) and the probability of a random match as a function of read length. Finally, we show that quality scores contain valuable information for mapping and why mapping quality should be evaluated in a probabilistic manner. In the end, we discuss the potential of improving the performance of current methods by considering these quality scores in a probabilistic mapping program.
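The abstract refers to the expected number of random matches in a genome (E-value), the probability of a random match as a function of read length, and a probabilistic treatment of mapping quality based on base quality scores. The sketch below illustrates these ideas. Its simplifying assumptions are ours and not necessarily the chapter's: a genome of i.i.d. uniform nucleotides, substitution-only matching on both strands, a Poisson approximation for "at least one random hit", independent base-call errors given by Phred scores, and a uniform prior over candidate loci. All function names and the MAPQ cap of 60 are hypothetical choices made for illustration.

```python
import math


def expected_random_matches(genome_len: int, read_len: int, max_mismatches: int = 0) -> float:
    """E-value: expected number of genome positions (both strands) matching a
    read of length read_len with at most max_mismatches substitutions.
    Assumption: random genome of i.i.d. uniform bases (p = 1/4 each)."""
    positions = 2 * max(genome_len - read_len + 1, 0)  # forward + reverse strand
    # Number of L-mers within Hamming distance k of the read; divided by 4^L below
    hits = sum(math.comb(read_len, i) * 3**i for i in range(max_mismatches + 1))
    return positions * hits / 4**read_len


def prob_random_match(e_value: float) -> float:
    """P(at least one random match), using a Poisson approximation."""
    return 1.0 - math.exp(-e_value)


def phred_to_error(q: float) -> float:
    """Convert a Phred base quality Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def alignment_likelihood(read: str, quals: list[int], ref: str) -> float:
    """P(read | read originates from ref): substitution-only, independent errors.
    Assumption: a miscalled base is equally likely to be any of the 3 other bases."""
    like = 1.0
    for base, q, ref_base in zip(read, quals, ref):
        e = phred_to_error(q)
        like *= (1 - e) if base == ref_base else e / 3
    return like


def mapping_quality(likelihoods: list[float], cap: float = 60.0) -> float:
    """Phred-scaled probability that the best candidate locus is wrong,
    assuming a uniform prior over candidate loci; capped at `cap`."""
    p_best = max(likelihoods) / sum(likelihoods)
    p_wrong = 1.0 - p_best
    if p_wrong <= 0.0:
        return cap
    return min(cap, -10 * math.log10(p_wrong))


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly human-sized genome, for illustration only
    for L in (12, 16, 20, 25, 36):
        e = expected_random_matches(genome, L)
        print(f"L={L:>2}  E={e:.3g}  P(random hit)={prob_random_match(e):.3g}")

    read, quals = "ACGTACGTAC", [30] * 10
    # Two hypothetical candidate loci: one exact match, one with a single mismatch
    hits = [alignment_likelihood(read, quals, "ACGTACGTAC"),
            alignment_likelihood(read, quals, "ACGTACGTAA")]
    print(f"MAPQ={mapping_quality(hits):.1f}")
```

Under this model the E-value falls by a factor of 4 for each extra base of read length, which is why short reads hit random positions so easily. Real genomes contain repeats and low-complexity sequence that the uniform model ignores, so its estimates are best treated as a rough baseline.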

Similar articles

1. On the accuracy of short read mapping.
   Methods Mol Biol. 2013;1038:39-59. doi: 10.1007/978-1-62703-514-9_3.
2. RF: a method for filtering short reads with tandem repeats for genome mapping.
   Genomics. 2013 Jul;102(1):35-7. doi: 10.1016/j.ygeno.2013.03.002. Epub 2013 Mar 29.
3. Assessing the impact of exact reads on reducing the error rate of read mapping.
   BMC Bioinformatics. 2018 Nov 6;19(1):406. doi: 10.1186/s12859-018-2432-7.
4. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.
   BMC Genomics. 2014 Apr 5;15:264. doi: 10.1186/1471-2164-15-264.
5. SRmapper: a fast and sensitive genome-hashing alignment tool.
   Bioinformatics. 2013 Feb 1;29(3):316-21. doi: 10.1093/bioinformatics/bts712. Epub 2012 Dec 24.
6. An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome.
   BMC Bioinformatics. 2015 Nov 11;16:382. doi: 10.1186/s12859-015-0801-z.
7. An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis.
   J Bioinform Comput Biol. 2019 Dec;17(6):1940012. doi: 10.1142/S0219720019400122.
8. Detection of structural variants involving repetitive regions in the reference genome.
   J Comput Biol. 2014 Mar;21(3):219-33. doi: 10.1089/cmb.2013.0129. Epub 2014 Feb 19.
9. Accurate estimation of short read mapping quality for next-generation genome sequencing.
   Bioinformatics. 2012 Sep 15;28(18):i349-i355. doi: 10.1093/bioinformatics/bts408.
10. Short read mapping for exome sequencing.
   Methods Mol Biol. 2013;1038:93-111. doi: 10.1007/978-1-62703-514-9_6.
Methods Mol Biol. 2013;1038:93-111. doi: 10.1007/978-1-62703-514-9_6.

Cited by

1. Chromosome-level reference genome for European flat oyster (Ostrea edulis L.).
   Evol Appl. 2022 Aug 22;15(11):1713-1729. doi: 10.1111/eva.13460. eCollection 2022 Nov.
2. Unravelling the tumour genome: The evolutionary and clinical impacts of structural variants in tumourigenesis.
   J Pathol. 2022 Jul;257(4):479-493. doi: 10.1002/path.5901. Epub 2022 Apr 28.
3. Calibrating Seed-Based Heuristics to Map Short Reads With Sesame.
   Front Genet. 2020 Jun 25;11:572. doi: 10.3389/fgene.2020.00572. eCollection 2020.
4. Highly accessible AU-rich regions in 3' untranslated regions are hotspots for binding of regulatory factors.
   PLoS Comput Biol. 2017 Apr 14;13(4):e1005460. doi: 10.1371/journal.pcbi.1005460. eCollection 2017 Apr.
5. Evaluation of microRNA alignment techniques.
   RNA. 2016 Aug;22(8):1120-38. doi: 10.1261/rna.055509.115. Epub 2016 Jun 9.