A tandem simulation framework for predicting mapping quality.

Author information

Ben Langmead

Affiliations

Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, 3400 North Charles St, Baltimore, 21218-2682, USA.

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe St, Baltimore, 21205, USA.

Publication information

Genome Biol. 2017 Aug 10;18(1):152. doi: 10.1186/s13059-017-1290-3.

DOI: 10.1186/s13059-017-1290-3

PMID: 28806977

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5557537/

Abstract

Read alignment is the first step in most sequencing data analyses. Because a read's point of origin can be ambiguous, aligners report a mapping quality, which is the probability that the reported alignment is incorrect. Despite its importance, there is no established and general method for calculating mapping quality. I describe a framework for predicting mapping qualities that works by simulating a set of tandem reads. These are like the input reads in important ways, but the true point of origin is known. I implement this method in an accurate and low-overhead tool called Qtip, which is compatible with popular aligners.
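Aligners conventionally report mapping quality on the Phred scale, MAPQ = -10 log10 P(alignment is incorrect), so MAPQ 30 corresponds to a 1-in-1000 chance of error. The sketch below illustrates the general tandem idea in a deliberately simplified form: alignments of simulated reads whose true origin is known are labeled correct or incorrect, an error rate is estimated per value of an alignment feature, and that rate is Phred-scaled to assign MAPQs to the input reads' alignments. Everything here is a hypothetical stand-in, not Qtip's model or interface: the single feature (assumed to be the best minus second-best alignment score), the binned-frequency estimator with a pseudocount, the MAPQ cap of 60, and the nearest-bin fallback are all illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record for one aligned tandem read: (feature, correct).
# `feature` is an assumed discretized alignment feature (e.g. best minus
# second-best alignment score); `correct` says whether the reported locus
# matches the simulated (known) point of origin.
TandemRecord = Tuple[int, bool]


def prob_to_mapq(p_wrong: float, cap: int = 60) -> int:
    """Phred-scale a probability of incorrect alignment: -10 * log10(p)."""
    if p_wrong <= 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


def mapq_to_prob(mapq: int) -> float:
    """Inverse of prob_to_mapq, ignoring rounding and the cap."""
    return 10.0 ** (-mapq / 10.0)


def train(records: Iterable[TandemRecord], pseudo: float = 1.0) -> Dict[int, float]:
    """Estimate P(incorrect | feature value) from tandem reads.

    The pseudocount keeps bins with no observed errors from getting p = 0.
    """
    wrong: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for feature, correct in records:
        total[feature] += 1
        if not correct:
            wrong[feature] += 1
    return {f: (wrong[f] + pseudo) / (total[f] + 2 * pseudo) for f in total}


def predict(model: Dict[int, float], features: List[int]) -> List[int]:
    """Assign MAPQs to input-read alignments using the tandem-trained model.

    Feature values never seen in training fall back to the nearest trained bin.
    """
    bins = sorted(model)
    out = []
    for f in features:
        nearest = min(bins, key=lambda b: abs(b - f))
        out.append(prob_to_mapq(model[nearest]))
    return out


if __name__ == "__main__":
    # Toy tandem data: a score gap of 0 is ambiguous, a gap of 10 is not.
    tandem = [(0, False), (0, True), (0, False), (0, True)] + [(10, True)] * 8
    model = train(tandem)
    print(predict(model, [0, 10, 12]))  # [3, 10, 10]
```

A real predictor would use many features per alignment and a more flexible learned model; the point of the sketch is only that the labels come for free because each tandem read's origin is known, while the features are computed the same way for tandem and input reads.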
