AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

Author information

Zheng Qi, Grice Elizabeth A

Affiliation

Department of Dermatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Publication information

PLoS Comput Biol. 2016 Oct 5;12(10):e1005096. doi: 10.1371/journal.pcbi.1005096. eCollection 2016 Oct.

Abstract

Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
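The abstract does not spell out the Bayesian framework itself. As a rough illustration only, the sketch below shows one generic way to turn the alignment likelihoods of a read's candidate hits into Phred-scaled mapping qualities: normalize the likelihoods into posterior probabilities and report -10·log10 of the probability that a hit is wrong. The function name `mapping_qualities`, the log10-likelihood input, the uniform prior over hits, and the cap of 255 are all assumptions made for this example and are not taken from AlignerBoost.

```python
import math


def mapping_qualities(log10_likelihoods, max_q=255):
    """Return a Phred-scaled mapping quality for each candidate hit of one read.

    Assumes (hypothetically) a uniform prior over hits, so the posterior of a
    hit is its likelihood divided by the sum of likelihoods over all hits.
    """
    if not log10_likelihoods:
        return []

    # Rescale by the best hit so the exponentials cannot underflow to zero.
    top = max(log10_likelihoods)
    weights = [10.0 ** (ll - top) for ll in log10_likelihoods]
    total = sum(weights)

    quals = []
    for i in range(len(weights)):
        # Probability that hit i is wrong = posterior mass of all other hits.
        others = sum(weights[:i]) + sum(weights[i + 1:])
        p_err = others / total
        if p_err <= 0.0:
            quals.append(max_q)  # a unique hit gets the capped maximum
        else:
            quals.append(min(max_q, round(-10.0 * math.log10(p_err))))
    return quals
```

Under these assumptions, a read with two equally good hits gets a low quality for both, while a read whose best hit is 1000 times more likely than the runner-up gets a quality of about 30 for the best hit, which is the kind of separation a mapping quality cutoff can then act on.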
