
AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

Authors

Zheng Qi, Grice Elizabeth A

Affiliation

Department of Dermatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Publication

PLoS Comput Biol. 2016 Oct 5;12(10):e1005096. doi: 10.1371/journal.pcbi.1005096. eCollection 2016 Oct.

DOI: 10.1371/journal.pcbi.1005096
PMID: 27706155
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5051939/
Abstract

Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
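
To make the idea of a Bayesian mapping quality concrete, the sketch below shows one generic way to turn the likelihoods of a read's candidate hits into posterior probabilities and Phred-scaled mapping qualities. It is an illustration only and is not taken from AlignerBoost's source code: the function name `posterior_mapq`, the uniform default prior, the log10-likelihood inputs, and the cap of 60 are all assumptions made for this example.

```python
import math


def posterior_mapq(log10_likelihoods, priors=None, max_mapq=60):
    """Return a (posterior, mapQ) pair for each candidate hit of one read.

    log10_likelihoods: hypothetical per-hit scores, log10 P(read | locus).
    priors: optional prior probability per hit (assumed uniform if omitted).
    max_mapq: assumed cap on the reported Phred-scaled quality.
    """
    n = len(log10_likelihoods)
    if n == 0:
        return []
    if priors is None:
        priors = [1.0 / n] * n  # assumption: every candidate locus equally likely

    # Scale relative to the best hit so tiny likelihoods do not underflow.
    best = max(log10_likelihoods)
    weights = [p * 10 ** (ll - best) for ll, p in zip(log10_likelihoods, priors)]
    total = sum(weights)

    results = []
    for w in weights:
        posterior = w / total
        # Probability that this hit is NOT the true origin of the read.
        error = (total - w) / total
        if error <= 0:
            mapq = max_mapq  # a single candidate hit: no competing locus
        else:
            mapq = min(max_mapq, round(-10 * math.log10(error)))
        results.append((posterior, mapq))
    return results


def keep_confident(hits, min_mapq):
    """Keep only the (posterior, mapQ) pairs at or above a mapQ cutoff."""
    return [h for h in hits if h[1] >= min_mapq]
```

Under these assumptions, a read with two equally good hits gets a posterior of 0.5 and a mapping quality of 3 for each hit, so any moderate cutoff discards both. A read whose best hit is far more likely than its alternatives keeps that hit with a high quality. This is the sense in which a posterior-based quality lets ambiguously mapped reads be filtered by threshold rather than dropped wholesale or resolved at random.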
