Re-alignment of the unmapped reads with base quality score.

Author information

Peng Xiaoqing, Wang Jianxin, Zhang Zhen, Xiao Qianghua, Li Min, Pan Yi

Publication information

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-16-S5-S8. Epub 2015 Mar 18.

DOI:10.1186/1471-2105-16-S5-S8

PMID:25860434

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4402702/

Abstract

MOTIVATION

Next-generation genome sequencing technologies underpin a variety of biological applications, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to align short reads to the reference genome efficiently and accurately. However, many reads still cannot be mapped to the reference genome because they exceed the allowable number of mismatches. Moreover, besides the unmapped reads, reads with low mapping quality are also excluded from downstream analysis, such as variant calling. If the confident segments of these reads can be used, not only can alignment rates be improved, but more information also becomes available for downstream analysis.

RESULTS

This paper proposes a method called RAUR (Re-align the Unmapped Reads) to re-align reads that cannot be mapped by alignment tools. First, it uses the base quality scores reported by the sequencer to identify the most confident and informative segments of the unmapped reads, controlling the number of possible mismatches in the alignment. Then, combined with an alignment tool, RAUR re-aligns these segments. We ran RAUR on both simulated and real data with different read lengths. The results show that many reads that the most popular alignment tools (BWA and Bowtie2) fail to align can be correctly re-aligned by RAUR, with similar Precision. Even compared with BWA-MEM and the local mode of Bowtie2, which perform local alignment for long reads to improve the alignment rate, RAUR shows advantages in Alignment rate and Precision in some cases. Therefore, the trimming strategy used in RAUR is useful for improving the Alignment rate of alignment tools in next-generation genome sequencing.
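The idea of using base quality scores to pick a confident segment can be illustrated with a minimal sketch. This is not the RAUR implementation: the abstract does not give the selection rule, so the sketch assumes Phred+33 quality strings and assumes that "controlling the number of possible mismatches" can be approximated by bounding the expected number of erroneous bases (the sum of per-base error probabilities) in the kept segment. The function names and the `max_expected_errors` parameter are hypothetical.

```python
from typing import List, Tuple


def phred_to_error_prob(quality: str, offset: int = 33) -> List[float]:
    """Convert an ASCII quality string to per-base error probabilities.

    Assumes the standard Phred relation p = 10^(-Q/10) and Phred+33 encoding.
    """
    return [10 ** (-(ord(c) - offset) / 10) for c in quality]


def most_confident_segment(
    quality: str, max_expected_errors: float = 1.0, offset: int = 33
) -> Tuple[int, int]:
    """Return the half-open span (start, end) of the longest segment whose
    summed error probability stays within max_expected_errors.

    Hypothetical selection rule for illustration; ties keep the leftmost span.
    """
    probs = phred_to_error_prob(quality, offset)
    best = (0, 0)
    start = 0
    total = 0.0
    for end, p in enumerate(probs):
        total += p
        # Shrink the window from the left until it fits the error budget.
        while start <= end and total > max_expected_errors:
            total -= probs[start]
            start += 1
        if start > end:
            total = 0.0  # empty window: discard floating-point residue
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best


def trim_read(sequence: str, quality: str, max_expected_errors: float = 1.0) -> Tuple[str, str]:
    """Cut a read and its quality string down to the selected segment."""
    start, end = most_confident_segment(quality, max_expected_errors)
    return sequence[start:end], quality[start:end]
```

Under these assumptions, a trimmed segment would then be written back to FASTQ and passed to an aligner such as BWA or Bowtie2 for a second alignment attempt.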

AVAILABILITY

All source code is available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.
