Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.

Authors

Thankaswamy-Kosalai Subazini, Sen Partho, Nookaew Intawat

Affiliations

Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden.

Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden; Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA.

Publication information

Genomics. 2017 Jul;109(3-4):186-191. doi: 10.1016/j.ygeno.2017.03.001. Epub 2017 Mar 9.

DOI:10.1016/j.ygeno.2017.03.001

PMID:28286147

Abstract

The advent of next-generation sequencing (NGS) technology has produced massive amounts of data that are widely used in biological research and medical diagnosis. A crucial step in NGS analysis is read alignment, or mapping, which is computationally intensive and complex. Mapping bias tends to affect downstream analysis, including the detection of polymorphisms. To provide biologists with guidelines for selecting a suitable aligner, we evaluated and benchmarked five aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on the characteristics of five microbial genomes. Two million simulated read pairs of various lengths (36 bp, 50 bp, 72 bp, 100 bp, 125 bp, 150 bp, 200 bp, 250 bp and 300 bp) were aligned. The alignment features evaluated were mapping sensitivity, the percentage of properly paired reads, alignment time, and the effect of tandem repeats on incorrectly mapped reads. BWA was the fastest aligner, followed by Bowtie2 and Smalt; NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (>100 bp). NovoAlign, on the other hand, showed higher sensitivity for both short reads (36 bp, 50 bp, 72 bp) and long reads (>100 bp); it also showed higher sensitivity when mapping a complex genome such as that of Plasmodium falciparum. The percentages of properly paired reads aligned by NovoAlign, BWA and Stampy were markedly higher. No aligner outperformed the others across the whole benchmark; however, the aligners performed differently depending on genome characteristics. We expect that the results of this study will help end users choose an aligner and thereby enhance the accuracy of read mapping.
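
Two of the metrics named above, mapping sensitivity and the percentage of properly paired reads, can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the study, so the ones below are assumptions: a simulated read counts as correctly mapped when it is placed on its true reference sequence within a small positional tolerance, and sensitivity is the fraction of all simulated reads mapped correctly. The record type SimulatedAlignment, the function names, and the default tolerance of 5 bp are hypothetical and chosen only for illustration. The flag bits 0x2 (properly paired) and 0x4 (unmapped) are the standard SAM flag values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Standard SAM FLAG bits.
PROPER_PAIR = 0x2  # each segment properly aligned according to the aligner
UNMAPPED = 0x4     # segment unmapped


@dataclass
class SimulatedAlignment:
    """Hypothetical record pairing a simulated read's true origin with its alignment."""
    true_ref: str                      # reference sequence the read was simulated from
    true_pos: int                      # true leftmost position
    flag: int                          # SAM FLAG reported by the aligner
    mapped_ref: Optional[str] = None   # reference the aligner placed the read on
    mapped_pos: Optional[int] = None   # leftmost position reported by the aligner


def is_correct(rec: SimulatedAlignment, tolerance: int = 5) -> bool:
    """Assumed correctness rule: same reference, position within `tolerance` bp."""
    if (rec.flag & UNMAPPED) or rec.mapped_ref is None or rec.mapped_pos is None:
        return False
    same_ref = rec.mapped_ref == rec.true_ref
    return same_ref and abs(rec.mapped_pos - rec.true_pos) <= tolerance


def summarize(records: Iterable[SimulatedAlignment], tolerance: int = 5) -> dict:
    """Return sensitivity (fraction) and properly paired reads (percent)."""
    records = list(records)
    if not records:
        raise ValueError("no records to summarize")
    total = len(records)
    correct = sum(1 for r in records if is_correct(r, tolerance))
    proper = sum(
        1 for r in records
        if (r.flag & PROPER_PAIR) and not (r.flag & UNMAPPED)
    )
    return {
        "sensitivity": correct / total,
        "properly_paired_pct": 100.0 * proper / total,
    }


if __name__ == "__main__":
    demo = [
        SimulatedAlignment("chr1", 100, 99, "chr1", 102),   # correct, properly paired
        SimulatedAlignment("chr1", 300, 147, "chr1", 900),  # misplaced, properly paired
        SimulatedAlignment("chr1", 500, 77),                # unmapped
        SimulatedAlignment("chr2", 50, 65, "chr2", 50),     # correct, not properly paired
    ]
    print(summarize(demo))
```

In practice the records would be filled from an aligner's SAM/BAM output and from the true coordinates kept by the read simulator; how those coordinates are stored depends on the simulator and is not covered by this sketch.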
