Comparative analysis of algorithms for next-generation sequencing read alignment.
AFFILIATION
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.
PUBLICATION INFORMATION
Bioinformatics. 2011 Oct 15;27(20):2790-6. doi: 10.1093/bioinformatics/btr477. Epub 2011 Aug 19.
MOTIVATION
The advent of next-generation sequencing (NGS) techniques presents novel opportunities for many applications in the life sciences. The vast number of short reads produced by these techniques, however, poses significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. Because the developers of these packages optimize their algorithms with respect to different considerations, the relative merits of the packages remain unclear. For scientists who generate and use NGS data in their own research projects, however, an important consideration is choosing the software that is most suitable for their application.
RESULTS
To compare existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage. We also develop criteria to compare the performance of software with disparate output structure (e.g. some packages return a single alignment while others return multiple possible alignments). Using these criteria, we comprehensively evaluate the performance of Bowtie, BWA, mrFAST and mrsFAST, Novoalign, SHRiMP and SOAPv2 with regard to accuracy and runtime.
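As a rough illustration of the kind of simulation and scoring described above, the following Python sketch samples reads from a reference at a target coverage, injects substitution errors, and scores an aligner's output against the known read origins. It is not Seal's implementation: the function names, parameters, and the "correct if any reported position matches" rule are assumptions made for this example, and indels are deliberately left out to keep it short.

```python
import random


def simulate_reads(reference, read_length, coverage, error_rate, seed=0):
    """Sample reads uniformly from `reference` and add substitution errors.

    Hypothetical helper, not part of Seal. Returns a list of
    (true_start, read_sequence) tuples.
    """
    rng = random.Random(seed)
    # Number of reads so that total sequenced bases ~= coverage * genome length.
    n_reads = int(coverage * len(reference) / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            # Each base is independently replaced by a different nucleotide.
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


def fraction_correct(true_positions, reported_positions):
    """Fraction of reads whose true origin appears among the reported hits.

    Assumed scoring rule for this sketch: a read counts as correct if any of
    its reported positions equals the true start, so single-hit and multi-hit
    aligners can be scored with the same function. Unreported reads count as
    incorrect.
    """
    if not true_positions:
        return 0.0
    correct = sum(
        1
        for read_id, true_start in true_positions.items()
        if true_start in reported_positions.get(read_id, [])
    )
    return correct / len(true_positions)
```

Scoring by "any reported position matches" is only one possible choice; it favors aligners that report many candidate locations, which is one reason a comparison across tools with different output structures needs explicitly stated criteria.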
CONCLUSION
We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g. identification of genomic variants).
AVAILABILITY
Seal is available as open source at http://compbio.case.edu/seal/.
CONTACT
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.