BFAST: an alignment tool for large scale genome resequencing.

Affiliation

Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.

Publication information

PLoS One. 2009 Nov 11;4(11):e7767. doi: 10.1371/journal.pone.0007767.

Abstract

BACKGROUND

The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy for short reads, in the 25-100 base range, in the presence of errors and true biological variation.

METHODOLOGY

We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any current sequencing platform, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations; an arbitrary number of independent indexes may be used to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.
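The two-stage idea described above (index lookup to find candidate locations, then gapped local alignment) can be illustrated with a toy Python sketch. This is not BFAST's implementation: the function names, the `"1"`/`"0"` mask strings, the scoring values, the linear gap penalty, and the `pad` parameter are all assumptions chosen for illustration, and a real whole-genome index would need a far more compact data structure.

```python
from collections import defaultdict


def masked_key(window, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, m in zip(window, mask) if m == "1")


def build_index(reference, mask):
    """Map each masked window of the reference to its start positions."""
    index = defaultdict(list)
    span = len(mask)
    for pos in range(len(reference) - span + 1):
        index[masked_key(reference[pos:pos + span], mask)].append(pos)
    return index


def candidate_positions(read, indexes):
    """Union of implied read start positions over several independent indexes.

    `indexes` is a list of (mask, index) pairs. A mismatch that falls on a
    '0' position of one mask does not prevent a hit in that index.
    """
    candidates = set()
    for mask, index in indexes:
        span = len(mask)
        for offset in range(len(read) - span + 1):
            key = masked_key(read[offset:offset + span], mask)
            for pos in index.get(key, ()):
                candidates.add(pos - offset)
    return candidates


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (assumed linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def align(read, reference, indexes, pad=2):
    """Score every candidate window and return (position, score) of the best."""
    best_pos, best_score = None, 0
    for pos in sorted(candidate_positions(read, indexes)):
        start = max(0, pos - pad)
        end = min(len(reference), pos + len(read) + pad)
        score = smith_waterman(read, reference[start:end])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score
```

In this sketch, passing several `(mask, index)` pairs, for example one contiguous mask such as `"1111"` and one spaced mask such as `"110011"`, plays the role of the multiple independent indexes: a read is kept as a candidate if any one index produces a hit. The `pad` margin around each candidate window leaves room for the gapped alignment to absorb a small insertion or deletion.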

CONCLUSIONS

We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater alignment sensitivity in the presence of errors and true variants, especially insertions and deletions, and can minimize false mappings, while maintaining adequate speed compared to other current methods. We also show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
