TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

Affiliation

Computer Science Department, Courant Institute, New York University, NY 10012, USA.

Publication information

Bioinformatics. 2011 Sep 1;27(17):2330-7. doi: 10.1093/bioinformatics/btr393. Epub 2011 Jun 30.

Abstract

MOTIVATION

Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information in both the raw sequencing data and the reference genome, information that can yield higher-quality sequence reads, SNP calls and variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms that interpret the analog signals representing sequences of the bases ({A, C, G, T}) while simultaneously aligning candidate sequence reads to a source reference genome whenever one is available.

RESULTS

Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined using a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in software and in hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling.
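The combination described above can be illustrated with a toy sketch. It is not the authors' implementation: the intensity model (simple normalization in the hypothetical `base_log_probs`), the fixed mismatch penalty `mismatch_logp`, the naive FM-index construction and all function names are assumptions chosen for brevity. The sketch scores each candidate read by its per-cycle log-probabilities plus a penalty wherever the called base disagrees with the reference, and uses best-first branch-and-bound over BWT backward-search intervals to find the highest-scoring call and its reference location.

```python
import heapq
import math

BASES = "ACGT"

def build_fm_index(ref):
    """Naive FM-index (suffix array, C table, Occ table); fine for short toy references."""
    text = ref + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]  # index -1 wraps around to '$'
    C, total = {}, 0                 # C[c]: number of characters smaller than c
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in C}  # occ[c][k]: count of c in bwt[:k]
    for k, ch in enumerate(bwt):
        for c in occ:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, C, occ

def base_log_probs(cycle, eps=1e-6):
    """Assumed stand-in for an intensity error model: normalize four channel intensities."""
    total = sum(cycle) + 4 * eps
    return [math.log((x + eps) / total) for x in cycle]

def naive_call(intensities):
    """Reference-free baseline: pick the brightest channel in every cycle."""
    return "".join(BASES[max(range(4), key=lambda j: c[j])] for c in intensities)

def recall(ref, intensities, mismatch_logp=math.log(0.01)):
    """Best-first branch-and-bound over reference substrings.

    Returns (called_read, score, reference_positions), or None if the read
    is longer than any reference substring.
    """
    sa, C, occ = build_fm_index(ref)
    logp = [base_log_probs(c) for c in intensities]
    n = len(logp)
    # bound[k]: optimistic score for cycles 0..k-1, ignoring the reference
    bound = [0.0] * (n + 1)
    for k in range(n):
        bound[k + 1] = bound[k] + max(logp[k])
    # State: (-(score + bound), score, cycles left, SA interval [lo, hi), called suffix)
    heap = [(-bound[n], 0.0, n, 0, len(ref) + 1, "")]
    while heap:
        _, score, k, lo, hi, called = heapq.heappop(heap)
        if k == 0:  # first complete state popped is optimal (the bound is admissible)
            return called, score, sorted(sa[lo:hi])
        for r in BASES:  # branch on the reference base for cycle k-1
            if r not in C:
                continue
            nlo, nhi = C[r] + occ[r][lo], C[r] + occ[r][hi]  # backward-search step
            if nlo >= nhi:
                continue  # the reference contains no such substring
            def gain(j):
                return logp[k - 1][j] + (0.0 if BASES[j] == r else mismatch_logp)
            best = max(range(4), key=gain)
            s = score + gain(best)
            heapq.heappush(heap, (-(s + bound[k - 1]), s, k - 1, nlo, nhi,
                                  BASES[best] + called))
    return None

# Usage: the third cycle is ambiguous between A and C (made-up intensities).
ref = "GATTACAGATCCA"
intensities = [
    [0.03, 0.02, 0.05, 0.90],  # T
    [0.90, 0.04, 0.03, 0.03],  # A
    [0.50, 0.45, 0.02, 0.03],  # A slightly brighter than C
    [0.92, 0.03, 0.03, 0.02],  # A
]
print(naive_call(intensities))
print(recall(ref, intensities))
```

In this made-up example the reference-free call for the third cycle is A, whereas the reference-guided search prefers C because the resulting read occurs in the reference and the small loss in signal likelihood is far cheaper than a mismatch penalty. A realistic model would also need calibrated error parameters and a prior that does not suppress true variants.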

AVAILABILITY

A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/

CONTACT

fabian.menges@nyu.edu.
