naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.

Author information

Kao Wei-Chun, Song Yun S

Affiliation

Department of EECS, University of California, Berkeley, California, USA.

Publication information

J Comput Biol. 2011 Mar;18(3):365-77. doi: 10.1089/cmb.2010.0247.

DOI:10.1089/cmb.2010.0247

PMID:21385040

Abstract

Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. However, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naiveBayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.

