Department of Electrical and Computer Engineering, University of Texas, Austin, TX, US.
BMC Bioinformatics. 2012 Jul 9;13:160. doi: 10.1186/1471-2105-13-160.
Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. The accuracy and length of their reads, however, have yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliably and accurately detecting the order of nucleotides in short DNA fragments from the acquired data.
In this paper, we consider Illumina's sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina's Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best-performing unsupervised method currently available, while achieving the same accuracy.
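To make the idea of base calling with a Hidden Markov Model and sequential Monte Carlo concrete, the following is a minimal sketch of a bootstrap particle filter on a toy signal model. It is not the ParticleCall algorithm or the model used in the paper. The toy model, the function names (`simulate_intensities`, `particle_filter_calls`), and all parameters (`decay`, `amp_jitter`, `sigma`) are assumptions made for illustration: each cycle emits a four-channel intensity vector whose active channel carries a slowly decaying amplitude, and effects such as phasing and channel cross-talk are ignored.

```python
import math
import random

BASES = "ACGT"


def simulate_intensities(sequence, decay=0.95, sigma=0.05, seed=1):
    """Toy data generator (assumed model): the channel of the true base
    carries a decaying amplitude; every channel gets Gaussian noise."""
    rng = random.Random(seed)
    amp = 1.0
    observations = []
    for ch in sequence:
        amp *= decay
        b = BASES.index(ch)
        observations.append(
            [(amp if k == b else 0.0) + rng.gauss(0.0, sigma) for k in range(4)]
        )
    return observations


def log_likelihood(obs, base, amp, sigma):
    """Gaussian log-likelihood (up to a constant) of one intensity vector."""
    return sum(
        -0.5 * ((y - (amp if k == base else 0.0)) / sigma) ** 2
        for k, y in enumerate(obs)
    )


def particle_filter_calls(observations, n_particles=500, decay=0.95,
                          amp_jitter=0.02, sigma=0.2, seed=0):
    """Bootstrap particle filter: the hidden state is (base, amplitude).
    Returns one (called base, estimated posterior probability) per cycle."""
    rng = random.Random(seed)
    amps = [1.0] * n_particles
    calls = []
    for obs in observations:
        # Propagate: amplitudes follow the decay model with small jitter;
        # the base at each cycle is drawn from a uniform prior.
        amps = [max(0.0, a * decay + rng.gauss(0.0, amp_jitter)) for a in amps]
        bases = [rng.randrange(4) for _ in range(n_particles)]

        # Weight each particle by the likelihood of the observed intensities.
        logw = [log_likelihood(obs, b, a, sigma) for b, a in zip(bases, amps)]
        top = max(logw)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - top) for lw in logw]
        total = sum(weights)

        # Estimate the posterior over bases and call the most probable one.
        posterior = [0.0] * 4
        for b, w in zip(bases, weights):
            posterior[b] += w / total
        best = max(range(4), key=lambda k: posterior[k])
        calls.append((BASES[best], posterior[best]))

        # Resample amplitudes in proportion to their weights (multinomial).
        amps = rng.choices(amps, weights=weights, k=n_particles)
    return calls


if __name__ == "__main__":
    obs = simulate_intensities("ACGTTGCA")
    for base, prob in particle_filter_calls(obs):
        print(base, round(prob, 3))
```

The posterior probability attached to each call plays the role of a quality score in this sketch. A realistic base caller would additionally estimate the model parameters from the data rather than fixing them as done here.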
ParticleCall provides more accurate calls than Illumina's base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, making it more practical for high-throughput sequencing data analysis. Improved base calling accuracy will immediately benefit downstream applications such as SNP and genotype calling.