Halperin Eran, Halperin Shay, Hartman Tzvika, Shamir Ron
CS Division, Soda Hall, University of California at Berkeley, Berkeley, CA 94720-1776, USA.
J Comput Biol. 2003;10(3-4):483-97. doi: 10.1089/10665270360688138.
Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
杂交测序(SBH)是一种DNA测序技术,其中序列是利用其k元组含量进行重建的。这种含量,即所谓的序列谱,是通过与通用DNA阵列杂交获得的。标准通用阵列包含针对某个固定k(通常为8到10)的所有k元组。目前,尽管SBH有前景且精妙,但它与基于凝胶的标准测序方法相比并无竞争力。这主要有两个原因:缺乏处理实际杂交错误水平的工具,以及标准通用阵列在唯一可重建序列长度方面存在固有局限性。在本文中,我们处理这两个问题。我们引入一种简单的多项式重建算法,该算法可应用于来自标准阵列的谱,并且在存在假阴性和假阳性错误的情况下具有可证明的性能。我们还提出了一种包含通用碱基的芯片的新颖设计,它不同于Preparata等人(1999年)提出的设计。我们给出一种简单算法,该算法使用来自此类芯片的谱以高概率重建长度仅比信息论界限低一个平方对数因子的随机序列。我们的算法对错误非常鲁棒,即使存在假阴性和假阳性错误也具有可证明的性能。模拟表明,在实际应用中其对错误的敏感度也非常小。