Preparata Franco P
Computer Science Department, Brown University, 115 Waterman Street, Providence, RI 02912-1910, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar;1(1):46-52. doi: 10.1109/TCBB.2004.12.
All published approaches to DNA sequencing by hybridization (SBH) consist of the biochemical acquisition of the spectrum of a target sequence (the set of its subsequences conforming to a given probing pattern), followed by the algorithmic reconstruction of the sequence from its spectrum. In the "standard" or "uniform" approach, the probing pattern is a string of length $L$, and the length of reliably reconstructible sequences is known to be $m_{len} = O(2^L)$. For a fixed microarray area, higher sequencing performance can be achieved by inserting nonprobing gaps ("wild-cards") in the probing pattern. The reconstruction, however, must cope with the emergence of fooling probes caused by the gaps, and algorithmic failure occurs when the spectrum becomes too densely populated; nevertheless, gapped probing can achieve $m_{comp} = O(4^L)$. Despite the combinatorial success of gapped probing, all current approaches are based on a biochemically unrealistic spectrum-acquisition model (the digital-spectrum model). The reality of hybridization is much more complex. Departing from the conventional model, in this paper we propose an alternative, called the analog-spectrum model, which more closely reflects the biochemical process. This novel modeling reestablishes probe length as the performance-governing factor, adopting "semidegenerate bases" as suitable emulators of the currently inadequate universal bases. One important conclusion is that accurate biochemical measurements are pivotal to the success of SBH. The theoretical proposal presented in this paper should be a convincing stimulus for the needed biotechnological work.
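To make the notion of a spectrum concrete, the following Python sketch computes the set of probes of a target that conform to a probing pattern, under the idealized digital-spectrum model (each conforming probe is simply present or absent). The function name `spectrum` and the pattern encoding ('1' for a probing position, '0' for a nonprobing gap, with gaps rendered as 'N' in the probe) are hypothetical choices for illustration, not taken from the paper; the analog-spectrum model proposed here is not modeled.

```python
from __future__ import annotations


def spectrum(target: str, pattern: str) -> set[str]:
    """Return the digital spectrum of `target` for a probing `pattern`.

    Hypothetical encoding (an assumption of this sketch): '1' marks a
    probing position, '0' marks a nonprobing gap (wild-card), which is
    written as 'N' in the resulting probe.
    """
    span = len(pattern)
    probes = set()
    # Slide the pattern along the target; every window yields one probe.
    for start in range(len(target) - span + 1):
        window = target[start:start + span]
        probe = "".join(c if p == "1" else "N" for c, p in zip(window, pattern))
        probes.add(probe)
    return probes


# Uniform pattern of length L = 3 versus a gapped pattern with a wild-card.
print(sorted(spectrum("ACGTAC", "111")))  # ['ACG', 'CGT', 'GTA', 'TAC']
print(sorted(spectrum("ACGTAC", "101")))  # ['ANG', 'CNT', 'GNA', 'TNC']
```

Because the spectrum is a set, repeated windows collapse into a single probe and their positions are lost, which is one reason reconstruction becomes ambiguous as the target grows relative to the number of possible probes.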