Frieze A M, Preparata F P, Upfal E
Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
J Comput Biol. 1999 Fall-Winter;6(3-4):361-8. doi: 10.1089/106652799318328.
An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequence spectrum), obtained by sliding a fixed sampling pattern over the sequence. Such construction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of short nucleotide sequences (probes). Once the sequence spectrum is biochemically obtained, a combinatorial method is used to reconstruct the DNA sequence from its spectrum. Since technology limits the number of probes on the SBH chip, a challenging combinatorial question is the design of a smallest set of probes that can sequence an arbitrary DNA string of a given length. We present in this work a novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)] that drastically improves the performance of the SBH process and asymptotically approaches the information-theoretic bound up to a constant factor. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of this problem.
分子生物学中受DNA测序驱动的一个重要组合问题,是根据通过在序列上滑动固定采样模式获得的探针集合(序列谱),重建一个由小的有限字母表组成的序列。这种构建是基于杂交测序(SBH)所需要的,SBH是一种基于短核苷酸序列(探针)阵列(SBH芯片)的新型DNA测序技术。一旦通过生化方法获得了序列谱,就会使用一种组合方法从其谱中重建DNA序列。由于技术限制了SBH芯片上探针的数量,一个具有挑战性的组合问题是设计一组最小的探针,使其能够对给定长度的任意DNA字符串进行测序。我们在这项工作中提出了一种新颖的探针设计,关键在于使用通用碱基[与任何核苷酸结合的碱基(Loakes和Brown,1994)],这极大地提高了SBH过程的性能,并在渐近意义上达到信息论界限的一个常数因子。此外,我们提出的测序算法比此前解决该问题的方法中使用的欧拉路径法要简单得多。