Fedrigo Olivier, Naylor Gavin
Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA.
Nucleic Acids Res. 2004 Feb 18;32(3):1208-13. doi: 10.1093/nar/gkh210. Print 2004.
Sequencing by hybridization (SBH) approaches to DNA sequencing face two conflicting constraints. First, to ensure that the target DNA binds reliably, the oligonucleotide probes attached to the chip array must be >15 bp in length. Second, the total number of possible 15 bp oligonucleotides is too large (>4^15) to fit on a chip with current technology. To circumvent the conflict between these two constraints, we present a novel gene-specific DNA chip design. Our design is based on the idea that not all conceivable oligonucleotides need to be placed on a chip, only those that capture sequence combinations occurring in nature. Our approach uses a training set of aligned sequences that code for the gene in question. Using a graph search algorithm, we compute the minimum number of oligonucleotides (generally 15-30 bp in length) that need to be placed on a DNA chip to capture the variation implied by the training set. We tested the approach in silico using cytochrome-b sequences. Results indicate that, on average, 98% of the sequence of an unknown target can be determined using the approach.
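The size argument above can be made concrete with a small sketch. It is not the graph search algorithm described in the abstract, whose details are not given here. It only illustrates, under simplifying assumptions, why a training set of aligned sequences needs far fewer probes than the full set of 4^k possible oligonucleotides: it collects the distinct gap-free windows of length k that actually occur at each alignment position. The function names `observed_probes` and `probe_count` and the toy sequences are hypothetical.

```python
# Illustrative sketch only: enumerates the k-mers observed in an aligned
# training set. This is NOT the authors' graph search algorithm; the names
# and the toy data below are assumptions made for this example.

def observed_probes(aligned_seqs, k):
    """Map each alignment start position to the distinct k-mers seen there."""
    if not aligned_seqs:
        return {}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    probes = {}
    for start in range(length - k + 1):
        windows = {s[start:start + k].upper() for s in aligned_seqs}
        # Keep only windows made of unambiguous bases (drops gaps such as '-').
        probes[start] = {w for w in windows if set(w) <= set("ACGT")}
    return probes


def probe_count(probes):
    """Count distinct k-mers across all positions (one chip spot per k-mer)."""
    return len(set().union(*probes.values())) if probes else 0


if __name__ == "__main__":
    # Number of all possible 15 bp oligonucleotides.
    print(4 ** 15)  # 1073741824

    # Toy aligned training set (hypothetical sequences, not real cytochrome-b).
    training = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
    probes = observed_probes(training, k=4)
    print(probe_count(probes), "observed 4-mers versus", 4 ** 4, "possible")
```

In this toy setting the number of observed windows is bounded by the number of training sequences times the number of alignment positions, which for real genes is far smaller than 4^15. The design in the abstract goes further by computing the minimum set of probes of 15-30 bp that captures this variation.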