Drmanac R, Labat I, Brukner I, Crkvenjakov R
Genetic Engineering Center, Belgrade, Yugoslavia.
Genomics. 1989 Feb;4(2):114-28. doi: 10.1016/0888-7543(89)90290-5.
A mismatch-free hybridization of oligonucleotides containing from 11 to 20 monomers to unknown DNA represents, in essence, a sequencing of a complementary target. Realizing this, we have used probability calculations and, in part, computer simulations to estimate the types and numbers of oligonucleotides that would have to be synthesized in order to sequence a megabase plus segment of DNA. We estimate that 95,000 specific mixes of 11-mers, mainly of the 5'(A,T,C,G)(A,T,C,G)N8(A,T,C,G)3' type, hybridized consecutively to dot blots of cloned genomic DNA fragments would provide primary data for the sequence assembly. An optimal mixture of representative libraries in M13 vector, having inserts of (i) 7 kb, (ii) 0.5 kb genomic fragments randomly ligated in up to 10-kb inserts, and (iii) tandem "jumping" fragments 100 kb apart in the genome, will be needed. To sequence each million base pairs of DNA, one would need hybridization data from about 2100 separate hybridization sample dots. Inevitable gaps and uncertainties in alignment of sequenced fragments arising from nonrandom or repetitive sequence organization of complex genomes and difficulties in cloning "poisonous" sequences in Escherichia coli, inherent to large sequencing by any method, have been considered and minimized by choice of libraries and number of subclones used for hybridization. Because it is based on simpler biochemical procedures, our method is inherently easier to automate than existing sequencing methods. The sequence can be derived from simple primary data only by extensive computing. Phased experimental tests and computer simulations increasing in complexity are needed before accurate estimates can be made in terms of cost and speed of sequencing by the new approach. Nevertheless, sequencing by hybridization should show advantages over existing methods because of the inherent redundancy and parallelism in its data gathering.
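The core idea, that mismatch-free hybridization data amount to a sequence read, can be illustrated with a toy sketch. This is not the authors' assembly procedure: it assumes an idealized experiment in which every k-mer present in the target hybridizes and nothing else does, and it uses short exact k-mers instead of the 11-mer mixes described above. The function names `spectrum` and `reconstruct` are hypothetical, chosen only for this illustration.

```python
def spectrum(target, k):
    """Return the set of k-mers in target.

    Stands in for the primary data of an ideal, mismatch-free
    hybridization experiment (an assumption of this sketch).
    """
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(spec, k):
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases.

    Requires k >= 2. Raises ValueError when there is no unique starting
    k-mer or when an extension step is not unique, which is what happens
    when a (k-1)-mer is repeated in the target.
    """
    suffixes = {kmer[1:] for kmer in spec}
    # A starting k-mer is one whose leading k-1 bases end no other k-mer.
    starts = [kmer for kmer in spec if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    seq = starts[0]
    used = {seq}
    while len(used) < len(spec):
        tail = seq[-(k - 1):]
        candidates = [tail + b for b in "ACGT"
                      if tail + b in spec and tail + b not in used]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken overlap chain")
        seq += candidates[0][-1]
        used.add(candidates[0])
    return seq


target = "ATGCGTACCTGA"
probes = spectrum(target, 4)
rebuilt = reconstruct(probes, 4)
```

With this target every 3-base overlap is unique, so the chain has a single path and `rebuilt` equals `target`. A target containing a repeated (k-1)-mer, such as "ATGCATGG" with k = 3, makes the function raise instead of guessing, which is a small-scale version of the alignment uncertainty from repetitive sequence that the abstract mentions.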