Center for Ancient Genetics, Institute of Biology, University of Copenhagen, Copenhagen, Denmark.
PLoS One. 2007 Feb 14;2(2):e197. doi: 10.1371/journal.pone.0000197.
The invention of the Genome Sequencer 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid, high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, because homologous sequences could not afterwards be assigned to their original sources.
We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing on the high-throughput Genome Sequencer 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis.
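The tag-based assignment step described above can be illustrated with a minimal sketch. It assumes that every read begins with a fixed-length tag immediately followed by the primer, and that tags must match exactly. The function name `demultiplex`, the primer sequence, the tag-to-specimen table, and the reads are all hypothetical placeholders, not the sequences or software used in the study.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, primer, tag_len=2):
    """Assign reads to samples using the 5' tag that precedes the primer.

    Assumption (not from the source): a read is accepted only if its first
    `tag_len` bases are a known tag AND the primer follows directly.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is not None and rest.startswith(primer):
            # Store only the insert, with tag and primer trimmed off.
            assigned[sample].append(rest[len(primer):])
        else:
            unassigned.append(read)
    return dict(assigned), unassigned


# Hypothetical toy input, for illustration only.
primer = "ACGTAC"
tag_to_sample = {"AC": "specimen_1", "CA": "specimen_2"}
reads = [
    "ACACGTACGGGTTT",  # tag AC + primer -> specimen_1
    "CAACGTACGGGTTA",  # tag CA + primer -> specimen_2
    "TTACGTACGGGTTA",  # unknown tag -> unassigned
    "ACGGGG",          # known tag but no primer -> unassigned
]
assigned, unassigned = demultiplex(reads, tag_to_sample, primer)
```

Requiring the primer directly after the tag is one simple way to reject reads whose first bases match a tag only by chance. A real pipeline would probably also need to tolerate sequencing errors in the tag or primer, which this sketch does not attempt.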
We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). The method therefore enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that depends on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists in the distribution of the sequences when sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method by which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
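A bias of the kind described above could be checked by tallying how often each nucleotide occurs at a given tag position across the assigned reads. The sketch below is only an illustration: `tag_position_fractions` is a hypothetical helper, and the tag list is made-up toy input, not data from the study.

```python
from collections import Counter


def tag_position_fractions(tags, position=0):
    """Return the fraction of tags carrying each base at `position`.

    position=0 is the 5' nucleotide of the tag; position=1 is the second
    nucleotide of a dinucleotide tag.
    """
    counts = Counter(tag[position] for tag in tags)
    total = sum(counts.values())
    if total == 0:
        # No tags supplied: report zero for every base.
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


# Toy input only: one tag per assigned read (hypothetical values).
observed_tags = ["CA", "CT", "TA", "AC"]
first_base = tag_position_fractions(observed_tags, position=0)
second_base = tag_position_fractions(observed_tags, position=1)
```

With equal amounts of each tagged product going in, fractions far from uniform at position 0 would point to a 5'-nucleotide effect. Whether a deviation is meaningful would still require a proper statistical test, which is outside this sketch.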