Flynn Sarah M C, Carr Steven M
Genetics, Evolution, & Molecular Systematics Laboratory, Department of Biology, Memorial University of Newfoundland, St. John's, NL A1B3X9, Canada.
BMC Genomics. 2007 Sep 25;8:339. doi: 10.1186/1471-2164-8-339.
Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA) genomes. However, the costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and the accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes.
In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference: the error rate rises from 0.067 errors per SNP in the chimpanzee genome to 0.247 in the gorilla genome. Efficiency of sequence recovery declines as the number of interspecific SNPs increases within the 25b interval tiled by the reference oligonucleotides. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25b regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands of ≥ 12b that are conserved between primates and fish.
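The per-interval SNP counts described above can be illustrated with a minimal Python sketch. It assumes two pre-aligned, equal-length sequences and a sliding 25-base window with a step of one base. The function names, the sliding-window scheme, and the toy sequences are hypothetical illustrations, not the authors' analysis pipeline.

```python
def mismatches_per_window(reference, sample, window=25):
    """Count differences between two aligned sequences in each sliding window.

    Assumption: the sequences are already aligned and have equal length;
    gaps, if present, are simply treated as ordinary characters.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    # 1 where the bases differ, 0 where they match (case-insensitive).
    diffs = [int(a != b) for a, b in zip(reference.upper(), sample.upper())]
    if len(diffs) < window:
        return []
    # Running sum: add the base entering the window, drop the one leaving it.
    count = sum(diffs[:window])
    counts = [count]
    for i in range(window, len(diffs)):
        count += diffs[i] - diffs[i - window]
        counts.append(count)
    return counts


def fraction_at_least(counts, threshold=3):
    """Fraction of windows holding at least `threshold` differences."""
    if not counts:
        return 0.0
    return sum(c >= threshold for c in counts) / len(counts)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real mtDNA data.
    ref = "A" * 30
    smp = "A" * 10 + "CCC" + "A" * 17
    counts = mismatches_per_window(ref, smp)
    print(counts, fraction_at_least(counts))
```

Applied to a real alignment of a test genome against the reference, `fraction_at_least(counts, 3)` would give the kind of quantity the abstract reports for the gorilla genome, under the windowing assumption stated above.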
Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by more than a few percent. The data suggest that interspecific cross-hybridization will not interfere with the accurate recovery of species-specific data from multispecies microarrays, provided that the species' DNA sequences differ by > 20% (a mean of 5b differences per 25b oligo). Recovery of DNA sequence data from multiple, distantly related species on a single multiplex gene chip should be a practical, highly parallel method for investigating genomic biodiversity.
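The relation between percent divergence and differences per oligo is simple arithmetic: 20% of 25 bases is 5 bases. The sketch below works this out and, under a deliberately simplified model, estimates how often a 25b oligo would carry at least a given number of differences. The model assumes substitutions fall independently and uniformly along the sequence (a binomial distribution), which real mtDNA does not strictly obey; the function names are hypothetical.

```python
from math import comb


def expected_mismatches(divergence, oligo_len=25):
    """Mean number of differing bases per oligo at a given fractional divergence."""
    return divergence * oligo_len


def prob_at_least(k, divergence, oligo_len=25):
    """P(at least k differences in one oligo) under the binomial assumption.

    Assumption: each base differs independently with probability `divergence`.
    """
    below = sum(
        comb(oligo_len, i) * divergence**i * (1 - divergence) ** (oligo_len - i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    print(expected_mismatches(0.20))   # mean differences per 25b oligo at 20%
    print(prob_at_least(3, 0.10))      # share of oligos with >= 3 differences at 10%
```

At 10% divergence this simplified model gives a probability of about 0.46 for three or more differences in a 25b interval, which is close to the 46% reported for the gorilla genome, although the agreement should not be read as evidence that substitutions are in fact uniformly distributed.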