Song Hojun, Buhay Jennifer E, Whiting Michael F, Crandall Keith A
Department of Biology, Brigham Young University, Provo, UT 84602, USA.
Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13486-91. doi: 10.1073/pnas.0803076105. Epub 2008 Aug 29.
Nuclear mitochondrial pseudogenes (numts) are nonfunctional copies of mtDNA in the nucleus that have been found in major clades of eukaryotic organisms. They can be easily coamplified with orthologous mtDNA by using conserved universal primers; however, this is especially problematic for DNA barcoding, which attempts to characterize all living organisms by using a short fragment of the mitochondrial cytochrome c oxidase I (COI) gene. Here, we study the effect of numts on DNA barcoding based on phylogenetic and barcoding analyses of numt and mtDNA sequences in two divergent lineages of arthropods: grasshoppers and crayfish. Single individuals from both organisms have numts of the COI gene, many of which are highly divergent from orthologous mtDNA sequences, and DNA barcoding analysis incorrectly overestimates the number of unique species based on the standard metric of 3% sequence divergence. Removal of numts based on a careful examination of sequence characteristics, including indels, in-frame stop codons, and nucleotide composition, drastically reduces the incorrect inferences of the number of unique species, but even such rigorous quality control measures fail to identify certain numts. We also show that the distribution of numts is lineage-specific and the presence of numts cannot be known a priori. Whereas DNA barcoding strives for rapid and inexpensive generation of molecular species tags, we demonstrate that the presence of COI numts makes this goal difficult to achieve when numts are prevalent and can introduce serious ambiguity into DNA barcoding.
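The sequence-level screening described above (indels, in-frame stop codons, and the 3% divergence criterion) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. All function names are hypothetical, the stop codons follow the invertebrate mitochondrial genetic code (NCBI translation table 5), uncorrected p-distance is used as a simple stand-in for whatever distance measure a real barcoding analysis would apply, and the nucleotide-composition check is omitted. As the abstract notes, some numts pass checks of this kind undetected.

```python
import re

# Assumption: invertebrate mitochondrial code (NCBI translation table 5),
# in which TAA and TAG are the stop codons and AGA/AGG encode serine.
INVERT_MT_STOPS = {"TAA", "TAG"}

# The 3% sequence-divergence criterion mentioned in the abstract.
BARCODE_THRESHOLD = 0.03


def has_inframe_stop(seq, frame=0):
    """Return True if any complete codon in the given frame is a stop codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in INVERT_MT_STOPS for codon in codons)


def has_frameshift_indel(aligned_query, aligned_ref):
    """Return True if either aligned sequence has an internal gap run
    whose length is not a multiple of three (a frameshifting indel).
    Leading and trailing gaps are treated as missing data and ignored."""
    for aligned in (aligned_query, aligned_ref):
        for run in re.findall(r"-+", aligned.strip("-")):
            if len(run) % 3 != 0:
                return True
    return False


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned
    sequences of equal length, skipping sites gapped in either one."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def flag_numt_candidate(aligned_query, aligned_ref, frame=0):
    """Return a list of reasons why the query looks like a numt
    (an empty list means no red flag, not proof of mtDNA origin)."""
    reasons = []
    if has_frameshift_indel(aligned_query, aligned_ref):
        reasons.append("frameshift indel")
    if has_inframe_stop(aligned_query.replace("-", ""), frame):
        reasons.append("in-frame stop codon")
    return reasons


def exceeds_barcode_threshold(a, b, threshold=BARCODE_THRESHOLD):
    """Return True if two aligned sequences differ by more than the threshold."""
    return p_distance(a, b) > threshold
```

In use, each amplified sequence would be aligned to a trusted mtDNA reference and passed to `flag_numt_candidate` before any divergence-based species count is made. A sequence that exceeds the threshold but also carries a frameshift indel or an in-frame stop codon is more plausibly a numt than a distinct species.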