Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore.
Syst Biol. 2020 Sep 1;69(5):999-1015. doi: 10.1093/sysbio/syaa014.
New techniques for the species-level sorting of millions of specimens are needed in order to accelerate species discovery, determine how many species live on earth, and develop efficient biomonitoring techniques. These sorting methods should be reliable, scalable, and cost-effective, and largely insensitive to low-quality genomic DNA, given that this is usually all that can be obtained from museum specimens. Mini-barcodes seem to satisfy these criteria, but it is unclear how well they perform for species-level sorting when compared with full-length barcodes. We test this here based on 20 empirical data sets covering ca. 30,000 specimens (5500 species) and six clade-specific data sets from GenBank covering ca. 98,000 specimens (>20,000 species). All specimens in these data sets had full-length barcodes and had been sorted to species level based on morphology. Mini-barcodes of different lengths and positions were obtained in silico from full-length barcodes using a sliding window approach (three windows: 100 bp, 200 bp, and 300 bp) and by excising nine mini-barcodes with established primers (length: 94-407 bp). We then tested whether barcode length and/or position reduces species-level congruence between morphospecies and molecular operational taxonomic units (mOTUs) that were obtained using three different species delimitation techniques (Poisson Tree Process, Automatic Barcode Gap Discovery, and Objective Clustering). Surprisingly, we find no significant differences in performance for either species- or specimen-level identification between full-length and mini-barcodes as long as they are of moderate length (>200 bp). Only very short mini-barcodes (<200 bp) perform poorly, especially when they are located near the 5′ end of the Folmer region. The mean congruence between morphospecies and mOTUs was ca. 75% for barcodes >200 bp, and the congruent mOTUs contain ca. 75% of all specimens. Most conflict is caused by ca. 10% of the specimens, which can be identified and should be targeted for re-examination in order to efficiently resolve conflict. Our study suggests that large-scale species discovery, identification, and metabarcoding can utilize mini-barcodes without any demonstrable loss of information compared to full-length barcodes. [DNA barcoding; metabarcoding; mini-barcodes; species discovery.]
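The two computational steps described in the abstract, in silico excision of mini-barcodes with a sliding window and scoring congruence between morphospecies and mOTUs, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the `step` parameter, and the toy data are hypothetical, and the congruence definition used here (a morphospecies counts as congruent only if some mOTU contains exactly the same set of specimens) is an assumption, since the abstract does not give the formula.

```python
from collections import defaultdict


def sliding_windows(seq, window, step=1):
    """Yield (start, fragment) for every full-length window over `seq`.

    `step` is a hypothetical parameter; the abstract states only the
    window sizes (100, 200, and 300 bp), not the step size.
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def congruence(morpho, motu):
    """Return (species-level, specimen-level) congruence as fractions.

    `morpho` and `motu` map specimen id -> cluster label and are assumed
    to cover the same specimens. Assumed definition: a cluster is
    congruent when both groupings contain exactly the same specimen set.
    """
    def clusters(assignment):
        groups = defaultdict(set)
        for specimen, label in assignment.items():
            groups[label].add(specimen)
        return {frozenset(members) for members in groups.values()}

    morpho_sets = clusters(morpho)
    shared = morpho_sets & clusters(motu)
    n_specimens = sum(len(s) for s in morpho_sets)
    species_level = len(shared) / len(morpho_sets)
    specimen_level = sum(len(s) for s in shared) / n_specimens
    return species_level, specimen_level


if __name__ == "__main__":
    # Toy sequence, not real barcode data.
    toy = "ACGTACGTAC"
    for start, fragment in sliding_windows(toy, window=4, step=2):
        print(start, fragment)

    # Toy assignments: morphospecies B and C are lumped into one mOTU.
    morpho = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
    motu = {"s1": 1, "s2": 1, "s3": 2, "s4": 2}
    print(congruence(morpho, motu))
```

In the toy data, only morphospecies A is matched exactly by an mOTU, so one of three morphospecies and two of four specimens count as congruent. In practice the fragments from each window position would be clustered with a delimitation method before the comparison; that step is outside the scope of this sketch.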