Department of Integrative Biology & Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, N1G 2W1, Canada.
BMC Ecol. 2011 Aug 1;11:18. doi: 10.1186/1472-6785-11-18.
When a specimen belongs to a species not yet represented in DNA barcode reference libraries, there is disagreement over how effectively sequence comparisons can assign the query to a higher taxon. Library completeness and the assignment criteria used have been proposed as critical factors affecting the accuracy of such assignments, but neither has been thoroughly investigated. We explored the accuracy of assignments to genus, tribe and subfamily in the Sphingidae, using the almost complete global DNA barcode reference library (1095 species) available for this family. Costa Rican sphingids (118 species), a well-documented, diverse subset of the family in which every tribe and subfamily is represented, were used as queries. We simulated libraries with different levels of completeness (10-100% of the available species), and recorded assignments (positive or ambiguous) and their accuracy (true or false) under six criteria.
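The simulation design described above can be illustrated with a minimal sketch, under stated assumptions. It subsamples a reference library to a given completeness, always excluding the query's own species, and tallies each assignment as a true positive, a false positive or ambiguous. The function names, the toy data and the two nearest-neighbour stand-in criteria (`liberal`, `strict`) are hypothetical; they are not the six criteria or the tree-based methods used in the study, which the abstract does not specify.

```python
import random
from collections import Counter


def subsample_library(library, completeness, query_species, rng):
    """Keep a random fraction of the library's species, never the query's own."""
    candidates = sorted(s for s in library if s != query_species)
    k = max(1, round(completeness * len(candidates)))
    return {s: library[s] for s in rng.sample(candidates, k)}


def liberal(neighbors, library):
    """Stand-in criterion (assumption): take the taxon of the closest library species."""
    present = [s for s in neighbors if s in library]
    return library[present[0]] if present else None


def strict(neighbors, library):
    """Stand-in criterion (assumption): assign only if the two closest species agree."""
    present = [s for s in neighbors if s in library]
    if len(present) >= 2 and library[present[0]] == library[present[1]]:
        return library[present[0]]
    return None  # no assignment made: recorded as ambiguous


def score(assigned, true_taxon):
    """Classify one assignment outcome."""
    if assigned is None:
        return "ambiguous"
    return "true positive" if assigned == true_taxon else "false positive"


# Toy data (hypothetical): species -> genus, plus the query's neighbours
# ranked from closest to most distant sequence.
library = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
query_species, query_genus = "A3", "A"
neighbors = ["A1", "B1", "A2", "B2"]

rng = random.Random(0)
for completeness in (0.5, 1.0):
    for name, criterion in (("liberal", liberal), ("strict", strict)):
        tally = Counter()
        for _ in range(100):
            lib = subsample_library(library, completeness, query_species, rng)
            tally[score(criterion(neighbors, lib), query_genus)] += 1
        print(completeness, name, dict(tally))
```

Passing the criterion in as a function keeps the subsampling and scoring independent of how assignments are made, so a tree-based rule could be substituted without changing the rest of the sketch.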
With a library containing 100% of available species (but excluding the species of the query), a liberal tree-based criterion assigned 83% of queries accurately to genus, 74% to tribe and 90% to subfamily, whereas a strict tree-based criterion assigned 75% of queries accurately to genus, 66% to tribe and 84% to subfamily. The greater number of true positives delivered by more relaxed criteria was offset by a greater number of false positives. This effect was most pronounced with libraries of the lowest completeness where, for example, at the genus level 32% of assignments were false positives with the liberal criterion versus < 1% with the strict criterion. At the tribe and subfamily levels, however, we observed little difference (< 8% using the liberal criterion) in the overall accuracy of assignments between the lowest and highest levels of library completeness.
Our results suggest that, when a strict tree-based criterion is used for higher taxon assignment with DNA barcodes, the likelihood of assigning a query an incorrect genus name is very low; if a genus name is provided, it has a high likelihood of being accurate; and if no genus match is available, the query can nevertheless be assigned to a subfamily with high accuracy regardless of library completeness. DNA barcoding often correctly assigned sphingid moths to higher taxa when species matches were unavailable, suggesting that barcode reference libraries can be useful for higher taxon assignments long before they achieve complete species coverage.