Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia.

Affiliations

Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Bogotá, Colombia.

ICA-Instituto Colombiano Agropecuario, Soledad, Atlántico, Colombia.

Publication information

PLoS One. 2023 Apr 24;18(4):e0277379. doi: 10.1371/journal.pone.0277379. eCollection 2023.

Abstract

Recent declines of insect populations at high rates have resulted in the need to develop a quick method to determine their diversity and to process massive data for the identification of species of highly diverse groups. A short sequence of DNA from COI is widely used for insect identification by comparing it against sequences of known species. Repositories of sequences are available online with tools that facilitate matching of the sequences of interest to a known individual. However, the performance of these tools can differ. Here we aim to assess the accuracy in identification of insect taxonomic categories from two repositories, BOLD Systems and GenBank. This was done by comparing the sequence matches between the taxonomist identification and the suggested identification from the platforms. We used 1,160 COI sequences representing eight orders of insects from Colombia. After the comparison, we reanalyzed the results from a representative subset of the data from the subfamily Scarabaeinae (Coleoptera). Overall, BOLD Systems outperformed GenBank, and the performance of both engines differed by orders and other taxonomic categories (species, genus and family). Higher rates of accurate identification were obtained at family and genus levels. The accuracy was higher in BOLD for the order Coleoptera at family level, for Coleoptera and Lepidoptera at genus and species level. Other orders performed similarly in both repositories. Moreover, the Scarabaeinae subset showed that species were correctly identified only when BOLD match percentage was above 93.4% and a total of 85% of the samples were correctly assigned to a taxonomic category. These results accentuate the great potential of the identification engines to place insects accurately into their respective taxonomic categories based on DNA barcodes and highlight the reliability of BOLD Systems for insect identification in the absence of a large reference database for a highly diverse country.
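The comparison described in the abstract (taxonomist identification versus the identification suggested by a platform, scored at family, genus and species level) could be tallied roughly as follows. This is only a minimal sketch, not the authors' pipeline: the record layout, the function name `accuracy_by_rank`, and the placeholder taxon names are all assumptions made for illustration, and the toy records are not data from the study.

```python
from collections import defaultdict

# Taxonomic ranks scored in this sketch (assumed field names).
RANKS = ("family", "genus", "species")


def accuracy_by_rank(records):
    """Return, per rank, the fraction of records whose database match
    agrees with the taxonomist's identification.

    Each record is a dict with two hypothetical keys, "taxonomist" and
    "database", each mapping rank names to taxon names. A rank missing
    from the taxonomist's identification is skipped, not counted as wrong.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        for rank in RANKS:
            expert = rec["taxonomist"].get(rank)
            if expert is None:
                continue  # specimen was not identified to this rank
            totals[rank] += 1
            if rec["database"].get(rank) == expert:
                hits[rank] += 1
    return {rank: hits[rank] / totals[rank] for rank in RANKS if totals[rank]}


# Toy records with placeholder names (hypothetical, not from the paper).
toy = [
    {
        "taxonomist": {"family": "FamA", "genus": "GenA", "species": "GenA sp1"},
        "database": {"family": "FamA", "genus": "GenA", "species": "GenA sp2"},
    },
    {
        "taxonomist": {"family": "FamB", "genus": "GenB"},
        "database": {"family": "FamB", "genus": "GenC", "species": "GenC sp1"},
    },
]

result = accuracy_by_rank(toy)
print(result)
```

Running the same function once on the top matches from each repository would give two per-rank tables that can be compared side by side.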

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0923/10124890/a2575b7c6d33/pone.0277379.g001.jpg
