Santos Bernardo F, Miller Meredith E, Miklasevskaja Margarita, McKeown Jaclyn T A, Redmond Niamh E, Coddington Jonathan A, Bird Jessica, Miller Scott E, Smith Ashton, Brady Seán G, Buffington Matthew L, Chamorro M Lourdes, Dikow Torsten, Gates Michael W, Goldstein Paul, Konstantinov Alexander, Kula Robert, Silverson Nicholas D, Solis M Alma, deWaard Stephanie L, Naik Suresh, Nikolova Nadya, Pentinsaari Mikko, Prosser Sean W J, Sones Jayme E, Zakharov Evgeny V, deWaard Jeremy R
National Museum of Natural History, Smithsonian Institution, Washington, United States of America.
Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire naturelle, CNRS, SU, EPHE, UA, Paris, France.
Biodivers Data J. 2023 Apr 24;11:e100904. doi: 10.3897/BDJ.11.e100904. eCollection 2023.
The use of DNA barcoding has revolutionised biodiversity science, but its application depends on the existence of comprehensive and reliable reference libraries. For many poorly known taxa, such reference sequences are missing even at higher-level taxonomic scales. We harvested the collections of the Smithsonian's National Museum of Natural History (USNM) to generate DNA barcoding sequences for genera of terrestrial arthropods previously not recorded in one or more major public sequence databases. Our workflow used a mix of Sanger and Next-Generation Sequencing (NGS) approaches to maximise sequence recovery while ensuring affordable cost. In total, COI sequences were obtained for 5,686 specimens belonging to 3,737 determined species in 3,886 genera and 205 families distributed in 137 countries. Success rates varied widely according to collection data and focal taxon. NGS helped recover sequences of specimens that failed a previous run of Sanger sequencing. Success rates and the optimal balance between Sanger and NGS are the most important drivers to maximise output and minimise cost in future projects. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, the Global Genome Biodiversity Network Data Portal and the NMNH data portal.