MetaCOXI: an integrated collection of metazoan mitochondrial cytochrome oxidase subunit-I DNA sequences.

Author information

Balech Bachir, Sandioniggi Anna, Marzano Marinella, Pesole Graziano, Santamaria Monica

Publication information

Database (Oxford). 2022 Feb 3;2022. doi: 10.1093/database/baab084.

Abstract

Nucleotide sequence reference collections, or databases, are fundamental components of DNA barcoding and metabarcoding data analysis pipelines. In such analyses, accurate taxonomic assignment is crucial and depends directly on the availability of a comprehensive, curated reference sequence collection and its taxonomy information. The current wide use of the mitochondrial cytochrome oxidase subunit-I (COXI) as a standard DNA barcode marker in metazoan biodiversity studies highlights the need to clarify what relevant information is available from different data sources and how it can be integrated. Adequate data integration requires careful attention to several aspects: DNA sequence curation, alignment of taxonomy to a solid reference backbone, and harmonization of metadata according to universal standards. Here, we present MetaCOXI, an integrated collection of curated metazoan COXI DNA sequences with their associated harmonized taxonomy and metadata. This collection was built from the two most extensive available data resources, namely the European Nucleotide Archive (ENA) and the Barcode of Life Data System (BOLD). The current release contains more than 5.6 million entries (39.1% unique to BOLD, 3.6% unique to ENA and 57.2% shared between both), their taxonomic classification based on the NCBI reference taxonomy, and the main available metadata relevant to environmental DNA studies, such as geographical coordinates, sampling country and host species. MetaCOXI is available in standard universal formats ('fasta' for sequences and 'tsv' for taxonomy and metadata), which can be easily incorporated into standard or purpose-specific DNA barcoding and/or metabarcoding data analysis pipelines. Database URL: https://github.com/bachob5/MetaCOXI.
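As a rough illustration of how a 'fasta' sequence file and a 'tsv' taxonomy/metadata table might be combined in a pipeline, the following minimal Python sketch joins the two on a shared sequence identifier. The abstract does not describe the file layout, so the column names (`seq_id`, `species`, `country`), the identifiers and the toy data are hypothetical; the actual headers should be checked against the files in the MetaCOXI repository.

```python
import csv
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Assumption: the identifier is the first whitespace-delimited token.
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_table(handle, id_column="seq_id"):
    """Read a tab-separated table into a dict keyed by the identifier column.

    'seq_id' is a hypothetical column name, not taken from MetaCOXI itself.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: row for row in reader}


def join_records(fasta_handle, tsv_handle, id_column="seq_id"):
    """Yield one merged dict per sequence that has a matching table row."""
    table = load_table(tsv_handle, id_column)
    for seq_id, sequence in read_fasta(fasta_handle):
        row = table.get(seq_id)
        if row is not None:
            yield {"id": seq_id, "sequence": sequence, **row}


# Toy in-memory data standing in for the real files (all values invented).
FASTA_TEXT = ">ID1 example entry\nACGT\nTTGA\n>ID2\nGGCC\n"
TSV_TEXT = "seq_id\tspecies\tcountry\nID1\tExample species\tItaly\n"

records = list(join_records(io.StringIO(FASTA_TEXT), io.StringIO(TSV_TEXT)))
for record in records:
    print(record["id"], record["species"], len(record["sequence"]))
```

With real data, the `io.StringIO` objects would be replaced by opened files. Sequences with no matching table row are skipped here, which is one possible design choice; a pipeline could equally report them for inspection.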
