Cabezas M Pilar, Fonseca Nuno A, Muñoz-Mérida Antonio
Centre of Molecular and Environmental Biology (CBMA), Department of Biology, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal.
Institute of Science and Innovation for Bio-Sustainability (IB-S), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal.
Environ Microbiome. 2024 Nov 9;19(1):88. doi: 10.1186/s40793-024-00634-w.
Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major challenges in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which contain either substantial redundancy or a high percentage of sequences with missing taxonomic information. These limitations may lead to erroneous identifications and, thus, to inaccurate conclusions regarding the ecological role and importance of those microorganisms in the ecosystem.
The current study presents MIMt, a new 16S rRNA database for the identification of archaea and bacteria, encompassing 47 001 sequences, all precisely identified at the species level. In addition, a MIMt2.0 version was created containing only curated sequences from RefSeq Targeted loci, totalling 32 086 sequences. MIMt is intended to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy, enabling more precise assignments at lower taxonomic ranks and thus significantly improving species-level identification.