Vázquez-González Lara, Regueira-Iglesias Alba, Balsa-Castro Carlos, Tomás Inmaculada, Carreira María J
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Rúa de Jenaro de la Fuente Domínguez, E15782, Santiago de Compostela, Spain.
Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), E15706, Santiago de Compostela, Spain.
Sci Data. 2025 May 2;12(1):729. doi: 10.1038/s41597-025-05050-4.
Within a given species, genomes and 16S rRNA gene sequences, along with their intragenomic copy numbers, can vary greatly across environments. Gene copy numbers are crucial for technologies that estimate microbial abundances from gene counts, such as polymerase chain reaction and high-throughput sequencing. With these methods, the abundance of taxa carrying fewer gene copies may be underestimated, while that of taxa carrying more copies may be overestimated. It is therefore essential to have accurate gene copy number databases specific to the niche under study. The 16S rRNA Gene Oral Sequences dataset (16SGOSeq) contains the number of 16S rRNA genes and their variants in the complete genomes of the bacterial and archaeal species present in the human oral cavity. It includes 3,192 complete genomes of oral bacteria and 191 complete genomes of oral archaea, from which the 16S rRNA gene sequences were extracted and the sequence variants identified. This oral-specific dataset of prokaryotic organisms, and the pipeline followed for its construction, can be applied by clinical microbiologists, bioinformaticians, or microbial ecologists in future microbiome research.
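The bias described above can be illustrated with a minimal sketch of copy-number correction: each taxon's read count is divided by its 16S rRNA gene copy number, and the results are renormalized to relative abundances. This is a generic illustration, not the pipeline used to build 16SGOSeq; the function name, taxon labels, read counts, and copy numbers are all hypothetical.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Return relative abundances after dividing reads by 16S copy number.

    read_counts:  dict mapping taxon -> number of 16S reads observed
    copy_numbers: dict mapping taxon -> 16S rRNA gene copies per genome
    (Hypothetical helper for illustration only.)
    """
    adjusted = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # Reads per gene copy approximates the number of genomes (cells).
        adjusted[taxon] = reads / copies

    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical example values, not taken from the dataset.
read_counts = {"taxon_A": 600, "taxon_B": 400}
copy_numbers = {"taxon_A": 6, "taxon_B": 2}

corrected = correct_for_copy_number(read_counts, copy_numbers)
for taxon, abundance in corrected.items():
    print(f"{taxon}: {abundance:.3f}")
```

In this made-up case, taxon_A accounts for 60% of the raw reads but only about one third of the corrected abundance, because each of its genomes contributes three times as many 16S gene copies as a taxon_B genome.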