Gwak Ho-Jin, Rho Mina
Department of Computer Science and Engineering, Hanyang University, Seoul, South Korea.
Department of Biomedical Informatics, Hanyang University, Seoul, South Korea.
Front Microbiol. 2020 Nov 12;11:570825. doi: 10.3389/fmicb.2020.570825. eCollection 2020.
With the emergence of next-generation sequencing (NGS) technology, a large number of metagenomic studies have estimated bacterial composition via 16S ribosomal RNA (16S rRNA) amplicon sequencing. In particular, subsets of the hypervariable regions of 16S rRNA, such as V1-V2 and V3-V4, are targeted by high-throughput sequencing. Each sequence is assigned to a taxon based on sequence homology. Because these sequences are highly homologous or even identical between species of the same genus, it is challenging to determine the exact species from 16S rRNA sequences alone. Therefore, in this study, species groups were defined to obtain the maximum species-level resolution achievable with 16S rRNA. Three major 16S rRNA databases are used independently for taxonomic assignment, and the lineage of certain bacteria is inconsistent among them. On the basis of the NCBI taxonomy classification, we therefore re-annotated the inconsistent lineage information in these three databases. For each species, we constructed a consensus sequence model for each hypervariable region and determined species groups, each consisting of species that are indistinguishable in terms of sequence homology. Species-level taxonomy was then determined using a k-nearest neighbor method and the species consensus sequence models. If the species determined is a member of a species group, the species group is assigned instead of a specific species. Notably, evaluation of our method on simulated and mock datasets showed a high correlation with the actual bacterial composition. Furthermore, in the analysis of real microbiome samples, such as salivary and gut microbiome samples, our method successfully performed species-level profiling and identified differences in bacterial composition between phenotypic groups.
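The assignment steps summarized above (per-species consensus models, species groups of indistinguishable species, k-nearest neighbor assignment, and reporting a group instead of a species) can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes the reference sequences for one hypervariable region are already aligned to equal length, uses a simple majority-vote consensus and percent identity, groups species whose consensus models reach a hypothetical `threshold` (default 1.0, i.e. identical), and runs k-NN over individual labeled reference sequences rather than reproducing the paper's exact way of combining k-NN with the consensus models. The function names (`consensus`, `identity`, `species_groups`, `knn_assign`), the `k=3` default, and the toy sequences and species labels are all hypothetical.

```python
from collections import Counter

# Hypothetical toy sketch of species-group assignment; not the published method.
# Assumes all sequences of one hypervariable region are pre-aligned to equal length.


def consensus(seqs):
    """Majority-vote consensus of pre-aligned, equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected non-empty, equal-length aligned sequences")
    # Per alignment column, keep the most common character (ties: first seen).
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def identity(a, b):
    """Fraction of aligned positions carrying the same character."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def species_groups(models, threshold=1.0):
    """Group species whose consensus models are >= threshold identical.

    Union-find makes the grouping transitive (single linkage). Returns a map
    from each species to the sorted tuple of species in its group.
    """
    names = sorted(models)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(models[a], models[b]) >= threshold:
                parent[find(a)] = find(b)

    members = {}
    for n in names:
        members.setdefault(find(n), []).append(n)
    return {n: tuple(members[find(n)]) for n in names}


def knn_assign(query, references, groups, k=3):
    """Majority vote among the k most identical references, reported as a
    species group when the winning species belongs to a multi-species group."""
    # sorted() is stable, so equally identical references keep their input order.
    ranked = sorted(references, key=lambda ref: identity(query, ref[1]), reverse=True)
    votes = Counter(species for species, _ in ranked[:k])
    # Ties in vote count go to the species encountered first, i.e. the nearest one.
    species = votes.most_common(1)[0][0]
    group = groups[species]
    return "/".join(group) if len(group) > 1 else species


# Usage with toy, hypothetical data.
references = [
    ("species_A", "ACGTACGT"), ("species_A", "ACGTACGT"), ("species_A", "ACGAACGT"),
    ("species_B", "ACGTACGT"), ("species_B", "ACGTACGA"),
    ("species_C", "TTGTACCA"), ("species_C", "TTGTACCA"),
]
by_species = {}
for sp, seq in references:
    by_species.setdefault(sp, []).append(seq)
models = {sp: consensus(seqs) for sp, seqs in by_species.items()}
groups = species_groups(models)
print(knn_assign("ACGTACGT", references, groups))  # species_A/species_B
print(knn_assign("TTGTACCT", references, groups))  # species_C
```

In this toy data, species_A and species_B have identical consensus models, so a query matching them is reported as the group "species_A/species_B" rather than a single species, mirroring the rule that a species group is assigned when the determined species belongs to one.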