A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

Authors

Gao Xiang, Lin Huaiying, Revanna Kashi, Dong Qunfeng

Affiliations

Department of Public Health Sciences, Loyola University Chicago Health Sciences Division, Maywood, IL, 60153, USA.

Center for Biomedical Informatics, Loyola University Chicago Health Sciences Division, Maywood, IL, 60153, USA.

Publication information

BMC Bioinformatics. 2017 May 10;18(1):247. doi: 10.1186/s12859-017-1670-4.

DOI:10.1186/s12859-017-1670-4

PMID:28486927

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5424349/

Abstract

BACKGROUND

Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement.

RESULTS

We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes.
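The weighting idea described in this paragraph can be illustrated with a small sketch. It is not the BLCA implementation: the function names, the six-rank lineage tuples, and the example hits are hypothetical. It also makes two simplifying assumptions. The posterior weight of a hit is taken to be its similarity score normalized over all hits (a uniform prior with likelihood proportional to similarity), and the bootstrap resamples database hits with replacement, which may differ from how the published tool builds its replicates.

```python
import random
from collections import defaultdict

# Ranks from broadest to most specific; each lineage tuple follows this order.
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def posterior_weights(similarities):
    """Normalize similarity scores so they sum to 1.

    Assumption: uniform prior over hits and likelihood proportional to the
    similarity score, so the posterior is the normalized score.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    return [s / total for s in similarities]


def weighted_assignment(hits):
    """Return {rank: (taxon, weight_share)} for the best-supported taxon per rank.

    hits: list of (lineage, similarity) pairs, lineage ordered like RANKS.
    """
    weights = posterior_weights([sim for _, sim in hits])
    result = {}
    for i, rank in enumerate(RANKS):
        support = defaultdict(float)
        for (lineage, _), w in zip(hits, weights):
            support[lineage[i]] += w  # each hit votes with its weight
        taxon = max(support, key=support.get)
        result[rank] = (taxon, support[taxon])
    return result


def bootstrap_confidence(hits, n_boot=200, seed=0):
    """Per rank, the fraction of replicates that reproduce the full-data call.

    Simplification: replicates are built by resampling hits with replacement.
    """
    rng = random.Random(seed)
    full = weighted_assignment(hits)
    agree = {rank: 0 for rank in RANKS}
    for _ in range(n_boot):
        sample = [rng.choice(hits) for _ in hits]
        replicate = weighted_assignment(sample)
        for rank in RANKS:
            if replicate[rank][0] == full[rank][0]:
                agree[rank] += 1
    return {rank: (full[rank][0], agree[rank] / n_boot) for rank in RANKS}


# Hypothetical hits for one query: (lineage, fractional identity from alignment).
_prefix = ("Firmicutes", "Bacilli", "Lactobacillales", "Lactobacillaceae", "Lactobacillus")
example_hits = [
    (_prefix + ("Lactobacillus crispatus",), 0.99),
    (_prefix + ("Lactobacillus crispatus",), 0.98),
    (_prefix + ("Lactobacillus gasseri",), 0.95),
]

if __name__ == "__main__":
    for rank, (taxon, conf) in bootstrap_confidence(example_hits).items():
        print(f"{rank}\t{taxon}\t{conf:.2f}")
```

In this toy input all hits share a genus, so the genus-level call is reproduced in every replicate, while the species-level call depends on which hits are drawn. That mirrors the behavior the abstract describes: confidence is reported per rank, and lower ranks can be less certain than higher ones.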

CONCLUSIONS

Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
