Shen Qingrong, Fan Xiaoqian, Sun Yangyang, Gao Hao, Su Xiaoquan
College of Computer Science and Technology, Qingdao University, Qingdao, 266071, Shandong, China.
Shouguang Hospital of Traditional Chinese Medicine, Weifang, 262700, Shandong, China.
BMC Bioinformatics. 2025 May 26;26(1):136. doi: 10.1186/s12859-025-06156-7.
16S rRNA amplicon sequencing is widely used for microbiome composition analysis because it is cost-effective and requires less data than metagenomic whole-genome sequencing (WGS). However, inherent limitations of 16S-based approaches often lead to profiling discrepancies, particularly at the species level, compromising the accuracy and reliability of findings.
To address this issue, we present TaxaCal (Taxonomic Calibrator), a machine learning algorithm designed to calibrate species-level taxonomy profiles in 16S amplicon data using a two-tier correction strategy. Validation on in-house and public datasets shows that TaxaCal effectively reduces biases in amplicon sequencing, mitigating discrepancies between microbial profiles derived from 16S and WGS. Moreover, TaxaCal enables cross-platform comparisons between these two sequencing approaches, significantly improving disease detection in 16S-based microbiome data.
TaxaCal therefore offers a cost-effective solution for generating high-resolution microbiome species profiles that closely align with WGS results, enhancing the utility of 16S-based profiling in microbiome research. As microbiome-based diagnostics continue to evolve, TaxaCal has the potential to become a crucial tool for 16S sequencing in clinical and research settings.
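The "discrepancies between microbial profiles derived from 16S and WGS" mentioned above can be made concrete with a distance between two species-level relative-abundance profiles. The abstract does not state which measure the authors use, so the sketch below assumes Bray-Curtis dissimilarity, a common choice in microbiome analysis. The function name `bray_curtis`, the species, and the abundance values are all hypothetical and serve only as illustration; this is not TaxaCal's calibration algorithm.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile is a dict mapping a taxon name to a non-negative abundance.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(profile_a) | set(profile_b)
    # Abundance shared by both profiles, taxon by taxon.
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical species-level profiles of one sample (made-up values).
amplicon_profile = {
    "Bacteroides fragilis": 0.5,
    "Escherichia coli": 0.3,
    "Faecalibacterium prausnitzii": 0.2,
}
wgs_profile = {
    "Bacteroides fragilis": 0.4,
    "Escherichia coli": 0.2,
    "Faecalibacterium prausnitzii": 0.3,
    "Akkermansia muciniphila": 0.1,  # detected by WGS only in this example
}

print(f"16S vs WGS dissimilarity: {bray_curtis(amplicon_profile, wgs_profile):.3f}")
```

Under this assumption, a calibration method would be judged successful if the dissimilarity between the calibrated 16S profile and the matched WGS profile is lower than the dissimilarity computed from the raw 16S profile.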