To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences.

Affiliations

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.

Institute of Biotechnology, National Taiwan University, Taipei, Taiwan.

Publication information

Comput Biol Med. 2022 Jun;145:105416. doi: 10.1016/j.compbiomed.2022.105416. Epub 2022 Mar 17.

Abstract

BACKGROUND

Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field has used next-generation sequencing technology to target the V3-V4 regions to analyze bacterial composition. However, restricting the analysis to one or two hypervariable regions limits taxonomic resolution at the species level. In recent years, third-generation sequencing technology has given researchers easy access to full-length prokaryotic 16S sequences and presented an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers on full-length 16S sequences remains unclear.

OBJECTIVE

The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to identify the most suitable 16S training dataset for each classifier.

METHODS

Both curated 16S full-length sequences and cross-validation datasets were used to validate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Different sequence training datasets, such as SILVA, Greengenes, and RDP, were used to train the classification models.
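The abstract does not describe how accuracy was scored, so the following is only a minimal sketch of one way to compute per-rank accuracy when comparing a classifier's output with curated reference labels. It assumes each lineage is a semicolon-separated string running from domain to species; the function names, the rank list, and the rule that a prediction must match at every rank down to the one being scored are illustrative assumptions, not the authors' method.

```python
# Hypothetical scoring helper; the lineage format and rank names are assumptions.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def split_lineage(lineage):
    """Split 'Bacteria;Firmicutes;...' into a list of taxon names."""
    return [taxon.strip() for taxon in lineage.split(";") if taxon.strip()]


def rank_accuracy(true_lineages, predicted_lineages):
    """Return {rank: fraction of sequences correctly assigned down to that rank}."""
    if len(true_lineages) != len(predicted_lineages):
        raise ValueError("inputs must have the same length")

    correct = {rank: 0 for rank in RANKS}
    for truth, prediction in zip(true_lineages, predicted_lineages):
        t = split_lineage(truth)
        p = split_lineage(prediction)
        for depth, rank in enumerate(RANKS, start=1):
            # Count a hit only if both lineages reach this rank and
            # every taxon from domain down to this rank agrees.
            if len(t) >= depth and len(p) >= depth and t[:depth] == p[:depth]:
                correct[rank] += 1

    total = len(true_lineages)
    return {rank: (correct[rank] / total if total else 0.0) for rank in RANKS}
```

In a cross-validation setting, a function like this would be applied to the held-out sequences of each fold, and the per-rank fractions averaged across folds. A prediction that stops at genus counts as correct at genus but not at species.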

RESULTS

The accuracy of each classifier down to the species level was reported. According to the experimental results, SINTAX and SPINGO provided the highest accuracy when trained on RDP sequences, and are recommended for classifying prokaryotic 16S full-length rRNA sequences.

CONCLUSION

The performance of the classifiers was affected by the choice of training dataset. Therefore, each classifier should be paired with its most suitable 16S training data to improve the accuracy and taxonomic resolution of taxonomic assignment.
