Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets.

Author information

Eldred Lauren E, Thorn R Greg, Smith David Roy

Affiliation

Department of Biology, University of Western Ontario, London, ON, Canada.

Publication information

Front Genet. 2021 Nov 26;12:768473. doi: 10.3389/fgene.2021.768473. eCollection 2021.

Abstract

Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We applied the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21% were misidentified using QIIME 2 and 42% using the RDP Classifier. More than half of these simple matching misidentifications resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.
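The comparison described in the abstract, classifier assignments scored against a manual phylogenetic binning result, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `misidentification_rate`, the query IDs, and the taxon labels are hypothetical and chosen only to show how such a percentage could be computed.

```python
from typing import Dict


def misidentification_rate(predicted: Dict[str, str],
                           reference: Dict[str, str]) -> float:
    """Return the fraction of shared query sequences whose predicted
    taxon differs from the reference (e.g. phylogenetic binning) taxon."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no query sequences in common")
    wrong = sum(1 for query in shared if predicted[query] != reference[query])
    return wrong / len(shared)


# Hypothetical toy labels, NOT data from the study.
binning = {
    "q1": "Agaricales",
    "q2": "Polyporales",
    "q3": "Mucorales",
    "q4": "Tremellales",
}
classifier = {
    "q1": "Agaricales",
    "q2": "Russulales",   # disagrees with the binning result
    "q3": "Mucorales",
    "q4": "Tremellales",
}

rate = misidentification_rate(classifier, binning)
print(f"{rate:.0%} of queries misidentified")
```

Treating the binning result as the reference is an assumption of this sketch; it mirrors the abstract's framing, in which simple matching output is judged against the manual phylogenetic approach.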

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb5/8662557/f7c2b7439344/fgene-12-768473-g001.jpg
