

KSGP 3.1: improved taxonomic annotation of Archaea communities using LotuS2, the genome taxonomy database and RNAseq data.

Author information

Grant Alastair, Aleidan Abdullah, Davies Charli S, Udochi Solomon C, Fritscher Joachim, Bahram Mohammad, Hildebrand Falk

Affiliations

School of Environmental Science, University of East Anglia, Norwich NR4 7TJ, United Kingdom.

Zoology Department, College of Sciences, King Saud University, Riyadh 11451, Saudi Arabia.

Publication information

ISME Commun. 2025 Jun 3;5(1):ycaf094. doi: 10.1093/ismeco/ycaf094. eCollection 2025 Jan.

Abstract

Taxonomic annotation is a substantial challenge for Archaea metabarcoding. A limited number of reference sequences are available; a substantial fraction of phylogenetic diversity is not fully characterized; widely used databases do not reflect current archaeal taxonomy and contain mislabelled sequences. We address these gaps with a systematic and tractable approach based around the Genome Taxonomy Database (GTDB) combined with the eukaryote PR2 and MIDORI mitochondrial databases. After removing incongruent, chimeric and duplicate SSU sequences, this combination (GTDB+) provides a small improvement in annotation of a set of estuarine Archaea Operational Taxonomic Units (OTUs) compared to SILVA. We add to this a collection of near full length rRNA sequences and the prokaryote SSU sequences in SILVA, creating a new reference database, KSGP (Karst, Silva, GTDB, and PR2). The additional sequences are (re-)annotated using three different approaches. The most conservative, using lowest common ancestor, gives a further small improvement. Annotation using SINTAX increases Class and Order assignments by 2.7 and 4.2 times over SILVA, although this may include some "lumping" of un-named and named clades. Still further improvement can be made using similarity based clustering to group database sequences into putative taxa at all taxonomic levels, assigning 60% and 41% of Archaea OTUs to putative family and genus level taxa respectively. GTDB without cleaning and GreenGenes2 both perform poorly and cannot be recommended for use with Archaea. We make the GTDB+ and KSGP databases available at ksgp.earlham.ac.uk; integrate them into a metabarcoding pipeline, LotuS2, and outline their use to annotate Archaea OTUs and metatranscriptomic data.
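As a rough illustration of the lowest common ancestor idea mentioned in the abstract, the sketch below keeps only the ranks on which all reference hits for a sequence agree. It is not the authors' implementation or the LotuS2 code; the function name `lowest_common_ancestor` and the example lineages are assumptions made for illustration only.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest rank-by-rank prefix shared by all lineages.

    Each lineage is an ordered list of taxon names, highest rank first.
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so missing lower ranks end the walk.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break
    return shared


# Hypothetical reference hits for one sequence (illustrative lineages only).
hits = [
    ["Archaea", "Thermoproteota", "Nitrososphaeria", "Nitrososphaerales"],
    ["Archaea", "Thermoproteota", "Nitrososphaeria", "Nitrososphaerales"],
    ["Archaea", "Thermoproteota", "Bathyarchaeia"],
]
consensus = lowest_common_ancestor(hits)
print(consensus)
```

Because the hits disagree at class level, the consensus stops at the phylum. This is why the approach is conservative: a single conflicting reference truncates the assignment.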


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d20c/12203549/5dee7b085f60/ycaf094ga1.jpg
