Undervalued Pseudo-nifH Sequences in Public Databases Distort Metagenomic Insights into Biological Nitrogen Fixers.

Affiliations

National Institute of Advanced Industrial Science and Technology (AIST) Hokkaido, Sapporo, Hokkaido, Japan.

Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan.

Publication information

mSphere. 2021 Dec 22;6(6):e0078521. doi: 10.1128/msphere.00785-21. Epub 2021 Nov 17.

Abstract

Nitrogen fixation, a distinct process that incorporates inactive atmospheric nitrogen into active biological processes, has been a major topic in biological and geochemical studies. Currently, insights into the diversity and distribution of nitrogen-fixing microbes depend on homology-based analyses of nitrogenase genes, especially the nifH gene, which is broadly conserved in nitrogen-fixing microbes. Here, we report the pitfall of using nifH as a marker of microbial nitrogen fixation. We exhaustively analyzed the genomes in RefSeq (231,908 genomes) and KEGG (6,509 genomes), examining the co-occurrence and gene order patterns of nitrogenase genes (including nifH) therein. Up to 20% of nifH-harboring genomes lacked nifD and nifK, which encode essential subunits of nitrogenase, within 10 coding sequences upstream or downstream of nifH or on the same genome. According to a phenotypic database of prokaryotes, no species or strains harboring only nifH possess nitrogen-fixing activity, which indicates that these genes are "pseudo"-nifH genes. Pseudo-nifH sequences mainly belong to anaerobic microbes, including members of the class Clostridia and methanogens. We also detected many pseudo-nifH reads in metagenomic sequences from anaerobic environments such as animal guts, wastewater, paddy soils, and sediments. In some samples, pseudo-nifH reads outnumbered "true" nifH reads by 50% to as much as 10-fold. Because of the high sequence similarity between pseudo-nifH and true nifH, a substantial fraction of nifH-like reads could not be confidently classified. Overall, our results encourage reconsideration of the conventional use of nifH for detecting nitrogen-fixing microbes, while suggesting that nifD or nifK would be a more reliable marker.

Nitrogen-fixing microbes affect biogeochemical cycling, agricultural productivity, and microbial ecosystems, and their distributions have been investigated intensively using genomic and metagenomic sequencing. Currently, insights into nitrogen fixers in the environment are acquired by homology searches against nitrogenase genes, particularly the nifH gene, in public databases. Here, we report that public databases include a significant number of incorrectly annotated nifH sequences (pseudo-nifH). We exhaustively investigated the genomic structures of nifH-harboring genomes and found hundreds of pseudo-nifH sequences in RefSeq and KEGG. Over half of these pseudo-nifH sequences belonged to members of the class Clostridia, which is supposed to be a prominent nitrogen-fixing clade. We also found that the abundance of nitrogen fixers in metagenomes could be overestimated by 1.5 to >10 times due to pseudo-nifH sequences recorded in public databases. Our results encourage reconsideration of the prevalent use of nifH as a marker of nitrogen-fixing microbes.
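The gene-neighborhood criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes a hypothetical input format in which each genome is a mapping from contig ID to gene names in coding-sequence order, that genes are already annotated with the plain labels `nifH`, `nifD`, and `nifK`, and that both `nifD` and `nifK` must be present for a `nifH` to count as supported. The function name `classify_nifh` and the three output labels are invented for illustration.

```python
from typing import Dict, List

# Hypothetical input format: contig ID -> gene names in CDS order.
Genome = Dict[str, List[str]]

PARTNERS = {"nifD", "nifK"}  # essential nitrogenase subunit genes


def classify_nifh(genome: Genome, window: int = 10) -> List[dict]:
    """Label every nifH copy by where its partner genes are found.

    "clustered"        : nifD and nifK both lie within `window` CDSs of nifH
    "same_genome"      : both exist in the genome, but not in that window
    "pseudo_candidate" : nifD and/or nifK is missing from the whole genome
    """
    all_genes = {g for genes in genome.values() for g in genes}
    partners_in_genome = PARTNERS <= all_genes

    results = []
    for contig, genes in genome.items():
        for i, gene in enumerate(genes):
            if gene != "nifH":
                continue
            # Up to `window` CDSs upstream and downstream on the same contig.
            upstream = genes[max(0, i - window):i]
            downstream = genes[i + 1:i + 1 + window]
            neighbours = set(upstream) | set(downstream)

            if PARTNERS <= neighbours:
                label = "clustered"
            elif partners_in_genome:
                label = "same_genome"
            else:
                label = "pseudo_candidate"
            results.append({"contig": contig, "index": i, "label": label})
    return results


if __name__ == "__main__":
    fixer = {"chr": ["geneA", "nifH", "nifD", "nifK", "geneB"]}
    orphan = {"chr": ["geneA", "nifH", "geneB"]}
    print(classify_nifh(fixer))
    print(classify_nifh(orphan))
```

The strand of each gene and the circularity of replicons are ignored here. Real annotations also use varied gene names or ortholog identifiers, so a practical version would need a mapping step before this check.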

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aca9/8597730/2301f5d278bf/msphere.00785-21-f001.jpg
