Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany.
Jacobs University Bremen, Bremen, Germany.
eLife. 2022 Mar 31;11:e67667. doi: 10.7554/eLife.67667.
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.