Unifying the known and unknown microbial coding sequence space.

Affiliations

Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany.

Jacobs University Bremen, Bremen, Germany.

Publication information

Elife. 2022 Mar 31;11:e67667. doi: 10.7554/eLife.67667.

DOI: 10.7554/eLife.67667
PMID: 35356891
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9132574/

Abstract

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/fb84f373e970/elife-67667-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/8d16cb9924d1/elife-67667-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/5b0086eeee28/elife-67667-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/311c9d7b0bb9/elife-67667-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/cd6b5034d571/elife-67667-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/708b76bd92a5/elife-67667-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/b3806caddf64/elife-67667-app1-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/a8336f711bf7/elife-67667-app1-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/a6bc2efd5b7c/elife-67667-app1-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/19f8cdae334c/elife-67667-app1-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/9fb8fa3daba4/elife-67667-app1-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/563fb703abec/elife-67667-app1-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/8675ef9fdb83/elife-67667-app1-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/56a3c5943382/elife-67667-app1-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/31afcf2b7de5/elife-67667-app3-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/52d58a72bef7/elife-67667-app5-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/887cf2add994/elife-67667-app5-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/ade9289768a6/elife-67667-app7-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/e37161dd02d7/elife-67667-app7-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/9a56d09778b4/elife-67667-app7-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/e8f4641e06b1/elife-67667-app7-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/e4df6e00b0a1/elife-67667-app9-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/d60552612a22/elife-67667-app10-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/cdce5cb83b52/elife-67667-app11-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38e6/9132574/2e6a2e25d7ae/elife-67667-app12-fig1.jpg
