Uncovering new families and folds in the natural protein universe.

Affiliations

Biozentrum, University of Basel, Basel, Switzerland.

SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland.

Publication information

Nature. 2023 Oct;622(7983):646-653. doi: 10.1038/s41586-023-06622-3. Epub 2023 Sep 13.

DOI: 10.1038/s41586-023-06622-3
PMID: 37704037
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10584680/
Abstract

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4 . By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to Pfam database and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
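A sequence similarity network works like this. Each protein is a node. An edge joins two proteins whose pairwise similarity passes a cutoff. Connected groups that contain no annotated member are one simple way to flag candidate "dark" families.

The Python sketch below illustrates that idea and is not the authors' pipeline. The input tuples, the hypothetical `min_identity=0.5` cutoff and the function names are all assumptions. At the scale of the AlphaFold database, the pairwise hits would presumably come from an all-vs-all search tool such as MMseqs2; that is also an assumption here.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Endpoints missing from `nodes` are added as new nodes.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for n in list(parent):
        groups[find(n)].append(n)
    # Sort members and components so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


def dark_components(nodes, hits, annotated, min_identity=0.5):
    """Return network components in which no member carries an annotation.

    nodes: protein IDs (hypothetical).
    hits: (query, target, identity) tuples, identity in [0, 1] (hypothetical).
    annotated: IDs that already belong to a known family.
    min_identity: illustrative edge cutoff, not taken from the paper.
    """
    edges = [(q, t) for q, t, ident in hits if ident >= min_identity]
    components = connected_components(nodes, edges)
    return [c for c in components if not any(n in annotated for n in c)]


if __name__ == "__main__":
    toy_nodes = ["A", "B", "C", "D", "E", "F"]
    toy_hits = [("A", "B", 0.8), ("B", "C", 0.6), ("D", "E", 0.7), ("C", "D", 0.3)]
    # The C-D hit falls below the cutoff, so {D, E} and the singleton F stay unannotated.
    print(dark_components(toy_nodes, toy_hits, annotated={"A"}))  # prints [['D', 'E'], ['F']]
```

Union-find avoids building an explicit adjacency structure and processes the edge list in a single pass. A real analysis of millions of models would likely add clustering or community detection within components rather than relying on connectivity alone.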

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/d8452398ef69/41586_2023_6622_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/38c85e8946e6/41586_2023_6622_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/09f0e0ec1ce9/41586_2023_6622_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/a3350aefbe36/41586_2023_6622_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/4aefad19f072/41586_2023_6622_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/d23e3979f574/41586_2023_6622_Fig6_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/fc939090bdf1/41586_2023_6622_Fig7_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/65d16ef3b15c/41586_2023_6622_Fig8_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/386f40ea9822/41586_2023_6622_Fig9_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/163529d49c42/41586_2023_6622_Fig10_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/10e0da52a03f/41586_2023_6622_Fig11_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909d/10584680/fe6beb12ddbd/41586_2023_6622_Fig12_ESM.jpg

Similar articles

1
Uncovering new families and folds in the natural protein universe.
Nature. 2023 Oct;622(7983):646-653. doi: 10.1038/s41586-023-06622-3. Epub 2023 Sep 13.
2
Clustering predicted structures at the scale of the known protein universe.
Nature. 2023 Oct;622(7983):637-645. doi: 10.1038/s41586-023-06510-w. Epub 2023 Sep 13.
3
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061.
4
New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures.
Nucleic Acids Res. 2013 Jan;41(Database issue):D490-8. doi: 10.1093/nar/gks1211. Epub 2012 Nov 29.
5
The PAS fold. A redefinition of the PAS domain based upon structural prediction.
Eur J Biochem. 2004 Mar;271(6):1198-208. doi: 10.1111/j.1432-1033.2004.04023.x.
6
Using deep learning to annotate the protein universe.
Nat Biotechnol. 2022 Jun;40(6):932-937. doi: 10.1038/s41587-021-01179-w. Epub 2022 Feb 21.
7
Exploration of uncharted regions of the protein universe.
PLoS Biol. 2009 Sep;7(9):e1000205. doi: 10.1371/journal.pbio.1000205. Epub 2009 Sep 29.
8
SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes.
Nucleic Acids Res. 2002 Jan 1;30(1):289-93. doi: 10.1093/nar/30.1.289.
9
Protein folds and families: sequence and structure alignments.
Nucleic Acids Res. 1999 Jan 1;27(1):244-7. doi: 10.1093/nar/27.1.244.
10
Sequence-structure-function relationships in the microbial protein universe.
Nat Commun. 2023 Apr 26;14(1):2351. doi: 10.1038/s41467-023-37896-w.

Cited by

1
Large protein databases reveal structural complementarity and functional locality.
Nat Commun. 2025 Aug 25;16(1):7925. doi: 10.1038/s41467-025-63250-3.
2
Protein functional site annotation using local structure embeddings.
Proc Natl Acad Sci U S A. 2025 Aug 26;122(34):e2513219122. doi: 10.1073/pnas.2513219122. Epub 2025 Aug 20.
3
AlphaCD: a machine learning model capable of highly accurate characterization for 21,335 cytidine deaminases.
Cell Res. 2025 Aug 18. doi: 10.1038/s41422-025-01164-x.
4
Deciphering the proteome of K-12: Integrating transcriptomics and machine learning to annotate hypothetical proteins.
Comput Struct Biotechnol J. 2025 Jul 24;27:3565-3578. doi: 10.1016/j.csbj.2025.07.036. eCollection 2025.
5
The topological properties of the protein universe.
Nat Commun. 2025 Aug 13;16(1):7503. doi: 10.1038/s41467-025-61108-2.
6
Cyanobacteria and Soil Restoration: Bridging Molecular Insights with Practical Solutions.
Microorganisms. 2025 Jun 24;13(7):1468. doi: 10.3390/microorganisms13071468.
7
Hydrogel particle-based protein display enabled by particle-templated emulsification.
RSC Adv. 2025 Jul 23;15(32):26362-26370. doi: 10.1039/d5ra03622d. eCollection 2025 Jul 21.
8
A highly potent human antibody neutralizing all serotypes of BK polyomavirus.
PLoS Pathog. 2025 Jul 18;21(7):e1013122. doi: 10.1371/journal.ppat.1013122. eCollection 2025 Jul.
9
The role of metabolism in shaping enzyme structures over 400 million years.
Nature. 2025 Jul 9. doi: 10.1038/s41586-025-09205-6.
10
Tracing the function expansion for a primordial protein fold in the era of fold-based function prediction: β-trefoil.
PLoS One. 2025 Jul 3;20(7):e0320177. doi: 10.1371/journal.pone.0320177. eCollection 2025.