Folding the unfoldable: using AlphaFold to explore spurious proteins.

Author information

Monzon Vivian, Haft Daniel H, Bateman Alex

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.

Publication information

Bioinform Adv. 2022 Jan 9;2(1):vbab043. doi: 10.1093/bioadv/vbab043. eCollection 2022.

DOI:10.1093/bioadv/vbab043

PMID:36699409

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9710616/

Abstract

MOTIVATION

The release of AlphaFold 2.0 has revolutionized our ability to determine protein structures from sequences. This tool also inadvertently opens up many unanticipated opportunities. In this article, we investigate the AntiFam resource, which contains 250 protein sequence families that we believe to be spurious protein translations. We would not expect proteins belonging to these families to fold into well-ordered globular structures. To test this hypothesis, we have attempted to computationally determine the structure of a representative sequence from all AntiFam 6.0 families.

RESULTS

Although the large majority of families showed no evidence of globular structure, we have identified one example for which a globular structure is predicted. Proteins in this AntiFam entry indeed seem likely to be proteins, based on additional considerations, and thus AlphaFold provides a useful quality control for the AntiFam database. Conversely, known spurious proteins offer a useful set of quality controls for AlphaFold. We have identified a trend that the mean structure prediction confidence score pLDDT is higher for shorter sequences. Of the 131 AntiFam representative sequences <100 amino acids in length, AlphaFold predicts a mean pLDDT of 80 or greater for six of them. Thus, particular care should be taken when applying AlphaFold to short protein sequences.
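The check described above (short sequence, high mean pLDDT) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It relies on the fact that AlphaFold PDB-format models store per-residue pLDDT in the B-factor column. The function names `mean_plddt` and `needs_caution` are hypothetical, and the defaults of 100 residues and a mean pLDDT of 80 simply mirror the figures reported in the abstract; they are assumptions here, not recommended cutoffs.

```python
def mean_plddt(pdb_text):
    """Return (mean pLDDT, residue count) from the CA atoms of a PDB-format model.

    Assumes an AlphaFold-style model, where the B-factor field
    (columns 61-66 of each ATOM record) holds the per-residue pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        # One CA atom per residue, so each residue is counted once.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found in model")
    return sum(scores) / len(scores), len(scores)


def needs_caution(pdb_text, max_length=100, min_plddt=80.0):
    """Flag short models with a high mean pLDDT.

    The defaults are illustrative assumptions taken from the numbers
    quoted in the abstract, not thresholds proposed by the authors.
    """
    mean, length = mean_plddt(pdb_text)
    return length < max_length and mean >= min_plddt
```

A flagged model is not necessarily wrong; the flag only marks predictions where a high mean pLDDT alone may be weak evidence that the sequence encodes a genuine folded protein.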

AVAILABILITY AND IMPLEMENTATION

The AlphaFold predictions for representative sequences can be found at the following URL: https://drive.google.com/drive/folders/1u9OocRIAabGQn56GljoG1JTDAxjkY1ro.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.





