ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention.

Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China.

Zhejiang Lab, Kechuang Avenue, Hangzhou, China.

Publication information

Nat Commun. 2023 Nov 16;14(1):7434. doi: 10.1038/s41467-023-43166-6.

DOI:10.1038/s41467-023-43166-6

PMID:37973874

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10654420/

Abstract

Inverse Protein Folding (IPF) is an important task of protein design, which aims to design sequences compatible with a given backbone structure. Despite the prosperous development of algorithms for this task, existing methods tend to rely on noisy predicted residues located in the local neighborhood when generating sequences. To address this limitation, we propose an entropy-based residue selection method to remove noise in the input residue context. Additionally, we introduce ProRefiner, a memory-efficient global graph attention model to fully utilize the denoised context. Our proposed method achieves state-of-the-art performance on multiple sequence design benchmarks in different design settings. Furthermore, we demonstrate the applicability of ProRefiner in redesigning Transposon-associated transposase B, where six out of the 20 variants we propose exhibit improved gene editing activity.
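The entropy-based residue selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a base model outputs, for each position, a probability distribution over amino acids, and that positions with low Shannon entropy (confident predictions) are kept as context while the rest are masked. The function names, the threshold values, the mask token `X`, and the toy 4-letter distributions are all hypothetical and are not taken from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of one position's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_confident_residues(prob_matrix, threshold):
    """Indices of positions whose prediction entropy is below `threshold`.

    `prob_matrix` is a list of per-position probability lists.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [i for i, probs in enumerate(prob_matrix)
            if shannon_entropy(probs) < threshold]

def mask_noisy_context(sequence, prob_matrix, threshold, mask_token="X"):
    """Keep confident residues as context and mask the uncertain ones."""
    keep = set(select_confident_residues(prob_matrix, threshold))
    return "".join(aa if i in keep else mask_token
                   for i, aa in enumerate(sequence))

# Toy example with a 4-letter alphabet (real models use 20 amino acids).
toy_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform prediction, maximal entropy
    [0.70, 0.10, 0.10, 0.10],  # moderately confident prediction
]
toy_sequence = "AGL"

strict = mask_noisy_context(toy_sequence, toy_probs, threshold=0.5)
lenient = mask_noisy_context(toy_sequence, toy_probs, threshold=1.0)
print(strict, lenient)
```

A stricter threshold keeps fewer residues in the context, trading context size for reliability. A refinement model such as the one the abstract describes would then predict the masked positions from the retained context.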

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa1a/10654420/8eccaf3437c9/41467_2023_43166_Fig1_HTML.jpg
