Accurate and efficient protein sequence design through learning concise local environment of residues.

Affiliations

Key Lab of Intelligent Information Processing, SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

University of Chinese Academy of Sciences, Beijing 100110, China.

Publication information

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad122.

DOI:10.1093/bioinformatics/btad122
PMID:36916746
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10027430/
Abstract

MOTIVATION

Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired.

RESULTS

Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of the residue's local environment and trains a transformer to learn the correlation between local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein.
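
The per-position assignment described above can be illustrated with a minimal sketch. This is not the ProDESIGN-LE implementation: the function names, the 10 Å C-alpha cutoff, the burial threshold, the number of update rounds, and the rule-based `toy_residue_probs` (a stand-in for the trained transformer) are all assumptions made for illustration. It only shows the general shape of the idea: collect each residue's local environment from the backbone, then repeatedly reassign each position to the residue type that the predictor scores highest for that environment.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types
HYDROPHOBIC = set("AFILMVWY")


def local_environments(ca_coords, cutoff=10.0):
    """For each residue, list the indices of other residues whose
    C-alpha atom lies within `cutoff` angstroms (assumed definition)."""
    return [
        [j for j, other in enumerate(ca_coords)
         if j != i and math.dist(center, other) <= cutoff]
        for i, center in enumerate(ca_coords)
    ]


def toy_residue_probs(neighbor_types, burial_threshold=8):
    """Hypothetical stand-in for the trained transformer.

    A real model would condition on neighbor types and geometry; this toy
    rule only uses the neighbor count, favoring hydrophobic residues at
    crowded (buried) positions and the remaining types elsewhere.
    """
    buried = len(neighbor_types) >= burial_threshold
    weights = [3.0 if (aa in HYDROPHOBIC) == buried else 1.0
               for aa in AMINO_ACIDS]
    total = sum(weights)
    return [w / total for w in weights]


def design_sequence(ca_coords, n_rounds=3, seed=0):
    """Start from a random sequence, then repeatedly reassign each position
    to the most probable residue type given its current environment."""
    rng = random.Random(seed)
    envs = local_environments(ca_coords)
    seq = [rng.choice(AMINO_ACIDS) for _ in envs]
    for _ in range(n_rounds):
        for i, neighbors in enumerate(envs):
            probs = toy_residue_probs([seq[j] for j in neighbors])
            best = max(range(len(AMINO_ACIDS)), key=probs.__getitem__)
            seq[i] = AMINO_ACIDS[best]
    return "".join(seq)


if __name__ == "__main__":
    # Five residues on a straight line, 3.8 angstroms apart (toy backbone).
    backbone = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    print(design_sequence(backbone))
```

In practice the toy predictor would be replaced by a learned model, and the designed sequence would be checked by predicting its structure and comparing it with the target backbone, as the abstract reports using TM-score.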

AVAILABILITY AND IMPLEMENTATION

The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bc7/10027430/14f22bf218ad/btad122f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bc7/10027430/06eee0a3b38b/btad122f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bc7/10027430/0dbd522b5331/btad122f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bc7/10027430/fe58b2db7ced/btad122f4.jpg
