SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae146.

DOI:10.1093/bib/bbae146
PMID:38600663
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11006797/
Abstract

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most deep learning-based protein sequence design methods focus on optimizing network architecture while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether a structural sequence profile representation can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. The profile, combined with structural pre-trained knowledge and geometric features, is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design, with accuracy gains of 21.89%, 15.54% and 11.4%, respectively, in sequence recovery rate on the CATH 4.2 benchmark. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis with the PDBench tool suggests that SPDesign performs well on subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, a structural modeling verification experiment indicates that sequences designed by SPDesign fold into the native structures more accurately.
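
The abstract does not spell out how the ultrafast shape recognition (USR) vectors are built, so the following is only a minimal sketch of the classic USR descriptor, not SPDesign's actual code. It assumes a backbone is given as a plain list of (x, y, z) coordinates (for example, C-alpha atoms); the function names `usr_descriptor` and `usr_similarity` are hypothetical, and the choice of moments (mean, standard deviation, signed cube root of the third central moment) is one common variant. The point it illustrates is why such vectors speed up a database search: each structure is reduced to 12 numbers, and two structures are compared without any superposition.

```python
import math


def _dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # The signed cube root keeps the third moment in units of length.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """Return a 12-number USR shape descriptor for a list of (x, y, z) points.

    Assumption: this follows the classic USR recipe with four reference points;
    the features used by SPDesign itself may differ.
    """
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [_dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # point closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # point farthest from the centroid
    d_fct = [_dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # point farthest from fct
    d_cst = [_dist(c, cst) for c in coords]
    d_ftf = [_dist(c, ftf) for c in coords]

    descriptor = []
    for distances in (d_ctd, d_cst, d_fct, d_ftf):
        descriptor.extend(_moments(distances))
    return descriptor


def usr_similarity(a, b):
    """USR similarity in (0, 1]: inverse of 1 plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Toy "backbone": a widening spiral of 10 points (illustrative data only).
    shape = [(0.3 * i * math.cos(i), 0.3 * i * math.sin(i), 1.5 * i) for i in range(10)]
    moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in shape]
    print(usr_similarity(usr_descriptor(shape), usr_descriptor(moved)))
```

Because the descriptor depends only on internal distances, a translated or rotated copy of a structure scores a similarity of essentially 1. In a search setting, descriptors for every database entry could be precomputed once, leaving only cheap 12-number comparisons per query before any more expensive structure alignment is run on the top hits.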

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e30/11006797/a0537198db80/bbae146f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e30/11006797/d6fe49b188a9/bbae146f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e30/11006797/a8bd0d42a639/bbae146f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e30/11006797/73b7132cfeae/bbae146f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e30/11006797/c47bdf3eddc6/bbae146f5.jpg
