Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins.

Affiliations

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan.

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab234.

DOI: 10.1093/bib/bbab234
PMID: 34180966
Abstract

Accurate variant effect prediction has broad impacts on protein engineering. Recent machine learning approaches toward this end are based on representation learning, by which feature vectors are learned and generated from unlabeled sequences. However, it is unclear how to effectively learn evolutionary properties of an engineering target protein from homologous sequences, taking into account the protein's sequence-level structure called domain architecture (DA). Additionally, no optimal protocols are established for incorporating such properties into Transformer, the neural network well-known to perform the best in natural language processing research. This article proposes DA-aware evolutionary fine-tuning, or 'evotuning', protocols for Transformer-based variant effect prediction, considering various combinations of homology search, fine-tuning and sequence vectorization strategies. We exhaustively evaluated our protocols on diverse proteins with different functions and DAs. The results indicated that our protocols achieved significantly better performances than previous DA-unaware ones. The visualizations of attention maps suggested that the structural information was incorporated by evotuning without direct supervision, possibly leading to better prediction accuracy.
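The abstract mentions "sequence vectorization strategies" without describing them. As a purely illustrative sketch, and not the authors' protocol, one common way to turn per-residue Transformer outputs into a fixed-length feature vector is mean pooling, either over the whole sequence or over a single region such as a domain. The function name `mean_pool`, the `region` parameter, and the toy numbers below are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: List[Vector],
              region: Optional[Tuple[int, int]] = None) -> Vector:
    """Average per-residue vectors into one fixed-length sequence vector.

    `region` is a hypothetical half-open (start, end) residue range, for
    example the span of one domain; None pools over the whole sequence.
    """
    start, end = region if region is not None else (0, len(residue_embeddings))
    selected = residue_embeddings[start:end]
    if not selected:
        raise ValueError("region selects no residues")
    dim = len(selected[0])
    # Average each embedding dimension across the selected residues.
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


# Toy 4-residue "protein" with 2-dimensional embeddings (made-up numbers).
toy_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]

whole_sequence_vector = mean_pool(toy_embeddings)         # pools all residues
domain_vector = mean_pool(toy_embeddings, region=(2, 4))  # pools last two residues
```

In a real pipeline the per-residue vectors would come from a fine-tuned model, and the pooled vector would feed a downstream regressor trained on measured variant effects. Whether whole-sequence or region-restricted pooling works better is the kind of question the paper's protocol comparison addresses.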

