Protein multi-level structure feature-integrated deep learning method for mutational effect prediction.

Affiliations

National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, People's Republic of China.

Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, People's Republic of China.

Publication information

Biotechnol J. 2024 Aug;19(8):e2400203. doi: 10.1002/biot.202400203.

Abstract

Through iterative rounds of mutation and selection, proteins can be engineered to enhance their desired biological functions. Nevertheless, identifying optimal mutation sites for directed evolution remains challenging due to the vastness of the protein sequence landscape and the epistatic mutational effects across residues. To address this challenge, we introduce MLSmut, a deep learning-based approach that leverages multi-level structural features of proteins. MLSmut extracts salient information from protein co-evolution, sequence semantics, and geometric features to predict the mutational effect. Extensive benchmark evaluations on 10 single-site and two multi-site deep mutation scanning datasets demonstrate that MLSmut surpasses existing methods in predicting mutational outcomes. To overcome the limited training data availability, we employ a two-stage training strategy: initial coarse-tuning on a large corpus of unlabeled protein data followed by fine-tuning on a curated dataset of 40-100 experimental measurements. This approach enables our model to achieve satisfactory performance on downstream protein prediction tasks. Importantly, our model holds the potential to predict the mutational effects of any protein sequence. Collectively, these findings suggest that our approach can substantially reduce the reliance on laborious wet lab experiments and deepen our understanding of the intricate relationships between mutations and protein function.
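
To give a sense of the "vastness of the protein sequence landscape" mentioned in the abstract, the following minimal sketch counts how many distinct variants exist when exactly k positions of a protein are substituted. It is an illustration only and is not part of MLSmut; the function name `count_variants` and the 300-residue length are assumptions chosen for the example, and the count relies only on the standard 20-letter amino-acid alphabet.

```python
from math import comb

AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def count_variants(length: int, n_sites: int) -> int:
    """Count variants with exactly n_sites substituted positions.

    Choose which positions change (a binomial coefficient), then pick one of
    the 19 non-wild-type residues at each chosen position.
    """
    if n_sites < 0 or n_sites > length:
        return 0
    return comb(length, n_sites) * (AMINO_ACIDS - 1) ** n_sites


# Hypothetical protein length, chosen only for illustration.
length = 300
for k in (1, 2, 3):
    print(f"{k}-site variants: {count_variants(length, k)}")
```

Under these assumptions, a 300-residue protein has 5,700 single-site variants but more than 16 million double-site variants, which is why exhaustive experimental screening of multi-site combinations quickly becomes impractical.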
