Learning protein fitness landscapes with deep mutational scanning data from multiple sources.

Affiliations

Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.

Publication information

Cell Syst. 2023 Aug 16;14(8):706-721.e5. doi: 10.1016/j.cels.2023.07.003.

Abstract

One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
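The abstract distinguishes random from positional extrapolation for single-variant effects. The sketch below illustrates one common way to build such splits; it is not the authors' protocol, which the abstract does not specify. It assumes that a positional split means every variant at a held-out residue position is excluded from training. The function names, the `(position, amino_acid, score)` tuple layout, and the toy data are all hypothetical.

```python
import random


def random_split(variants, test_fraction=0.2, seed=0):
    """Hold out a random subset of variants.

    Positions seen in the test set may also appear in the training set.
    """
    rng = random.Random(seed)
    shuffled = list(variants)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def positional_split(variants, test_fraction=0.2, seed=0):
    """Hold out whole positions (assumed definition, see lead-in).

    No mutated position in the test set appears in the training set.
    """
    rng = random.Random(seed)
    positions = sorted({pos for pos, _aa, _score in variants})
    rng.shuffle(positions)
    n_test = max(1, int(len(positions) * test_fraction))
    held_out = set(positions[:n_test])
    train = [v for v in variants if v[0] not in held_out]
    test = [v for v in variants if v[0] in held_out]
    return train, test


# Toy data: (position, substituted amino acid, fitness score).
# Scores are placeholders, not measurements.
toy_variants = [(pos, aa, 0.0) for pos in range(1, 11) for aa in "ACDE"]

train_r, test_r = random_split(toy_variants)
train_p, test_p = positional_split(toy_variants)
```

Under this assumed definition, the positional split is the harder test: the model never sees any substitution at the held-out positions, so it cannot rely on position-specific averages learned during training.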
