Machine Learning Ensemble Directed Engineering of Genetically Encoded Fluorescent Calcium Indicators.

Author information

Wait Sarah J, Rappleye Michael, Lee Justin Daho, Goy Marc Exposit, Smith Netta, Berndt Andre

Affiliations

Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA.

Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA.

Publication information

Res Sq. 2023 Aug 7:rs.3.rs-3146778. doi: 10.21203/rs.3.rs-3146778/v1.

DOI:10.21203/rs.3.rs-3146778/v1

PMID:37609342

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10441480/

Abstract

In this study, we focused on the transformative potential of machine learning in the engineering of genetically encoded fluorescent indicators (GEFIs), protein-based sensing tools that are critical for real-time monitoring of biological activity. GEFIs are complex proteins with multiple dynamic states, rendering optimization by trial-and-error mutagenesis a challenging problem. We applied an alternative approach using machine learning to predict the outcomes of sensor mutagenesis by analyzing established libraries that link sensor sequences to functions. Using the GCaMP calcium indicator as a scaffold, we developed an ensemble of three regression models trained on experimentally derived GCaMP mutation libraries. We used the trained ensemble to perform an in silico functional screen on 1423 novel, uncharacterized GCaMP variants. As a result, we identified the novel ensemble-derived GCaMP (eGCaMP) variants, eGCaMP and eGCaMP+, that achieve both faster kinetics and larger fluorescent responses upon stimulation than previously published fast variants. Furthermore, we identified a combinatorial mutation with extraordinary dynamic range, eGCaMP2+, that outperforms the tested 6th, 7th, and 8th generation GCaMPs. These findings demonstrate the value of machine learning as a tool to facilitate the efficient pre-screening of mutants for functional characteristics. By leveraging the learning capabilities of our ensemble, we were able to accelerate the identification of promising mutations and reduce the experimental burden associated with trial-and-error mutagenesis. Overall, these findings have significant implications for optimizing GEFIs and other protein-based tools, demonstrating the utility of machine learning as a powerful asset in protein engineering.
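Illustrative sketch (not from the paper)

The workflow described in the abstract (train several regression models on a sequence-to-function library, average their predictions, then rank uncharacterized variants in silico) can be sketched in a few lines. The abstract does not say which three models, which sequence encoding, or which functional readouts the authors used, so everything below is an assumption made for illustration: one-hot encoding, a ridge model, an RBF kernel ridge model, and a k-nearest-neighbor model as stand-ins, plus a made-up 6-residue parent sequence with synthetic scores. It is a minimal sketch of ensemble-guided pre-screening in general, not a reproduction of the authors' pipeline.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a protein sequence as a flat one-hot vector (length * 20)."""
    vec = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        vec[pos, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

class RidgeModel:
    """Linear ridge regression solved in closed form (stand-in model 1)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha
    def fit(self, X, y):
        self.offset = y.mean()
        gram = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.weights = np.linalg.solve(gram, X.T @ (y - self.offset))
        return self
    def predict(self, X):
        return X @ self.weights + self.offset

class KernelRidgeModel:
    """Ridge regression with an RBF kernel (stand-in model 2)."""
    def __init__(self, alpha=1.0, gamma=0.5):
        self.alpha, self.gamma = alpha, gamma
    def fit(self, X, y):
        self.X, self.offset = X, y.mean()
        K = np.exp(-self.gamma * sq_dists(X, X))
        self.coef = np.linalg.solve(K + self.alpha * np.eye(len(X)), y - self.offset)
        return self
    def predict(self, X):
        return np.exp(-self.gamma * sq_dists(X, self.X)) @ self.coef + self.offset

class KNNModel:
    """Mean score of the k nearest training variants (stand-in model 3)."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        nearest = np.argsort(sq_dists(X, self.X), axis=1)[:, :self.k]
        return self.y[nearest].mean(axis=1)

def ensemble_predict(models, X):
    """Average the predictions of all fitted models."""
    return np.mean([m.predict(X) for m in models], axis=0)

def single_mutants(parent):
    """Enumerate every single-residue substitution of the parent sequence."""
    return [parent[:i] + aa + parent[i + 1:]
            for i in range(len(parent)) for aa in AMINO_ACIDS if aa != parent[i]]

# Toy data: a made-up parent sequence and synthetic scores, NOT real GCaMP data.
rng = np.random.default_rng(0)
PARENT = "MKTAYI"
variants = single_mutants(PARENT)
order = rng.permutation(len(variants))
train_seqs = [variants[i] for i in order[:80]]       # "characterized" library
candidate_seqs = [variants[i] for i in order[80:]]   # "uncharacterized" variants

true_effects = rng.normal(size=len(PARENT) * len(AMINO_ACIDS))  # hidden toy landscape
X_train = np.array([one_hot(s) for s in train_seqs])
y_train = X_train @ true_effects + rng.normal(scale=0.1, size=len(train_seqs))

models = [RidgeModel().fit(X_train, y_train),
          KernelRidgeModel().fit(X_train, y_train),
          KNNModel().fit(X_train, y_train)]

# In silico screen: score every candidate with the ensemble and rank them.
X_cand = np.array([one_hot(s) for s in candidate_seqs])
scores = ensemble_predict(models, X_cand)
ranked = sorted(zip(candidate_seqs, scores), key=lambda pair: pair[1], reverse=True)
for seq, score in ranked[:5]:
    print(f"{seq}  predicted score = {score:.3f}")
```

Averaging models with different inductive biases is one common way to damp the errors of any single regressor; in a real screen, the top-ranked candidates would still need experimental characterization, as the abstract describes for the eGCaMP variants.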
