Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA.
Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA.
Nat Comput Sci. 2024 Mar;4(3):224-236. doi: 10.1038/s43588-024-00611-w. Epub 2024 Mar 21.
Here we used machine learning to engineer genetically encoded fluorescent indicators, protein-based sensors critical for real-time monitoring of biological activity. Our approach predicts the outcomes of sensor mutagenesis by analyzing established libraries that link sensor sequences to functions. Using the GCaMP calcium indicator as a scaffold, we developed an ensemble of three regression models trained on experimentally derived GCaMP mutation libraries. The trained ensemble performed an in silico functional screen on 1,423 novel, uncharacterized GCaMP variants. This screen identified the ensemble-derived GCaMP (eGCaMP) variants, eGCaMP and eGCaMP, which achieve both faster kinetics and larger ∆F/F responses upon stimulation than previously published fast variants. It also identified a combinatorial mutation with extraordinary dynamic range, eGCaMP, which outperforms the tested sixth-, seventh- and eighth-generation GCaMPs. These findings demonstrate the value of machine learning as a tool for efficiently engineering proteins with desired biophysical characteristics.
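The screening workflow described above (train several regressors on a mutation library, average their predictions, rank uncharacterized variants) can be illustrated with a minimal sketch. The abstract does not say which three regression models were used or how sequences were encoded, so everything below is an assumption made for illustration: the three toy regressors, the mutation labels (`A1V`, `K2R`, `T3S`), and the response values are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical toy library: (set of point mutations, measured response).
# Mutation names and values are invented for illustration only.
LIBRARY = [
    (frozenset(), 1.0),               # unmutated scaffold
    (frozenset({"A1V"}), 1.4),
    (frozenset({"K2R"}), 0.8),
    (frozenset({"T3S"}), 1.2),
    (frozenset({"A1V", "K2R"}), 1.1),
]


def fit_additive(library):
    """Regressor 1: scaffold response plus summed single-mutation effects."""
    baseline = mean(y for muts, y in library if not muts)
    effects = {next(iter(muts)): y - baseline
               for muts, y in library if len(muts) == 1}
    return lambda muts: baseline + sum(effects.get(m, 0.0) for m in muts)


def fit_nearest(library):
    """Regressor 2: response of the closest library variant.

    Distance is the number of mutations not shared by the two variants;
    ties go to the variant listed first in the library.
    """
    def predict(muts):
        return min(library, key=lambda item: len(item[0] ^ muts))[1]
    return predict


def fit_shared(library):
    """Regressor 3: mean response of library variants sharing any mutation."""
    overall = mean(y for _, y in library)

    def predict(muts):
        shared = [y for lib_muts, y in library if lib_muts & muts]
        return mean(shared) if shared else overall
    return predict


def fit_ensemble(library):
    """Average the three regressors' predictions (unweighted, an assumption)."""
    models = [fit(library) for fit in (fit_additive, fit_nearest, fit_shared)]
    return lambda muts: mean(model(muts) for model in models)


def screen(predict, candidates):
    """Rank uncharacterized candidates by predicted response, best first."""
    return sorted(candidates, key=predict, reverse=True)


# Hypothetical uncharacterized combinations, absent from the library.
CANDIDATES = [frozenset({"K2R", "T3S"}), frozenset({"A1V", "T3S"})]

ensemble = fit_ensemble(LIBRARY)
ranked = screen(ensemble, CANDIDATES)
for variant in ranked:
    print(sorted(variant), round(ensemble(variant), 3))
```

Averaging is only one way to combine models. The point of an ensemble in a screen like this is that candidates ranked highly by several dissimilar predictors are less likely to reflect the quirks of any single model.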