Bedbrook Claire N, Yang Kevin K, Rice Austin J, Gradinaru Viviana, Arnold Frances H
Division of Biology and Biological Engineering; California Institute of Technology; Pasadena, California; United States of America.
Division of Chemistry and Chemical Engineering; California Institute of Technology; Pasadena, California; United States of America.
PLoS Comput Biol. 2017 Oct 23;13(10):e1005786. doi: 10.1371/journal.pcbi.1005786. eCollection 2017 Oct.
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. An MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane are prerequisites for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
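As a rough illustration of the regression half of this workflow, the following sketch fits a GP to synthetic "localization" scores for 218 random chimeras and ranks unseen chimeras by predicted score. It is not the authors' implementation. The block layout (10 blocks drawn from 3 parents), the one-hot encoding, the squared-exponential kernel, the hyperparameters, and the additive synthetic ground truth are all assumptions made for illustration, and the classification model mentioned in the abstract is not covered.

```python
import numpy as np

N_BLOCKS = 10   # assumption: blocks per chimera (the abstract does not state this)
N_PARENTS = 3   # three parent ChRs, as stated in the abstract

def one_hot(chimeras):
    """Encode (n, N_BLOCKS) parent indices as (n, N_BLOCKS * N_PARENTS) one-hot rows."""
    chimeras = np.asarray(chimeras)
    return np.eye(N_PARENTS)[chimeras].reshape(len(chimeras), -1)

def kernel(a, b, length_scale=3.0):
    """Squared-exponential kernel; on one-hot rows the squared distance
    equals twice the number of blocks at which two chimeras differ."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and variance of a GP regression fit to standardized targets."""
    mu, sd = y_train.mean(), y_train.std()
    z = (y_train - mu) / sd
    k = kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_s = kernel(x_test, x_train)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, z))
    v = np.linalg.solve(chol, k_s.T)
    mean = k_s @ alpha * sd + mu
    var = (1.0 - (v ** 2).sum(0)) * sd ** 2   # prior variance is 1 for this kernel
    return mean, var

# Synthetic stand-in for measured data: each (block, parent) pair contributes
# a hidden additive effect. Real measurements would replace y_train.
rng = np.random.default_rng(0)
effects = rng.normal(size=N_BLOCKS * N_PARENTS)
train = rng.integers(0, N_PARENTS, size=(218, N_BLOCKS))   # 218 training chimeras
test = rng.integers(0, N_PARENTS, size=(50, N_BLOCKS))     # unseen chimeras
y_train = one_hot(train) @ effects + rng.normal(scale=0.1, size=len(train))
y_true_test = one_hot(test) @ effects

mean, var = gp_predict(one_hot(train), y_train, one_hot(test))
top5 = np.argsort(-mean)[:5]   # indices of the chimeras predicted to score highest
print(test[top5])
```

The predicted variance gives a per-chimera uncertainty, which is one reason a GP is a natural fit when only a small fraction of a large library can be measured.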