Seki Kosuke, Guo Amy B, Akpinaroglu Deniz, Kortemme Tanja
bioRxiv. 2025 Aug 3:2025.08.03.668353. doi: 10.1101/2025.08.03.668353.
Mapping protein sequence-function landscapes has been limited either to small steps (only a few mutations) or to sequences similar to those already explored by evolution to maintain activity. Here, we overcome both limitations by applying deep-learning-guided redesign to a natural protein tyrosine kinase to generate novel, functional sequences with highly combinatorial mutations. Using cell-free assays, we measure the activities and concentrations of 537 redesigned sequences, which differ from the wild type by an average of 37 mutations, with 85% of variants retaining activity. These sequences sample 436 unique mutations at 76 different positions throughout the kinase domain. A simple regression model identifies key sequence determinants of function and predicts the function of unseen sequences. Our approach demonstrates how integrating deep-learning-guided redesign, functional measurement at scale, and interpretable computational modelling enables functional exploration of highly combinatorial and sparse sequence-function landscapes at mutational scales not possible before.
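The abstract does not specify the form of the regression model, so the following is only an illustrative sketch of one common approach: encode each variant as a binary vector of its substitutions relative to the wild type, then fit a ridge regression whose per-mutation weights can be read as additive effects on activity. The function names (`mutation_features`, `fit_ridge`), the choice of ridge regularization, and the four-residue toy sequences with made-up activities are all assumptions for illustration, not the authors' method or data.

```python
import numpy as np


def mutation_features(wild_type, variants):
    """Encode variants as binary indicators of (position, residue) substitutions.

    Only substitutions actually observed in `variants` get a column, which keeps
    the design matrix small when the landscape is sparsely sampled.
    """
    observed = sorted(
        {
            (i, aa)
            for seq in variants
            for i, (wt, aa) in enumerate(zip(wild_type, seq))
            if aa != wt
        }
    )
    column = {mutation: j for j, mutation in enumerate(observed)}
    X = np.zeros((len(variants), len(observed)))
    for row, seq in enumerate(variants):
        for i, (wt, aa) in enumerate(zip(wild_type, seq)):
            if aa != wt:
                X[row, column[(i, aa)]] = 1.0
    return X, observed


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the mutation weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Toy data (hypothetical): substitution A1G lowers activity, E4K is neutral.
wild_type = "ACDE"
variants = ["GCDE", "ACDK", "GCDK", "ACDE"]
activity = np.array([0.5, 1.0, 0.5, 1.0])

X, observed = mutation_features(wild_type, variants)
w, b = fit_ridge(X, activity, alpha=1.0)

# Report each observed substitution with its fitted additive effect.
for (pos, aa), weight in zip(observed, w):
    print(f"{wild_type[pos]}{pos + 1}{aa}: {weight:+.3f}")
```

Under these assumptions, a negative weight marks a substitution associated with lower activity, and predictions for unseen sequences are obtained as `X_new @ w + b` after encoding them against the same `observed` columns. Mutations absent from the training set have no column and therefore contribute nothing to the prediction, which is one limitation of this simple additive form.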