Muir Duncan F, Asper Garrison P R, Notin Pascal, Posner Jacob A, Marks Debora S, Keiser Michael J, Pinney Margaux M
Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA.
Program in Biophysics, University of California, San Francisco, San Francisco, CA, USA.
bioRxiv. 2024 Oct 25:2024.10.23.619915. doi: 10.1101/2024.10.23.619915.
Quantitatively mapping enzyme sequence-catalysis landscapes remains a critical challenge in understanding enzyme function, evolution, and design. Here, we expand an emerging microfluidic platform to measure catalytic constants (kcat and KM) for hundreds of diverse naturally occurring sequences and mutants of the model enzyme Adenylate Kinase (ADK). This enables us to dissect the sequence-catalysis landscape's topology, navigability, and mechanistic underpinnings, revealing distinct catalytic peaks organized by structural motifs. These results challenge long-standing hypotheses in enzyme adaptation, demonstrating that thermophilic enzymes are not slower than their mesophilic counterparts. Combining the rich representations of protein sequences provided by deep-learning models with our custom high-throughput kinetic data yields semi-supervised models that significantly outperform existing models at predicting catalytic parameters of naturally occurring ADK sequences. Our work demonstrates a promising strategy for dissecting sequence-catalysis landscapes across enzymatic evolution and building family-specific models capable of accurately predicting catalytic constants, opening new avenues for enzyme engineering and functional prediction.