Pate Stefan C, Wang Eric H, Broadbelt Linda J, Tyo Keith E J
Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA.
Center for Synthetic Biology, Northwestern University, Evanston, IL, USA.
bioRxiv. 2025 Jun 27:2025.06.22.660952. doi: 10.1101/2025.06.22.660952.
Uncharacterized functions of enzymes represent an untapped opportunity to develop therapeutics, unlock the sustainable synthesis of materials, and understand the evolution of life-sustaining metabolic networks. Enzymes generated by protein language models and reactions (i.e., non-native, promiscuous reactions) generated by computer-aided synthesis tools make up a large part of this opportunity. Given the technical complexity of high-throughput enzymatic activity screens, predictive models are needed that can pre-screen enzyme-reaction pairs. We present the Reaction-Center Graph Neural Network (RC-GNN), a model capable of predicting whether an enzyme, represented by its amino acid sequence, can significantly catalyze a given reaction, represented by its full set of reactants and products. We explicitly evaluated RC-GNN's generalization to queries that differ from its training data. In the most difficult conditions tested, where difficulty is measured by the level of dissimilarity between training and test data points, the model achieves 78.0% accuracy when reaction similarity is controlled and 94.8% accuracy when enzyme similarity is controlled. The ability to make successful predictions on enzymes and reactions distinct from those used during training makes RC-GNN especially useful for both metabolic engineers and evolutionary biologists who need to reason about uncharacterized enzymatic reactions.
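The abstract measures difficulty by how dissimilar the test data points are from the training data. The abstract does not describe the authors' splitting procedure, so the following is only a minimal sketch of one common way to build such a similarity-controlled split: group items whose similarity meets a threshold, then assign whole groups to either the training or the test set. The Jaccard similarity over toy feature sets, the reaction names, and all function names are hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Toy similarity between two feature sets (hypothetical stand-in)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_clusters(features: dict, threshold: float) -> list:
    """Group items linked by similarity >= threshold (single linkage, union-find)."""
    parent = {name: name for name in features}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in features:
        clusters.setdefault(find(name), []).append(name)
    # Smallest clusters first, with a deterministic order among ties.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (len(c), c))


def split_by_cluster(clusters: list, test_fraction: float):
    """Send whole clusters to the test set while it stays within the target share."""
    total = sum(len(c) for c in clusters)
    train, test = [], []
    for cluster in clusters:
        if len(test) + len(cluster) <= test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


def max_cross_similarity(features: dict, train: list, test: list) -> float:
    """Highest similarity between any training item and any test item."""
    return max(jaccard(features[a], features[b]) for a in train for b in test)


# Hypothetical toy data: each reaction is described by a small set of tags.
features = {
    "rxn_a": frozenset({"C-O", "ester", "hydrolysis"}),
    "rxn_b": frozenset({"C-O", "ester", "transfer"}),
    "rxn_c": frozenset({"C-N", "amide", "hydrolysis"}),
    "rxn_d": frozenset({"P-O", "kinase"}),
}
threshold = 0.4
clusters = similarity_clusters(features, threshold)
train, test = split_by_cluster(clusters, test_fraction=0.25)
worst_case = max_cross_similarity(features, train, test)
```

Because items at or above the threshold always end up in the same group, no test item can reach the threshold against any training item. Lowering the threshold therefore yields a harder split in the sense used above. In practice the toy similarity would presumably be replaced by a sequence-based measure for enzymes or a structure-based measure for reactions.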