Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China.
ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China.
ACS Synth Biol. 2023 Aug 18;12(8):2403-2417. doi: 10.1021/acssynbio.3c00225. Epub 2023 Jul 24.
Knowledge of the substrate scope of a given enzyme helps to elucidate biochemical pathways and to expand the applications of that enzyme. However, no general method is available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which makes it possible to probe, image, rationally engineer, and evolve protein structure and function. However, incorporating a new NCAA typically requires selection from large libraries of PylRS variants with randomized active-site mutations, with multiple rounds of selection for each new substrate. A single aminoacyl-tRNA synthetase with broad substrate promiscuity would therefore be ideal for facilitating widespread application of genetic NCAA incorporation. Herein, machine learning models were developed to predict which novel NCAAs can be accepted by three PylRS mutants and thereby incorporated into proteins. The models were built from a training set of 285 unique enzyme-substrate pairs, covering three PylRS mutants (IFRS, BtaRS, and MFRS) against 95 NCAAs. The best BaggingTree (BT) model was then used to virtually screen an NCAA library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Then, 27 NCAAs, comprising 24 predicted positive and 3 predicted negative substrates, were tested experimentally. Of the 24 positive substrates, 20 showed weak or strong activity and were accepted by at least one PylRS mutant; 11 of these NCAAs had not previously been reported to be incorporated into proteins. The three negative substrates showed no activity. The experimental results suggest that the BT model achieves a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86.
This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict substrate specificity of other PylRS variants.
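The dataset size and the two accuracy figures quoted above can be illustrated with a minimal sketch. It assumes that the 285 training pairs are simply every combination of the three mutants with the 95 NCAAs, that the three classes are no, weak, and strong activity, and that the binary score is obtained by merging weak and strong into a single active class; the abstract does not state the last two points explicitly. The NCAA identifiers, the helper names `accuracy` and `to_binary`, and the toy labels are all hypothetical and are not data from the study.

```python
from itertools import product

# Mutant names come from the abstract; the NCAA identifiers are placeholders.
MUTANTS = ["IFRS", "BtaRS", "MFRS"]
ncaas = [f"NCAA_{i:02d}" for i in range(1, 96)]  # 95 hypothetical IDs

# One training example per (mutant, NCAA) pair: 3 * 95 = 285.
pairs = list(product(MUTANTS, ncaas))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(label):
    """Assumed mapping: 'weak' and 'strong' both count as 'active'."""
    return "inactive" if label == "none" else "active"


# Toy labels, invented for illustration only (not results from the study).
observed = ["strong", "weak", "none", "weak", "strong", "none"]
predicted = ["strong", "strong", "none", "weak", "weak", "weak"]

three_class = accuracy(observed, predicted)
binary = accuracy(
    [to_binary(label) for label in observed],
    [to_binary(label) for label in predicted],
)

print(len(pairs), three_class, round(binary, 2))
```

Under this assumed mapping the binary accuracy can never be lower than the three-class accuracy, because confusing weak with strong counts as an error only in the three-class case. That is consistent with the reported gap between 0.69 and 0.86, although the exact evaluation protocol is not given in the abstract.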