Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany.
Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China.
Angew Chem Int Ed Engl. 2023 Jun 5;62(23):e202301660. doi: 10.1002/anie.202301660. Epub 2023 May 3.
Amine transaminases (ATAs) are powerful biocatalysts for the stereoselective synthesis of chiral amines. Machine learning offers a promising approach to protein engineering, but activity prediction models for ATAs remain elusive because high-quality training data are difficult to obtain. To address this, we first used structure-based rational design to create variants of the ATA from Ruegeria sp. (3FCR) with improved catalytic activity (up to 2000-fold) as well as reversed stereoselectivity, and collected a high-quality dataset in the process. Subsequently, we designed a modified one-hot code to describe the steric and electronic effects of substrates and of residues within ATAs. Finally, we built a gradient boosting regression tree predictor for catalytic activity and stereoselectivity and applied it to the data-driven design of optimized variants, which showed further improved activity (up to 3-fold compared with the best previously identified variants). We also demonstrated that, after retraining with a small set of additional data, the model can predict the catalytic activity of ATA variants of a different origin.
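To make the two modelling ideas concrete, the sketch below pairs a one-hot encoding of residues at designed positions, extended with steric and electronic columns, with a gradient boosting regressor built from regression trees. It is a minimal, hypothetical illustration, not the authors' method: the descriptor choices (side-chain heavy-atom count as a steric proxy, formal side-chain charge near pH 7 as an electronic proxy), the names `encode_variant` and `GradientBoostedStumps`, the use of depth-1 trees (stumps) in pure standard-library Python, and the toy data are all assumptions. Substrate descriptors, which the abstract says the actual encoding also covers, are omitted.

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Steric proxy (assumption): number of side-chain heavy atoms per residue.
SIDE_CHAIN_ATOMS: Dict[str, int] = {
    "G": 0, "A": 1, "S": 2, "C": 2, "P": 3, "T": 3, "V": 3, "D": 4, "N": 4, "I": 4,
    "L": 4, "M": 4, "E": 5, "Q": 5, "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}
# Electronic proxy (assumption): formal side-chain charge near pH 7, His treated as neutral.
CHARGE: Dict[str, int] = {"D": -1, "E": -1, "K": 1, "R": 1}

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def encode_variant(residues: str) -> List[float]:
    """Hypothetical 'modified one-hot': per position, 20 one-hot bits plus steric and charge columns."""
    features: List[float] = []
    for aa in residues:
        features += [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]
        features += [SIDE_CHAIN_ATOMS[aa] / 10.0, float(CHARGE.get(aa, 0))]
    return features


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Depth-1 regression tree: the single split that minimises squared error on residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


class GradientBoostedStumps:
    """Gradient boosting for squared loss: each new stump is fitted to the current residuals."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1) -> None:
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "GradientBoostedStumps":
        self.base = sum(y) / len(y)  # initial prediction: mean target
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # Residuals are the negative gradient of 0.5 * (y - f)^2 with respect to f.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.base + sum(self.learning_rate * predict_stump(s, row) for s in self.stumps)
                for row in X]


# Synthetic toy data (invented for illustration, not from the paper): residues at two
# hypothetical designed positions; "activity" is high only when the first position is small.
variants = ["GA", "AG", "SA", "FA", "WG", "YA"]
activity = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
X = [encode_variant(v) for v in variants]
model = GradientBoostedStumps(n_estimators=100, learning_rate=0.1).fit(X, activity)
print([round(p, 3) for p in model.predict([encode_variant("GG"), encode_variant("LA")])])
```

With a small learning rate, each stump corrects only part of the remaining error, which is the usual shrinkage trade-off in gradient boosting; a real workflow would more likely use a library implementation with deeper trees and cross-validated hyperparameters.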