Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States.
Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, United States.
Metab Eng. 2017 Nov;44:171-181. doi: 10.1016/j.ymben.2017.09.016. Epub 2017 Oct 10.
Enzymatic substrate promiscuity is more widespread than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has created a need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. To demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were found to achieve maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Compounds selected for testing by active learning resolved apparent conflicts in the existing training data while adding diversity to the dataset. Applying these algorithms to a wide array of metabolic enzymes would yield a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
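The two selection ideas mentioned in the abstract (choosing a chemically diverse training set, and letting the model pick the next compounds to test) can be illustrated with a minimal sketch. The abstract does not state which descriptors, diversity measure, or active learning criterion were used, so everything here is an assumption: compounds are represented as hypothetical sets of fingerprint bits, diversity is a greedy max-min pick over Tanimoto similarity, and the active learning step is plain uncertainty sampling, which ranks untested compounds by how close their decision value lies to zero. In practice the decision values would come from a trained SVM; here they are supplied as made-up numbers.

```python
from typing import Callable, Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # hypothetical: set of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two bit sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pick_diverse(pool: Dict[str, Fingerprint], k: int, seed: str) -> List[str]:
    """Greedy max-min pick: repeatedly add the compound whose most
    similar already-chosen neighbour is the least similar."""
    chosen = [seed]
    while len(chosen) < min(k, len(pool)):
        rest = [name for name in pool if name not in chosen]
        nxt = min(
            rest,
            key=lambda n: max(tanimoto(pool[n], pool[c]) for c in chosen),
        )
        chosen.append(nxt)
    return chosen


def most_uncertain(
    untested: List[str],
    decision_value: Callable[[str], float],
    n: int = 1,
) -> List[str]:
    """Uncertainty sampling: return the n compounds whose decision
    value is closest to zero (assumed to be the classifier boundary)."""
    return sorted(untested, key=lambda name: abs(decision_value(name)))[:n]


# Toy data, invented for illustration only.
pool = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({1, 2, 4}),
    "C": frozenset({7, 8, 9}),
    "D": frozenset({3, 7, 10}),
}
diverse = pick_diverse(pool, k=3, seed="A")

# Made-up decision values standing in for a trained SVM's output.
scores = {"B": 1.4, "E": -0.1, "F": 0.6}
next_to_test = most_uncertain(list(scores), scores.get, n=2)

print(diverse)       # ['A', 'C', 'D']
print(next_to_test)  # ['E', 'F']
```

In the toy run, compound B is skipped by the diversity pick because it is nearly identical to the seed A, and compound E is ranked first for testing because its decision value sits closest to the boundary.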