Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States.
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37205, United States.
J Chem Inf Model. 2022 Nov 28;62(22):5841-5848. doi: 10.1021/acs.jcim.2c01139. Epub 2022 Oct 26.
Data-driven modeling has emerged as a new paradigm for biocatalyst design and discovery. Biocatalytic databases that integrate enzyme structure and function data are urgently needed. Here we describe IntEnzyDB, an integrated structure-kinetics database for facile statistical modeling and machine learning. IntEnzyDB employs a relational database architecture with a flattened data structure, which allows rapid data operations. This architecture also makes it easy for IntEnzyDB to incorporate additional types of enzyme function data. IntEnzyDB contains enzyme kinetics and structure data from six Enzyme Commission (EC) classes. Using 1050 enzyme structure-kinetics pairs, we investigated the efficiency-perturbing propensities of mutations that are close or distal to the active site. The statistical results show that efficiency-enhancing mutations are globally encoded and that deleterious effects are much more likely for mutations close to the active site than for distal mutations. Finally, we describe a web interface that allows public users to access the enzymology data stored in IntEnzyDB. IntEnzyDB will provide a computational facility for data-driven modeling in biocatalysis and molecular evolution.
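To illustrate the kind of query a flattened relational layout makes convenient, the following is a minimal sketch and not IntEnzyDB's actual schema or data. The table name `mutant_kinetics`, its columns, the 10 Å close/distal cutoff, the use of kcat/KM as the efficiency measure, and every row are hypothetical assumptions introduced only for illustration.

```python
import sqlite3

# ASSUMPTION: the distance cutoff separating "close" from "distal" mutations
# is a placeholder, not a value taken from the abstract.
CUTOFF_ANGSTROM = 10.0


def build_toy_db():
    """Create an in-memory table with one flat row per structure-kinetics pair.

    ASSUMPTION: table name, column names, and all rows are made up for
    illustration; they are not IntEnzyDB records.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE mutant_kinetics (
               entry_id TEXT,
               ec_class INTEGER,
               mutation TEXT,
               dist_to_active_site REAL,  -- in angstroms
               kcat_km_wt REAL,           -- wild-type efficiency
               kcat_km_mut REAL           -- mutant efficiency
           )"""
    )
    toy_rows = [
        ("toy1", 3, "A10G", 4.2, 100.0, 5.0),
        ("toy1", 3, "S55T", 6.0, 100.0, 40.0),
        ("toy2", 1, "L80V", 8.5, 20.0, 30.0),
        ("toy2", 1, "K200R", 22.0, 20.0, 35.0),
        ("toy3", 2, "D150E", 18.3, 50.0, 45.0),
    ]
    conn.executemany(
        "INSERT INTO mutant_kinetics VALUES (?, ?, ?, ?, ?, ?)", toy_rows
    )
    return conn


def count_by_region(conn, cutoff=CUTOFF_ANGSTROM):
    """Return {region: (n_deleterious, n_enhancing)} for close vs distal mutations."""
    query = """
        SELECT CASE WHEN dist_to_active_site <= ? THEN 'close' ELSE 'distal' END
                   AS region,
               SUM(kcat_km_mut < kcat_km_wt) AS deleterious,
               SUM(kcat_km_mut > kcat_km_wt) AS enhancing
        FROM mutant_kinetics
        GROUP BY region
        ORDER BY region
    """
    return {region: (d, e) for region, d, e in conn.execute(query, (cutoff,))}


if __name__ == "__main__":
    print(count_by_region(build_toy_db()))
```

Because every structure-kinetics pair sits in a single row, the close-versus-distal comparison needs no joins: one `GROUP BY` over a derived region label is enough. A real analysis would additionally need a significance test on the resulting counts.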