King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia.
Institute of Parasitology, McGill University, Montreal, Quebec, Canada.
Bioinformatics. 2019 Aug 1;35(15):2634-2643. doi: 10.1093/bioinformatics/bty1035.
Accurate and wide-ranging prediction of thermodynamic parameters for biochemical reactions can facilitate deeper insights into the workings and the design of metabolic systems.
Here, we introduce a machine learning method that uses chemical fingerprint-based features to predict the Gibbs free energy of biochemical reactions. From a large pool of 2D fingerprint-based features, the method systematically selects a small number of relevant ones and uses them to construct a regularized linear model. Manual selection of 2D structure-based features is tedious and time-consuming, and it requires expert knowledge of the structure-activity relationships of chemical compounds; the systematic feature selection step in our method offers a convenient alternative for identifying relevant 2D fingerprint-based features. In a comparison with state-of-the-art linear regression-based methods for predicting the standard Gibbs free energy, our method achieved the most favorable prediction accuracy and prediction coverage. Our results provide direct evidence that a number of 2D fingerprints collectively carry useful information about the Gibbs free energy of biochemical reactions and that our systematic feature selection procedure is a convenient way to identify them.
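The general idea of selecting a few relevant features from a larger pool while fitting a regularized linear model can be illustrated with a minimal sketch. The code below is not the procedure used in this work: it substitutes an L1-penalized (lasso) regression solved by cyclic coordinate descent, which performs selection and regularization in a single step. The feature matrix, the target values, the penalty `alpha` and all function names are hypothetical toy choices made only for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while all weights are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * weights[j]) for i in range(n)
            ) / n
            new_weight = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                weights[j] = new_weight
    return weights


# Hypothetical toy data: each row stands for one reaction, each column for
# one candidate fingerprint-derived feature (values are made up).
X = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
    [-1, 1, 1, 1],
    [-1, 1, -1, -1],
    [-1, -1, 1, -1],
    [-1, -1, -1, 1],
]
# Synthetic target that depends only on features 0 and 2.
y = [2.0 * row[0] - 3.0 * row[2] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected feature indices:", selected)
print("weights:", weights)
```

In this toy setting the L1 penalty drives the weights of the two uninformative features to exactly zero and shrinks the remaining weights slightly toward zero, which is the trade-off any regularized model makes between fit and sparsity.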
Our software is freely available for download at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Supplementary data are available at Bioinformatics online.