Orhobor Oghenejokpeme I, Rehim Abbi Abdel, Lou Hang, Ni Hao, King Ross D
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.
Department of Mathematics, University College London, London, UK.
R Soc Open Sci. 2022 May 4;9(5):211745. doi: 10.1098/rsos.211745. eCollection 2022 May.
The representation of protein-ligand complexes used to build machine learning models plays an important role in the accuracy of binding affinity prediction. Extended Connectivity Interaction Features (ECIF) is one such representation. We report that (i) including the discretized distances between protein-ligand atom pairs in the ECIF scheme improves predictive accuracy, and (ii) in an evaluation using gradient boosted trees, the resampling method used to select the best hyperparameters has a strong effect on predictive performance, especially for benchmarking purposes.
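To illustrate what including discretized distances in a pair-count representation could look like, the sketch below counts protein-ligand atom pairs keyed by atom types and a distance bin. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binned_pair_counts`, the single-letter atom types, the coordinates, and the bin edges are all hypothetical, and real ECIF atom typing is considerably richer than element symbols.

```python
from bisect import bisect_right
from collections import Counter
from math import dist


def binned_pair_counts(protein_atoms, ligand_atoms, bin_edges):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    Atoms are (type_label, (x, y, z)) tuples. bin_edges is a sorted list of
    upper bin edges in angstroms (hypothetical values); bin i covers
    [bin_edges[i-1], bin_edges[i]), with bin 0 starting at 0. Pairs at or
    beyond the last edge are ignored.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            if d >= bin_edges[-1]:
                continue  # outside the distance cutoff
            # Index of the first edge strictly greater than d
            bin_index = bisect_right(bin_edges, d)
            counts[(p_type, l_type, bin_index)] += 1
    return counts


# Toy complex with made-up coordinates (assumption, for illustration only)
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0))]
ligand = [("O", (1.0, 0.0, 0.0)), ("C", (0.0, 5.0, 0.0))]
features = binned_pair_counts(protein, ligand, bin_edges=[2.0, 4.0, 6.0])
```

Without the bin index in the key, the two pairs involving the ligand carbon and the pair involving the ligand oxygen would each collapse into a single count per type pair under one cutoff. With it, the same type pair at different distances contributes to different features, which is the kind of added information the abstract refers to.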