Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomedica (PRBB), 08003 Barcelona, Catalonia, Spain.
Bioinformatics. 2021 Jun 16;37(10):1376-1382. doi: 10.1093/bioinformatics/btaa982.
Machine-learning scoring functions (SFs) have been found to outperform standard SFs for binding affinity prediction of protein-ligand complexes. A plethora of reports focus on the implementation of increasingly complex algorithms, while the chemical description of the system has not been fully exploited.
Herein, we introduce Extended Connectivity Interaction Features (ECIF) to describe protein-ligand complexes and build machine-learning SFs with improved predictions of binding affinity. ECIF are a set of protein-ligand atom-type pair counts in which each atom's type is defined by taking its connectivity into account, and these atom types in turn define the pair types. ECIF were used to build different machine-learning models to predict protein-ligand affinities (pKd/pKi). The models were evaluated in terms of 'scoring power' on the Comparative Assessment of Scoring Functions 2016 (CASF-2016) benchmark. The best models achieved Pearson correlation coefficients of 0.857 when built on ECIF alone and 0.866 when ECIF were combined with ligand descriptors, demonstrating the descriptive power of ECIF.
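The idea of counting protein-ligand atom-type pairs can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository for that). The `Atom` class, the `atom_type` and `pair_counts` helpers, the simplified type label (element plus number of heavy-atom neighbours), and the 6.0 Å default cutoff are all assumptions made here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record used only for this sketch."""
    element: str
    heavy_neighbors: int  # number of bonded non-hydrogen atoms
    xyz: tuple            # Cartesian coordinates in angstroms


def atom_type(atom):
    # Simplified, assumed type label: element plus connectivity.
    return f"{atom.element};{atom.heavy_neighbors}"


def pair_counts(protein, ligand, cutoff=6.0):
    """Count protein-ligand atom-type pairs within `cutoff` angstroms.

    The cutoff value is an illustrative assumption.
    """
    counts = Counter()
    for p in protein:
        for lig in ligand:
            if dist(p.xyz, lig.xyz) <= cutoff:
                counts[f"{atom_type(p)}-{atom_type(lig)}"] += 1
    return counts


# Toy complex: three protein atoms (one far away) and two ligand atoms.
protein = [
    Atom("N", 1, (0.0, 0.0, 0.0)),
    Atom("O", 1, (3.0, 0.0, 0.0)),
    Atom("C", 3, (20.0, 0.0, 0.0)),
]
ligand = [
    Atom("C", 2, (1.0, 1.0, 0.0)),
    Atom("O", 1, (2.0, -1.0, 0.0)),
]

features = pair_counts(protein, ligand)
for pair_type, n in sorted(features.items()):
    print(pair_type, n)
```

In this sketch, each distinct pair type becomes one entry of a count vector, which could then serve as the input features for a regression model that predicts pKd/pKi.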
Data and code to reproduce all the results are freely available at https://github.com/DIFACQUIM/ECIF.
Supplementary data are available at Bioinformatics online.