Rodríguez-Pérez Raquel, Miljković Filip, Bajorath Jürgen
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany.
J Cheminform. 2020 May 24;12(1):36. doi: 10.1186/s13321-020-00434-7.
For kinase inhibitors, X-ray crystallography has revealed different types of binding modes. Currently, more than 2000 kinase inhibitors with known binding modes are available, which makes it possible to derive and test machine learning models for the prediction of inhibitors with different binding modes. We have addressed this prediction task to evaluate and compare the information content of distinct molecular representations, including protein-ligand interaction fingerprints (IFPs) and structural fingerprints derived from compound structure (i.e., atom environment/fragment fingerprints). IFPs were designed to capture binding mode-specific interaction patterns at different resolution levels. Accurate predictions of kinase inhibitor binding modes were achieved with random forests using both representations. The performance of IFPs was consistently superior to that of atom environment fingerprints, albeit by less than 10%. An active learning strategy with information entropy-based selection of training instances was used as a diagnostic approach to assess the relative information content of the two representations. IFPs were found to capture more binding mode-relevant information than atom environment fingerprints, leading to highly predictive models even when training instances were randomly selected. By contrast, for atom environment fingerprints, the derivation of accurate models via active learning depended on entropy-based selection of informative training compounds. Notably, the higher information content of IFPs confirmed by active learning resulted in only small improvements in global prediction accuracy compared to models derived using atom environment fingerprints. For practical applications, prediction of binding modes of new kinase inhibitors on the basis of chemical structure is highly attractive.
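The entropy-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact criterion, so this assumes Shannon entropy computed over the class probabilities that a classifier (for example, a random forest) predicts for each unlabeled compound, with the highest-entropy compounds chosen for the next training round. The function names `shannon_entropy` and `select_most_uncertain`, the four-class setup, and the probability values are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool instances with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: shannon_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities for four unlabeled inhibitors
# over four assumed binding mode classes (illustrative values, not data
# from the study).
pool_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain, entropy of 2 bits
    [0.50, 0.50, 0.00, 0.00],  # undecided between two classes, 1 bit
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain
]

# Indices of the two compounds that would be labeled and added to training next.
picked = select_most_uncertain(pool_probs, k=2)
print(picked)
```

In an active learning loop, the selected compounds would be added to the training set, the model would be retrained, and the procedure would be repeated. Replacing `select_most_uncertain` with a random draw from the pool gives the random-selection baseline that the abstract compares against.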