Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad465.
Increasing efforts are being made in the field of machine learning to advance the learning of robust and accurate models from experimentally measured data and to enable more efficient drug discovery processes. The prediction of binding affinity is one of the most frequent tasks in compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but individual point predictions are typically not accompanied by a rigorous confidence assessment. Approaches such as the conformal predictor framework equip conventional models with a more rigorous assessment of confidence for individual point predictions. In this article, we extend the inductive conformal prediction framework to interaction data, in particular to the compound-target binding affinity prediction task. The new framework is based on dynamically defined calibration sets that are specific to each test pair. It assesses each prediction in the context of calibration pairs from the test pair's compound-target neighbourhood, enabling improved estimates based on the local properties of the prediction model.
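The idea of a calibration set defined per test pair can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean distance on toy feature vectors, the radius-based neighbourhood rule, and the use of absolute residuals as nonconformity scores are all assumptions made for illustration.

```python
import numpy as np


def icp_interval(cal_residuals, y_pred, epsilon=0.1):
    """Inductive conformal interval from absolute calibration residuals."""
    n = len(cal_residuals)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - epsilon)).
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    if k > n:
        # Too few calibration pairs to support this confidence level.
        return -np.inf, np.inf
    q = np.sort(cal_residuals)[k - 1]
    return y_pred - q, y_pred + q


def neighbourhood_mask(cal_comp, cal_targ, test_comp, test_targ,
                       radius_c, radius_t):
    """Select calibration pairs close to the test pair on both axes.

    Assumption: a pair is a neighbour when its compound AND its target
    lie within the given Euclidean radii of the test compound and target.
    """
    d_c = np.linalg.norm(cal_comp - test_comp, axis=1)
    d_t = np.linalg.norm(cal_targ - test_targ, axis=1)
    return (d_c <= radius_c) & (d_t <= radius_t)


# Toy calibration data (hypothetical): one feature per compound and target.
cal_comp = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
cal_targ = np.array([[0.0], [0.0], [0.1], [4.0], [4.2]])
cal_residuals = np.array([0.1, 0.2, 0.3, 2.0, 3.0])  # |y - y_hat|

test_comp, test_targ, y_pred = np.array([0.1]), np.array([0.0]), 6.0

# Interval calibrated on all pairs versus on the test pair's neighbourhood.
global_interval = icp_interval(cal_residuals, y_pred, epsilon=0.25)
mask = neighbourhood_mask(cal_comp, cal_targ, test_comp, test_targ, 1.0, 1.0)
local_interval = icp_interval(cal_residuals[mask], y_pred, epsilon=0.25)

print("global:", global_interval)
print("local: ", local_interval)
```

In this toy setting the model errs less near the test pair than elsewhere, so the neighbourhood-calibrated interval is narrower than the one calibrated on all pairs. When the neighbourhood holds too few pairs for the requested confidence level, the sketch falls back to an unbounded interval.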
The effectiveness of the approach is benchmarked on several publicly available datasets and tested in realistic use-case scenarios with increasing levels of difficulty on a complex compound-target binding affinity space. We demonstrate that in such scenarios the novel approach, which combines the applicability domain paradigm with the conformal prediction framework, produces superior confidence assessment, with valid and more informative prediction regions than other 'state-of-the-art' conformal prediction approaches.
The dataset and the code are available on GitHub (https://github.com/mlkr-rbi/dAD).