Harmonic Discovery Inc., New York City, NY, USA.
Nat Commun. 2024 Aug 31;15(1):7596. doi: 10.1038/s41467-024-52055-5.
Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best-performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
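The two validation metrics quoted above can be made concrete with a minimal sketch. It assumes the usual definitions: hit rate as the fraction of predicted-active pairs confirmed active in the assay, and negative predictive value (NPV) as the fraction of predicted-inactive pairs confirmed inactive. The abstract does not spell these definitions out. The function name `hit_rate_and_npv` and the toy labels are hypothetical, chosen for illustration only, and are not the study's data or code.

```python
import math
from typing import Sequence, Tuple


def hit_rate_and_npv(
    predicted_active: Sequence[bool],
    measured_active: Sequence[bool],
) -> Tuple[float, float]:
    """Return (hit rate, NPV) for paired predictions and assay outcomes.

    Assumed definitions (not taken from the paper):
      hit rate = TP / (TP + FP), over pairs predicted active
      NPV      = TN / (TN + FN), over pairs predicted inactive
    """
    if len(predicted_active) != len(measured_active):
        raise ValueError("inputs must have the same length")

    tp = fp = tn = fn = 0
    for pred, meas in zip(predicted_active, measured_active):
        if pred and meas:
            tp += 1  # predicted active, confirmed active
        elif pred and not meas:
            fp += 1  # predicted active, measured inactive
        elif not pred and not meas:
            tn += 1  # predicted inactive, confirmed inactive
        else:
            fn += 1  # predicted inactive, measured active

    # A metric is undefined (NaN) if no pairs fall in its predicted class.
    hit_rate = tp / (tp + fp) if (tp + fp) else math.nan
    npv = tn / (tn + fn) if (tn + fn) else math.nan
    return hit_rate, npv


# Toy example with made-up labels for nine compound-kinase pairs.
toy_predicted = [True, True, True, True, True, False, False, False, False]
toy_measured = [True, True, False, False, False, False, False, False, True]
toy_hit_rate, toy_npv = hit_rate_and_npv(toy_predicted, toy_measured)
```

Under the same assumptions, restricting the inputs to pairs whose model uncertainty falls below a chosen cutoff before calling the function would show how uncertainty-aware selection changes both rates.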