Walter Moritz, Allen Luke N, de la Vega de León Antonio, Webb Samuel J, Gillet Valerie J
Information School, University of Sheffield, 211 Portobello, Sheffield, S1 4DP, UK.
Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK.
J Cheminform. 2022 Jun 7;14(1):32. doi: 10.1186/s13321-022-00611-w.
Recently, imputation techniques have been adapted to predict activity values in sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. When predicting the activity of a test compound on a specific assay, these models can use the compound's experimental activity values from auxiliary assays. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two small-scale datasets (12 assays each) and one large-scale dataset with 417 assays. We also analyzed in detail the improvements shown by the imputation models. We found that the largest improvements occurred for test compounds that were dissimilar to the training compounds and for test compounds with a large number of experimental values for other assays. We further investigated how sparsity affects the observed improvements, and examined the relatedness of the assays being considered. Our results show that even a small amount of additional information can give imputation methods a strong boost in predictive performance over traditional single-task and multi-task predictive models.
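The core idea of using auxiliary assay values can be illustrated with a minimal sketch. It is not the method evaluated in the study: the toy matrix, the +1/-1/0 label encoding, and the function names `sparsity`, `encode_label`, and `imputation_features` are all assumptions made here for illustration. The sketch only shows how a compound's known results on other assays might be appended to its structural descriptors before a classifier for the target assay is trained.

```python
from typing import List, Optional, Sequence

# Toy activity matrix (made-up values): rows = compounds, columns = assays.
# 1 = active, 0 = inactive, None = not measured.
ACTIVITY: List[List[Optional[int]]] = [
    [1, None, 0],
    [None, 1, 1],
    [0, 0, None],
]


def sparsity(matrix: Sequence[Sequence[Optional[int]]]) -> float:
    """Fraction of compound-assay cells that have no experimental value."""
    cells = [value for row in matrix for value in row]
    return sum(value is None for value in cells) / len(cells)


def encode_label(value: Optional[int]) -> int:
    """Hypothetical encoding: active -> +1, inactive -> -1, missing -> 0."""
    if value is None:
        return 0
    return 1 if value == 1 else -1


def imputation_features(
    descriptors: Sequence[float],
    activity_row: Sequence[Optional[int]],
    target_assay: int,
) -> List[float]:
    """Structural descriptors followed by encoded labels of all auxiliary assays.

    The target assay's own column is skipped so its label cannot leak
    into the features used to predict it.
    """
    auxiliary = [
        encode_label(value)
        for column, value in enumerate(activity_row)
        if column != target_assay
    ]
    return list(descriptors) + auxiliary
```

For example, `imputation_features([0.5, 1.0], ACTIVITY[0], target_assay=2)` appends the encoded results of assays 0 and 1 to the two descriptor values. A conventional single-task QSAR model would see only the descriptors, which is why compounds with many measured auxiliary assays have more to gain from this kind of input.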