Helmholtz-Zentrum München, German Research Center for Environmental Health (GmbH), Institute for Bioinformatics and Systems Biology, Ingolstädter Landstrasse 1, D-85764 Neuherberg.
Chem Biodivers. 2009 Nov;6(11):1837-44. doi: 10.1002/cbdv.200900075.
A large variety of log P calculation methods failed to predict log P with sufficient accuracy for two in-house datasets of more than 96,000 compounds, in contrast to their significantly better performance on public datasets. In the 'out-of-box' implementation, the minimum root mean squared errors (RMSE) were 1.02 and 0.65 for the Pfizer and Nycomed datasets, respectively. Importantly, local corrections (LC), implemented in the ALOGPS program and based on experimental in-house log P data, significantly reduced the RMSE to 0.59 and 0.48 for the Pfizer and Nycomed datasets, respectively, instantly and without retraining the model. Moreover, the more than 60% of molecules in each set that were predicted with the highest confidence had a mean absolute error (MAE) of less than 0.33 log units, which is only ca. 10% higher than the estimated variation in experimental log P measurements for the Pfizer dataset. Following this retrospective analysis, we therefore suggest that using log P values predicted with high confidence may eliminate the need to test every second compound experimentally. This strategy could reduce the cost of measurements for pharmaceutical companies by a factor of 2, increase confidence in predictions at the analog design stage of drug discovery programs, and could be extended to other ADMET properties.
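The two error measures quoted in the abstract, and the idea of evaluating only the most confidently predicted molecules, can be illustrated with a minimal Python sketch. It is not the ALOGPS implementation and does not reproduce the local correction step. The function names, the `uncertainty` field (lower meaning more confident), and the five toy records are hypothetical and were chosen only for illustration; they are not data from the study.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def mae(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


def most_confident(records, percent):
    """Return the given percentage of records with the lowest uncertainty.

    Assumption: each record carries a hypothetical 'uncertainty' score,
    where a smaller value means a more confident prediction.
    """
    ranked = sorted(records, key=lambda r: r["uncertainty"])
    keep = max(1, len(ranked) * percent // 100)
    return ranked[:keep]


# Toy records (invented for illustration, not taken from the study).
records = [
    {"pred": 2.1, "obs": 2.0, "uncertainty": 0.10},
    {"pred": 3.5, "obs": 3.0, "uncertainty": 0.40},
    {"pred": 1.0, "obs": 1.2, "uncertainty": 0.20},
    {"pred": 4.0, "obs": 2.5, "uncertainty": 0.90},
    {"pred": 0.5, "obs": 0.4, "uncertainty": 0.15},
]

# Error over all molecules.
overall_rmse = rmse([r["pred"] for r in records], [r["obs"] for r in records])

# Error over the 60% of molecules predicted with the highest confidence.
confident = most_confident(records, 60)
confident_mae = mae([r["pred"] for r in confident], [r["obs"] for r in confident])

print(f"RMSE, all molecules: {overall_rmse:.2f}")
print(f"MAE, most confident 60%: {confident_mae:.2f}")
```

In this sketch the molecules outside the confident subset are the ones that would still be sent for experimental measurement, which is the reasoning behind the suggested cost reduction.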