Mousa Nesma, Varbanov Hristo P, Kaipanchery Vidya, Gabano Elisabetta, Ravera Mauro, Toropov Andrey A, Charochkina Larisa, Menezes Filipe, Godin Guillaume, Tetko Igor V
Freie Universität Berlin, Fachbereich Biologie, Chemie, Pharmazie, Takustr. 3, 14195 Berlin, Germany.
Institute of Pharmacy/Pharmaceutical Chemistry, University of Innsbruck, Center for Chemistry and Biomedicine, Innrain 80 - 82/IV, 6020 Innsbruck, Austria.
J Inorg Biochem. 2025 Aug;269:112890. doi: 10.1016/j.jinorgbio.2025.112890. Epub 2025 Mar 10.
Predicting the solubility and lipophilicity of platinum(II, IV) complexes is essential for prioritizing potential anticancer candidates in drug discovery. This study introduces the first publicly available online model for predicting the solubility of platinum complexes, addressing the scarcity of literature and models on this topic. Using a time-split dataset, we developed a consensus model that achieved a Root Mean Squared Error (RMSE) of 0.62 in 5-fold cross-validation on a training set of 284 historical compounds (solubility data reported prior to 2017). However, the RMSE increased to 0.86 when the model was applied to a prospective test set of 108 compounds reported after 2017. Further analysis of the high prediction errors revealed that they are primarily attributable to the underrepresentation of novel chemical scaffolds, particularly Pt(IV) derivatives, in the training set. For instance, a series of eight phenanthroline-containing compounds, not covered by the training set's chemical space, had an RMSE of 1.3. When the model was redeveloped using the combined dataset, the RMSE for this series decreased significantly to 0.34 under the same validation protocol. Additionally, we developed an interpretable linear model to identify structural features and functional groups that influence the solubility of platinum complexes. We further validated the correlation between solubility and lipophilicity, consistent with the Yalkowsky General Solubility Equation. Building on these insights, we developed a final multitask model that simultaneously predicts solubility and lipophilicity as two endpoints, with RMSE = 0.62 and 0.44, respectively. The data and the final model are available at https://ochem.eu/article/31.
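The evaluation ideas named in the abstract (a time split by publication year, RMSE as the error metric, and the Yalkowsky General Solubility Equation) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout with a `year` key, the function names, and the choice to place compounds from the cutoff year itself in the test set are all assumptions made for illustration. The equation is written in its commonly cited form, log S = 0.5 - 0.01 (MP - 25) - log P, with the melting point MP in degrees Celsius.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def time_split(records, cutoff_year):
    """Split records into historical (train) and prospective (test) sets.

    Assumption: each record is a dict with a 'year' key, and records from
    the cutoff year itself go to the test set. The abstract does not say
    how compounds reported in the cutoff year were handled.
    """
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


def gse_log_s(log_p, melting_point_c):
    """General Solubility Equation in its commonly cited form.

    log S = 0.5 - 0.01 * (MP - 25) - log P, with MP in degrees Celsius.
    """
    return 0.5 - 0.01 * (melting_point_c - 25.0) - log_p
```

With hypothetical records such as `[{"year": 2015}, {"year": 2019}]` and `cutoff_year=2017`, `time_split` returns one historical and one prospective record; `rmse` can then be computed separately on cross-validated and prospective predictions, mirroring the two error figures the abstract compares.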