CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica.
Phys Chem Chem Phys. 2023 Jul 12;25(27):17952-17965. doi: 10.1039/d3cp01428b.
In recent years, partition systems other than the widely used biphasic n-octanol/water system have received increasing attention as a way to gain insight into the molecular features that dictate the lipophilicity of compounds. In particular, the difference between the n-octanol/water and toluene/water partition coefficients has proven to be a valuable descriptor for studying the propensity of molecules to form intramolecular hydrogen bonds and to exhibit chameleon-like properties that modulate solubility and permeability. In this context, this study reports the experimental toluene/water partition coefficients (log P_tol) for a series of 16 drugs that were selected as an external test set in the framework of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenge. The computational community has used this external set to calibrate their methods in the current edition (SAMPL9) of the contest. The study also investigates the performance of two computational strategies for predicting log P_tol. The first relies on two machine learning (ML) models, built by combining a selection of 11 molecular descriptors with either multiple linear regression (MLR) or random forest regression (RFR) and fitted to a dataset of 252 experimental log P_tol values. The second consists of parametrizing the IEF-PCM/MST continuum solvation model from B3LYP/6-31G(d) calculations to predict the solvation free energies of 163 compounds in toluene and benzene. The performance of the ML and IEF-PCM/MST models has been calibrated against external test sets, including the compounds that define the SAMPL9 log P_tol challenge. The results are used to discuss the merits and weaknesses of the two computational approaches.
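A continuum solvation model such as IEF-PCM/MST yields solvation free energies, and a partition coefficient follows from the water-to-solvent transfer free energy through the standard relation log P = -(ΔG_solv,org - ΔG_solv,water) / (RT ln 10). The sketch below illustrates that conversion and the n-octanol/water minus toluene/water difference used as a descriptor. It is a minimal illustration only, assuming free energies in kcal/mol at 298.15 K; the function names are hypothetical, and it does not reproduce the paper's MST parametrization or ML models.

```python
import math

# Molar gas constant in kcal/(mol*K); 298.15 K is an assumed default temperature.
R_KCAL = 1.987204e-3
T_DEFAULT = 298.15


def log_p_from_solvation(dg_water: float, dg_organic: float,
                         temperature: float = T_DEFAULT) -> float:
    """Hypothetical helper: log P (organic/water) from solvation free energies.

    dg_water and dg_organic are solvation free energies (kcal/mol) in water and
    in the organic phase (e.g. toluene). The water -> organic transfer free
    energy is dg_organic - dg_water, and log P = -dG_transfer / (RT ln 10).
    """
    dg_transfer = dg_organic - dg_water
    return -dg_transfer / (R_KCAL * temperature * math.log(10))


def delta_log_p(log_p_oct: float, log_p_tol: float) -> float:
    """Hypothetical helper: n-octanol/water log P minus toluene/water log P."""
    return log_p_oct - log_p_tol


if __name__ == "__main__":
    # Illustrative, made-up inputs: the solute is 2 kcal/mol more stable in toluene.
    lp_tol = log_p_from_solvation(dg_water=-8.0, dg_organic=-10.0)
    print(f"log P_tol = {lp_tol:.2f}")
```

A negative transfer free energy (more favorable solvation in the organic phase) gives a positive log P, so a solute that is equally well solvated in both phases partitions evenly (log P = 0).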