Batool Muniba, Azam Naveed Ahmed, Zhu Jianshen, Haraguchi Kazuya, Zhao Liang, Akutsu Tatsuya
Discrete Mathematics and Computational Intelligence Laboratory, Department of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan.
Discrete Mathematics Laboratory, Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, 606-8501, Kyoto, Japan.
J Cheminform. 2025 Mar 26;17(1):37. doi: 10.1186/s13321-025-00966-w.
Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predicting and inferring chemical compounds with a desired AS, based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve prediction accuracy that compares favorably with existing approaches, in the range [0.7191, 0.9377] across 29 diverse datasets. By formulating these descriptors and the learned models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms, with running times in the range [6, 1166] seconds. These findings indicate a strong correlation between simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of AS without relying on the widely used complicated chemical descriptors and complex machine learning models, which are computationally expensive and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol .
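The forward stepwise selection of descriptors for an MLR model mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). It assumes that each step adds the descriptor giving the largest gain in training R², and that selection stops when the gain falls below a hypothetical threshold `min_gain`. The data are synthetic stand-ins for descriptor values, and all function names are illustrative.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    A = [[1.0] + list(r) for r in rows]
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    return solve_linear(AtA, Aty)

def r2(rows, y, coef):
    """Coefficient of determination of the fitted model on (rows, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(X, y, max_features, min_gain=1e-6):
    """Greedily add the column that most improves training R^2 (assumed criterion)."""
    selected, best = [], 0.0  # an intercept-only model has R^2 = 0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        trials = []
        for j in remaining:
            cols = selected + [j]
            rows = [[x[c] for c in cols] for x in X]
            trials.append((r2(rows, y, fit_mlr(rows, y)), j))
        score, j = max(trials)
        if score - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic example: only columns 1 and 4 actually influence the target.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3.0 * x[1] - 2.0 * x[4] + rng.gauss(0, 0.01) for x in X]
selected, score = forward_stepwise(X, y, max_features=3)
print(selected, round(score, 4))
```

In practice a held-out or cross-validated score would be a more robust selection criterion than training R². Because the resulting model is linear in the selected descriptors, it is the kind of predictor that can be written directly as linear constraints in an MILP.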