Ahmad Waqar, Tayara Hilal, Chong Kil To
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea.
School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
ACS Omega. 2023 Jan 12;8(3):3236-3244. doi: 10.1021/acsomega.2c06702. eCollection 2023 Jan 24.
Drug discovery (DD) research aims to discover new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are the substances essential for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools based on machine learning and deep learning techniques have been employed for solubility prediction. This study aims to create deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network architectures. Based on model performance, the AttentiveFP-based network model was selected. The model was trained and tested on 9943 compounds. On 62 anticancer compounds, the model performed well, with Pearson correlation and root-mean-square error values of 0.52 and 0.61, respectively. AS prediction can be improved by refining the graph algorithm or by adding more molecular properties.
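The two evaluation metrics named in the abstract, Pearson correlation and root-mean-square error, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pearson_r` and `rmse` and the sample values are hypothetical, and the sketch assumes solubility is expressed as a list of numeric values (for example log-scale solubility), which the abstract does not specify.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from the study.
    measured = [-2.0, -3.5, -1.0, -4.2]
    predicted = [-2.5, -3.0, -1.5, -4.0]
    print("Pearson r:", pearson_r(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
```

A higher Pearson correlation indicates that predictions track the measured ordering and trend more closely, while a lower RMSE indicates smaller absolute deviations; reporting both, as the abstract does, separates trend agreement from error magnitude.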