Czub Natalia, Pacławski Adam, Szlęk Jakub, Mendyk Aleksander
Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland.
Pharmaceutics. 2021 Oct 16;13(10):1711. doi: 10.3390/pharmaceutics13101711.
Introducing a new drug to the market is a challenging and resource-consuming process. Predictive models developed with artificial intelligence could meet the growing need for an efficient tool that brings practical and knowledge benefits, but they require a large amount of high-quality data. The aim of our project was to develop a quantitative structure-activity relationship (QSAR) model predicting serotonergic activity toward the 5-HT1A receptor on the basis of a purpose-built database. The dataset was obtained from the ZINC and ChEMBL databases. It contained 9440 unique compounds, making it the largest available database of 5-HT1A ligands with a specified pKi value to date. The predictive model was developed using automated machine learning (AutoML) methods. According to the 10-fold cross-validation (10-CV) testing procedure, the root-mean-squared error (RMSE) was 0.5437, and the coefficient of determination (R²) was 0.74. Moreover, the Shapley Additive Explanations (SHAP) method was applied to gain a more in-depth understanding of the influence of variables on the model's predictions. In line with the problem definition, the developed model can efficiently predict the affinity of new molecules toward the 5-HT1A receptor on the basis of their structure encoded in the form of molecular descriptors. Using this model in screening can significantly improve the discovery of new drugs in the field of mental diseases and anticancer therapy.
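To make the reported 10-CV metrics concrete, the following is a minimal sketch of how RMSE and R² can be computed from out-of-fold predictions. It is not the authors' pipeline: the data are synthetic, the single descriptor and the straight-line model (`fit_line`) are hypothetical stand-ins for the molecular descriptors and the AutoML model, and pooling all out-of-fold predictions before scoring is an assumption (averaging per-fold scores is an equally common convention).

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for the real model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def cross_validate(xs, ys, k=10, seed=0):
    """Predict each sample from a model that never saw it, then score once."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    return rmse(ys, preds), r2(ys, preds)


# Synthetic data (assumption): one made-up descriptor and a pKi-like target.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [0.4 * x + 5.0 + rng.gauss(0.0, 0.5) for x in xs]

cv_rmse, cv_r2 = cross_validate(xs, ys, k=10)
print(f"10-CV RMSE = {cv_rmse:.4f}, R2 = {cv_r2:.2f}")
```

Because pKi is a logarithmic quantity, an RMSE of about 0.54 corresponds to a typical error of roughly half a log unit in predicted affinity.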