College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China.
Department of Neurology Medicine, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250033, China | College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China.
Comb Chem High Throughput Screen. 2022;25(4):642-650. doi: 10.2174/1386207324666210219102728.
Drug repositioning aims to identify new drugs and therapeutic uses among approved drugs and abandoned compounds that have already been established as safe. This trend is changing the landscape of drug development and making repositioning a model for developing new drugs. Over the past decade, machine learning methods have been applied to predict compound-protein binding affinity, and deep learning has recently become prominent and achieved strong performance. In these models, the compound is usually represented simply, by a molecular fingerprint, i.e., a single SMILES string.
In this study, we improve on previous work by proposing a novel representation, named SMILES#, that recodes the SMILES string. This representation takes the properties of compounds into account and achieves superior performance. We then propose a deep learning model that combines recurrent neural networks with a convolutional neural network and an attention mechanism, using both unlabeled and labeled data to jointly encode molecules and predict binding affinity.
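To make the input side of such a model concrete, the following is a minimal sketch of how a plain SMILES string can be turned into a fixed-length sequence of integer indices, the usual form fed to a recurrent or convolutional encoder. It is not the SMILES# encoding, whose details are not given in this abstract; the function names `build_vocab` and `encode` and the choice of character-level tokens are illustrative assumptions.

```python
# Hypothetical sketch: character-level encoding of plain SMILES strings.
# This is NOT the SMILES# scheme; it only illustrates the kind of integer
# sequence that a sequence model typically consumes.

def build_vocab(smiles_list):
    """Map every character seen in the corpus to an index starting at 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    # Index 0 is reserved for padding (and, in this sketch, unknown characters).
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(smiles, vocab, max_len):
    """Encode one SMILES string as a list of max_len integer indices."""
    ids = [vocab.get(ch, 0) for ch in smiles[:max_len]]  # truncate if too long
    return ids + [0] * (max_len - len(ids))              # right-pad with zeros

vocab = build_vocab(["CCO", "c1ccccc1"])  # ethanol, benzene
encoded = encode("CCO", vocab, max_len=5)
```

Character-level splitting is a simplification: two-letter atoms such as `Cl` or `Br` are broken into separate tokens, so a practical tokenizer would treat them as single symbols.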
Experimental results show that SMILES# with compound properties effectively improves the accuracy of the model and reduces the root-mean-square (RMS) error on most data sets.
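For reference, the RMS error between measured and predicted affinities is the square root of the mean squared difference. The sketch below computes it in plain Python; the function name `rmse` and the sample values are illustrative assumptions, not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up affinity values, used only to exercise the function.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A lower value means the predicted affinities lie closer to the measured ones, which is the sense in which the abstract reports a reduction.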
We applied the method to related and unrelated compounds that share the same target, and the experimental results demonstrate its effectiveness.