Laboratoire de Synthèse et Biocatalyse Organique, Département de Chimie, Faculté des Sciences, Université Badji Mokhtar Annaba, Annaba, Algeria.
Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria.
Chem Biol Drug Des. 2020 Sep;96(3):961-972. doi: 10.1111/cbdd.13742.
Over the past decade, rapid developments in biological and chemical technologies, such as high-throughput screening and parallel synthesis, have significantly increased the amount of available data, which calls for the creation and integration of new analytical methods, especially deep learning models. Recently, there has been growing interest in using deep learning for computer-aided drug discovery because of its exceptionally successful application in many fields. The present work proposes a natural language processing approach based on embedding deep neural networks. Our method transforms Simplified Molecular Input Line Entry System (SMILES) strings into word embedding vectors that represent the semantics of compounds. These vectors are fed into supervised machine learning algorithms, such as a convolutional long short-term memory neural network, a support vector machine, and a random forest, to build quantitative structure-activity relationship (QSAR) models on toxicity data sets. The results obtained on toxicity data for the ciliate Tetrahymena pyriformis (IGC50) and on acute rat toxicity data expressed as the median lethal dose of treated rats (LD50) show that our approach could be used to predict the activities of chemical compounds efficiently. All material used in this study is available online through the GitHub portal (https://github.com/BoukeliaAbdelbasset/NLPDeepQSAR.git).
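The abstract does not state how SMILES strings are split into "words" or which embedding model is trained, so the following is only a minimal sketch of the general idea, not the authors' implementation (see the GitHub repository for that). It assumes that SMILES are tokenized into atoms, bonds, and ring closures, that overlapping token trigrams act as the words of a compound "sentence", and that the word vectors are averaged into one fixed-length compound vector that a downstream model (for example an SVM or random forest) could consume. The function names `tokenize`, `to_words`, `lookup`, and `embed` are hypothetical, and `lookup` is a hash-based placeholder standing in for vectors that would really be learned (for example by word2vec).

```python
import hashlib
import re

# Assumed tokenization: bracket atoms, two-letter halogens, two-digit ring
# closures (%NN), then any single character. Not taken from the paper.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    return SMILES_TOKEN.findall(smiles)


def to_words(tokens, n=3):
    """Treat overlapping n-grams of tokens as the 'words' of a compound sentence."""
    if len(tokens) < n:
        return ["".join(tokens)]
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lookup(word, dim=8):
    """Hypothetical placeholder embedding: a deterministic vector in [-1, 1]
    derived from a SHA-256 hash of the word. A real model would look up
    vectors learned from a corpus of SMILES instead."""
    if dim > 32:
        raise ValueError("a SHA-256 digest provides at most 32 components")
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [(digest[i] / 255.0) * 2.0 - 1.0 for i in range(dim)]


def embed(smiles, n=3, dim=8):
    """Average the word vectors into one fixed-length compound vector."""
    vectors = [lookup(w, dim) for w in to_words(tokenize(smiles), n)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    print(tokenize(aspirin))
    print(embed(aspirin))
```

Averaging is one simple way to turn a variable number of word vectors into a fixed-length input for classical learners such as an SVM or random forest; a sequence model like a convolutional LSTM would more likely consume the per-word vectors in order rather than their mean.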