Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran.
School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland.
J Chem Inf Model. 2019 Nov 25;59(11):4528-4539. doi: 10.1021/acs.jcim.9b00626. Epub 2019 Nov 8.
A central problem in small-molecule drug discovery is finding a candidate molecule with increased pharmacological activity, appropriate ADME (absorption, distribution, metabolism, and excretion) properties, and low toxicity. Recently, machine learning has contributed significantly to drug discovery. However, many machine learning methods, such as deep learning-based approaches, require a large amount of training data to make accurate predictions on unseen data. In the lead optimization step, little biological data is available on small-molecule compounds, which makes applying machine learning methods challenging. The main goal of this study is to design a new approach for such low-data settings. To this end, knowledge from a source (auxiliary) assay is used to learn a better model for predicting the properties of new compounds in the target assay. Existing approaches have not considered that source and target assays are adapted to different target groups with different compound distributions. In this paper, we propose a new architecture that combines a graph convolutional network with an adversarial domain adaptation network to tackle this issue. To evaluate the proposed approach, we applied it to the Tox21, ToxCast, SIDER, HIV, and BACE collections. The results showed that the proposed approach effectively transfers related knowledge from the source to the target data set.
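The abstract names two building blocks, a graph convolutional network and an adversarial domain adaptation network, without giving layer details. The following is a minimal illustrative sketch and not the authors' implementation. It assumes a Kipf-and-Welling-style propagation rule for the graph convolution and gradient reversal as the adversarial mechanism, which is one common choice that the abstract does not confirm. All function names, the toy molecule, and the identity weight matrix are hypothetical.

```python
import math


def normalized_adjacency(n_atoms, bonds):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists (assumed rule)."""
    a = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0  # undirected bond
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def matmul(x, y):
    """Plain dense matrix product for lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def graph_conv(a_hat, features, weights):
    """One graph convolution layer: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def readout(node_features):
    """Sum atom-level features into one molecule-level embedding."""
    return [sum(col) for col in zip(*node_features)]


def reverse_gradient(grad, lam):
    """Backward rule of a gradient reversal layer (assumed mechanism).

    The forward pass is the identity; on the way back, the gradient coming
    from the domain (source vs. target assay) classifier is scaled by -lam,
    pushing the feature extractor toward domain-invariant embeddings.
    """
    return [-lam * g for g in grad]


# Toy molecule: the heavy atoms of ethanol as a chain C-C-O.
# Atom features are one-hot [is_carbon, is_oxygen].
bonds = [(0, 1), (1, 2)]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for learned weights

a_hat = normalized_adjacency(3, bonds)
embedding = readout(graph_conv(a_hat, features, weights))
reversed_grad = reverse_gradient([1.0, -2.0], lam=0.5)
```

In a full model under these assumptions, the molecule-level embedding would feed both a property predictor trained on labeled assay data and a domain classifier whose gradient passes through the reversal step before reaching the shared graph layers.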