Chen Jiarui, Si Yain-Whar, Un Chon-Wai, Siu Shirley W I
Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, 999078, Macau, China.
Institute of Science and Environment, University of Saint Joseph, Rua de Londres 106, 999078, Macau, China.
J Cheminform. 2021 Nov 27;13(1):93. doi: 10.1186/s13321-021-00570-8.
As safety is one of the most important properties of drugs, chemical toxicology prediction has received increasing attention in drug discovery research. Traditionally, researchers rely on in vitro and in vivo experiments to test the toxicity of chemical compounds. However, these experiments are time-consuming and costly, and those that involve animal testing are increasingly subject to ethical concerns. While traditional machine learning (ML) methods have been used in the field with some success, the limited availability of annotated toxicity data is the major hurdle to further improving model performance. Inspired by the success of semi-supervised learning (SSL) algorithms, we propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and train the network with the Mean Teacher (MT) SSL algorithm. Using the Tox21 data, our optimal SSL-GCN models for predicting the twelve toxicological endpoints achieve an average ROC-AUC score of 0.757 on the test set, a 6% improvement over both GCN models trained by supervised learning and conventional ML methods. Our SSL-GCN models also outperform models constructed using the built-in DeepChem ML methods. These results show that SSL can increase the predictive power of models by learning from unannotated data; the optimal ratio of unannotated to annotated data ranges between 1:1 and 4:1. Beyond chemical toxicity prediction, the same technique is expected to benefit other chemical property prediction tasks by utilizing existing large chemical databases. Our optimal model, SSL-GCN, is hosted on an online server accessible at https://app.cbbio.online/ssl-gcn/home.
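The abstract names the Mean Teacher algorithm without describing its mechanics. As a rough illustration only, the general scheme can be sketched as follows: a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and a loss that combines a supervised term on annotated compounds with a consistency term on all compounds. This is not the authors' implementation. The function names, the `alpha` and `consistency_weight` values, and the use of plain lists in place of a GCN are all assumptions made for this example, and the input perturbation normally applied to student and teacher is omitted.

```python
import math

def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the student weight (EMA).

    `alpha` is a hypothetical smoothing value, not taken from the paper.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one annotated example."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_teacher_loss(student_logits, teacher_logits, labels,
                      consistency_weight=1.0):
    """Supervised loss on annotated items plus consistency loss on all items.

    labels[i] is 0 or 1 for an annotated compound and None for an
    unannotated one. The consistency term is the mean squared difference
    between student and teacher predicted probabilities.
    """
    sup = [bce(sigmoid(s), y)
           for s, y in zip(student_logits, labels) if y is not None]
    cons = [(sigmoid(s) - sigmoid(t)) ** 2
            for s, t in zip(student_logits, teacher_logits)]
    sup_loss = sum(sup) / len(sup) if sup else 0.0
    cons_loss = sum(cons) / len(cons)
    return sup_loss + consistency_weight * cons_loss

# Toy batch: two annotated compounds and one unannotated compound.
teacher_weights = ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.9)
loss = mean_teacher_loss([0.2, -1.3, 0.7], [0.1, -1.0, 0.9], [1, None, 0])
```

In this sketch the unannotated compound contributes only through the consistency term, which is how unannotated data can still shape the student during training.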