Liu Chengyou, Hogan Andrew M, Sturm Hunter, Khan Mohd Wasif, Islam Md Mohaiminul, Rahman A S M Zisanur, Davis Rebecca, Cardona Silvia T, Hu Pingzhao
Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada.
Department of Microbiology, University of Manitoba, Winnipeg, MB, Canada.
J Cheminform. 2022 Mar 12;14(1):12. doi: 10.1186/s13321-022-00596-6.
Chemical-genetic interaction profiling is a genetic approach that quantifies the susceptibility of a set of mutants, each depleted in specific gene product(s), to a set of chemical compounds. With recent advances in artificial intelligence, chemical-genetic interaction profiles (CGIPs) can be leveraged to predict the mechanism of action of compounds. This can be achieved with machine learning, where the data from a CGIP are fed into a machine learning model along with chemical descriptors to develop a chemogenetically trained model. As small molecules can be considered unstructured (non-tabular) data, graph convolutional neural networks, which learn directly from chemical structures, can be used to predict molecular properties. Clustering analysis, on the other hand, is a critical approach for gaining insight into the underlying biological relationships between gene products in high-dimensional chemical-genetic data.
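To make the idea of learning from chemical structure directly more concrete, the following is a minimal sketch, not the authors' implementation. It represents ethanol as a graph of heavy atoms and applies one unweighted neighborhood-averaging step, which is the core operation that a graph convolution builds on. The two-element vocabulary, the feature encoding, and the function names are assumptions made for illustration; a real graph convolutional network would add learned weight matrices and a nonlinearity.

```python
# Ethanol (CH3-CH2-OH), heavy atoms only: C0 - C1 - O2.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# One-hot atom features over a hypothetical two-element vocabulary.
vocab = ["C", "O"]
features = [[1.0 if a == v else 0.0 for v in vocab] for a in atoms]


def neighbors(n_atoms, bonds):
    """Build an undirected adjacency list from a list of bonds."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def aggregate(features, adj):
    """One unweighted aggregation step: each atom's new vector is the mean
    of its own features and those of its bonded neighbors."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [features[i]] + [features[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


adj = neighbors(len(atoms), bonds)
hidden = aggregate(features, adj)
```

After one step, the central carbon's vector mixes in information from the oxygen, so atoms with the same element but different chemical environments are no longer identical.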
In this study, we propose a comprehensive framework for predicting CGIPs with graph-based deep learning models, built on a large-scale chemical-genetics dataset from Mycobacterium tuberculosis. Our approach is structured into three parts. First, by matching M. tuberculosis genes with homologous genes in Escherichia coli according to their gene products, we grouped the genes into clusters with distinct biological functions. Second, we employed a directed message passing neural network to predict growth inhibition against M. tuberculosis gene clusters, using a collection of 50,000 chemicals with measured profiles. We compared its performance with that of different baseline models and framed the prediction problem as multi-label binary classification. Lastly, we applied the trained model to an externally curated drug set with experimental results against M. tuberculosis genes to examine the effectiveness of our method. Overall, we demonstrate that our approach effectively created M. tuberculosis gene clusters and that the trained classifier is able to predict activity against essential M. tuberculosis targets with high accuracy.
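The second step can be illustrated with a minimal sketch of directed message passing followed by a multi-label output, assuming a toy three-atom molecule and untrained parameters. This is not the authors' model: the scalar features, the weights, the number of clusters, and the function names are hypothetical, and the learned weight matrices and nonlinearities of a real directed message passing neural network are omitted. What the sketch keeps is the defining rule, namely that hidden states live on directed bonds and a message never flows straight back along the bond it arrived on.

```python
import math

# Toy molecule: three heavy atoms in a chain, 0 - 1 - 2 (hypothetical input).
atom_feats = [1.0, 1.0, 2.0]  # one scalar feature per atom
bonds = [(0, 1), (1, 2)]

# Each bond becomes two directed edges, each with its own hidden state.
edges = [(i, j) for i, j in bonds] + [(j, i) for i, j in bonds]


def message_pass(hidden, edges):
    """Update edge (v -> w) from all edges (u -> v) with u != w."""
    new_hidden = {}
    for v, w in edges:
        incoming = sum(hidden[(u, x)] for u, x in edges if x == v and u != w)
        new_hidden[(v, w)] = hidden[(v, w)] + incoming
    return new_hidden


def predict(atom_feats, edges, weights, biases, steps=2):
    """Return one independent probability per gene cluster (multi-label)."""
    # Initialise each directed edge from the feature of its source atom.
    hidden = {(v, w): atom_feats[v] for v, w in edges}
    for _ in range(steps):
        hidden = message_pass(hidden, edges)
    # Atom readout: sum of incoming edge states. Molecule readout: mean over atoms.
    atom_states = [
        sum(hidden[(u, x)] for u, x in edges if x == a)
        for a in range(len(atom_feats))
    ]
    mol = sum(atom_states) / len(atom_states)
    # One sigmoid per cluster, so a compound can be active against several clusters.
    return [1.0 / (1.0 + math.exp(-(wt * mol + b))) for wt, b in zip(weights, biases)]


# Hypothetical, untrained parameters for three gene clusters.
probs = predict(atom_feats, edges, weights=[0.5, -0.5, 0.0], biases=[0.0, 0.0, 0.0])
```

Because each cluster has its own sigmoid output rather than sharing a softmax, the probabilities need not sum to one, which is what makes the task multi-label rather than multi-class.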
This work provides an analytical framework for modeling large-scale chemical-genetic datasets to predict CGIPs and generate hypotheses about the mechanism of action of novel drugs. In addition, it highlights the importance of graph-based deep neural networks in drug discovery.