Kojima Ryosuke, Ishida Shoichi, Ohta Masateru, Iwata Hiroaki, Honma Teruki, Okuno Yasushi
Graduate School of Medicine, Kyoto University, Shogoin-kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan.
Graduate School of Pharmaceutical Sciences, Kyoto University, Yoshida, Sakyo-ku, Kyoto, 606-8501, Japan.
J Cheminform. 2020 May 12;12(1):32. doi: 10.1186/s13321-020-00435-6.
Deep learning is becoming an important technology for a range of tasks in cheminformatics. In particular, graph convolutional neural networks (GCNs) have been reported to perform well in many molecular prediction tasks. Although GCNs show considerable potential, obtaining reasonable and reliable predictions with them requires a thorough understanding of both GCNs and programming. To make GCNs accessible to users ranging from chemists to cheminformaticians, we introduce kGCN, an open-source GCN tool. To support users with different levels of programming skill, kGCN provides three interfaces: a graphical user interface (GUI) built on KNIME for users with limited programming skills, such as chemists, and command-line and Python library interfaces for users with advanced programming skills, such as cheminformaticians. To support the three steps required to build a prediction model (pre-processing, model tuning, and interpretation of results), kGCN includes typical pre-processing functions, Bayesian optimization for automatic model tuning, and visualization of atomic contributions to the prediction. kGCN supports three types of approaches: single-task, multi-task, and multi-modal prediction. As a representative case study, kGCN is used to predict compound-protein interactions in inhibition assays for four matrix metalloproteases: MMP-3, -9, -12, and -13. The visualization of atomic contributions is useful for validating prediction models and for designing molecules based on them, realizing "explainable AI" for understanding the factors that affect AI predictions. kGCN is available at https://github.com/clinfo.
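To make the central idea concrete, the following is a minimal sketch of a single graph convolution step on a toy molecular graph, followed by a sum readout to a molecule-level vector. It is not kGCN's code or API: the function names, the symmetric normalization, the two-element atom features, and the weight values are all assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each atom keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    aggregated = matmul(normalized_adjacency(adj), features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


def readout(node_features):
    """Sum the per-atom vectors into one molecule-level vector."""
    return [sum(col) for col in zip(*node_features)]


# Toy molecule (assumption): heavy atoms of ethanol, bonded as C-C-O.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical one-hot atom features: [is_carbon, is_oxygen].
features = [[1.0, 0.0],
            [1.0, 0.0],
            [0.0, 1.0]]
# Arbitrary illustrative weights; a real model would learn these.
weights = [[0.5, -0.2],
           [0.1, 0.8]]

hidden = gcn_layer(adj, features, weights)   # one vector per atom
molecule_vector = readout(hidden)            # one vector per molecule
```

In a trained model, the molecule-level vector would feed a task-specific output layer, and per-atom quantities such as the rows of `hidden` are the kind of intermediate values from which atom-level contributions could be derived.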