College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Health Data Science, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Bioinformatics. 2021 Sep 29;37(18):2988-2995. doi: 10.1093/bioinformatics/btab207.
Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDIs using machine learning models has become possible. However, how to effectively utilize large and noisy biomedical KGs for DDI detection remains largely an open problem. Due to the sheer size of KGs and the amount of noise they contain, directly integrating them with other smaller but higher-quality data (e.g. experimental data) is often of limited benefit. Most existing approaches ignore KGs altogether. Some try to directly integrate KGs with other data via graph neural networks, with limited success. Furthermore, most previous works focus on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task.
To fill these gaps, we propose a new method, SumGNN: knowledge summarization graph neural network. It is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention based subgraph summarization scheme that generates reasoning paths within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant for low-data relation types. In addition, SumGNN provides interpretable predictions via the reasoning paths generated for each prediction.
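To make the idea of anchoring on a relevant subgraph concrete, the following is a minimal sketch, not the authors' implementation. It assumes the KG is given as (head, relation, tail) triples and that "relevant" means the union of the k-hop neighborhoods of the two drugs; both are assumptions for illustration, as are the function names and the toy entities.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _rel, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def k_hop_neighbors(adj, start, k):
    """Return all nodes within k hops of `start` (inclusive), found by BFS."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def extract_pair_subgraph(triples, drug_a, drug_b, k=2):
    """Keep only triples whose endpoints both lie in the k-hop
    neighborhood of either drug (union rule; an assumption)."""
    adj = build_adjacency(triples)
    nodes = k_hop_neighbors(adj, drug_a, k) | k_hop_neighbors(adj, drug_b, k)
    return [t for t in triples if t[0] in nodes and t[2] in nodes]


# Hypothetical toy KG; entity and relation names are made up.
toy_kg = [
    ("drugA", "targets", "gene1"),
    ("drugB", "targets", "gene1"),
    ("gene1", "in_pathway", "path1"),
    ("path1", "includes", "gene2"),
    ("gene2", "associated_with", "disease9"),
    ("drugC", "targets", "gene7"),
]

local_subgraph = extract_pair_subgraph(toy_kg, "drugA", "drugB", k=2)
```

In this sketch the triple involving `drugC` is dropped because it is not reachable from either drug within k hops, which illustrates how a local subgraph can shrink a large, noisy KG before any summarization or attention step is applied.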
The code is available in Supplementary Material.
Supplementary data are available at Bioinformatics online.