IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):489-499. doi: 10.1109/TCBB.2021.3135844. Epub 2023 Feb 3.
Automatic extraction of chemical-disease relations (CDR) from text has become critical because extracting valuable CDR manually takes a great deal of time and effort. Studies have shown that prior knowledge from biomedical knowledge bases is important for relation extraction, so combining deep learning models with prior knowledge is worth studying. In this paper, we propose a new model called Knowledge Guided Attention and Graph Convolutional Networks (KGAGN) for CDR extraction. First, to take full advantage of domain knowledge, we train entity embeddings as a feature representation of the input sequence, and relation embeddings to further capture weighted contextual information through an attention mechanism. Then, to take full advantage of syntactic dependency information in cross-sentence CDR extraction, we construct document-level syntactic dependency graphs and encode them using a graph convolutional network (GCN). Finally, the chemical-induced disease (CID) relation is extracted using weighted context features and long-range dependency features, both of which contain additional knowledge information. We evaluated our model on the CDR dataset published by the BioCreative-V community and achieved an F1-score of 73.3%, surpassing other state-of-the-art methods. The code, implemented with the PyTorch 1.7.0 deep learning library, can be downloaded from GitHub: https://github.com/sunyi123/cdr.
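The two encoding steps described above can be illustrated with a small sketch. This is not taken from the KGAGN repository, and it uses plain Python rather than PyTorch so that it runs without dependencies. The function names, the dot-product attention score, the mean-style neighbour normalisation, and the extra edge linking the roots of adjacent sentences are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def knowledge_attention(token_vecs, relation_vec):
    """Weight token vectors by similarity to a relation embedding.

    Assumption: a plain dot product is used as the attention score.
    """
    scores = [sum(t * r for t, r in zip(vec, relation_vec)) for vec in token_vecs]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(token_vecs[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, token_vecs)) for d in range(dim)]
    return weights, context


def build_adjacency(num_nodes, edges):
    """Build a symmetric adjacency matrix with self-loops from (head, dependent) pairs."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for head, dep in edges:
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One graph convolution step: average neighbour features, project, apply ReLU."""
    in_dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for row in adj:
        degree = sum(row)  # at least 1 because of the self-loop
        agg = [sum(row[j] * features[j][d] for j in range(len(features))) / degree
               for d in range(in_dim)]
        output.append([max(0.0, sum(agg[d] * weight[d][k] for d in range(in_dim)))
                       for k in range(out_dim)])
    return output


# Toy document: tokens 0-1 form sentence one, tokens 2-3 form sentence two.
# (0, 1) and (2, 3) stand in for dependency edges; (0, 2) is a hypothetical
# edge joining the two sentence roots so the graph spans the whole document.
edges = [(0, 1), (2, 3), (0, 2)]
adj = build_adjacency(4, edges)

features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

h1 = gcn_layer(adj, features, identity)  # information travels one hop
h2 = gcn_layer(adj, h1, identity)        # information travels two hops

# Hypothetical relation embedding that favours the first feature dimension.
weights, context = knowledge_attention(features, [1.0, 0.0])
```

In this toy graph, token 3 is two hops from token 0, so its representation is still all zeros in `h1` and only picks up signal from the first sentence in `h2`. That is the sense in which stacked graph convolutions over a document-level dependency graph can carry long-range, cross-sentence information.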