College of Computer Science and Technology, Dalian University of Technology, Dalian, China.
College of Software, Dalian JiaoTong University, Dalian, China.
BMC Bioinformatics. 2018 Sep 17;19(1):328. doi: 10.1186/s12859-018-2316-x.
The effective combination of text and knowledge may improve the performance of natural language processing tasks. For recognizing chemical-induced disease (CID) relations, which may span sentence boundaries within an article, existing CID systems have explored the use of knowledge bases, but they have not distinguished the effects of different knowledge on the identification of a specific CID relation. Moreover, neural-network-based systems have only built sentence-level or mention-level models.
In this work, we propose an effective document-level neural model that integrates domain knowledge to extract CID relations from biomedical articles. A document-level sub-network module learns the basic semantic information of an article with respect to a specific CID candidate pair. Furthermore, a knowledge attention mechanism conditioned on the article representation is proposed to distinguish the influences of different knowledge on that CID pair, and the final knowledge representation is formed by aggregating the weighted knowledge. Finally, the integrated representations of text and knowledge are passed to a softmax classifier to perform CID recognition. Experimental results on the chemical-disease relation corpus released by BioCreative V show that our knowledge-integrated system achieves good overall performance compared with other state-of-the-art systems.
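The abstract does not give the attention scoring function, the vector dimensions, or the classifier layout, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a plain dot-product score between the document representation and each knowledge embedding (both of the same dimension), softmax-normalized weights, a weighted sum as the aggregated knowledge vector, and a single linear layer plus softmax over the concatenation [document; knowledge]. The functions `knowledge_attention` and `classify` and the two-class output order are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def knowledge_attention(doc: Vector, knowledge: List[Vector]) -> Tuple[Vector, Vector]:
    """Weight each knowledge vector by its relevance to the document.

    ASSUMPTION: relevance is a plain dot product between the document
    representation and each knowledge embedding; the paper's scorer may differ.
    Returns (attention weights, aggregated knowledge vector).
    """
    weights = softmax([dot(doc, k) for k in knowledge])
    dim = len(knowledge[0])
    # Weighted sum of the knowledge vectors, one dimension at a time.
    aggregated = [sum(w * k[j] for w, k in zip(weights, knowledge)) for j in range(dim)]
    return weights, aggregated


def classify(doc: Vector, knowledge: List[Vector],
             w_out: List[Vector], b_out: Vector) -> Vector:
    """Concatenate text and knowledge representations, then apply softmax.

    HYPOTHETICAL layout: w_out has one row per class, e.g. [not-CID, CID],
    and each row has len(doc) + len(knowledge[0]) entries.
    """
    _, k = knowledge_attention(doc, knowledge)
    features = doc + k  # list concatenation, i.e. [doc; k]
    logits = [dot(row, features) + b for row, b in zip(w_out, b_out)]
    return softmax(logits)


if __name__ == "__main__":
    doc = [1.0, 0.0]
    kb = [[2.0, 0.0], [0.0, 2.0]]  # the first entry aligns with the document
    weights, agg = knowledge_attention(doc, kb)
    print(weights, agg)
    print(classify(doc, kb, [[0, 0, 0, 0], [0, 0, 1, 0]], [0.0, 0.0]))
```

In the toy run, the knowledge vector that aligns with the document receives the larger attention weight and therefore dominates the aggregated vector. The dot-product scorer was chosen only for brevity; a bilinear or small feed-forward scorer would slot into the same place in `knowledge_attention`.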
Experimental analyses demonstrate that the attention mechanism introduced over domain knowledge plays a significant role in distinguishing the influences of different knowledge on the judgment of a specific CID relation.