College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
Interdiscip Sci. 2021 Jun;13(2):312-320. doi: 10.1007/s12539-021-00425-8. Epub 2021 Mar 17.
Discovering relations between biomedical entities of different types is crucial for biology research. A large number of potential or indirectly connected biological relations are hidden in millions of biomedical publications and in biological databases. Previous rule-based and deep learning approaches rely on large amounts of manual annotation, which is laborious and time-consuming to produce, and their results remain unsatisfactory. It is therefore necessary to combine available annotated gene databases with chemical, genomic, clinical and other types of data repositories as domain knowledge to assist the extraction of biological entity relations from the literature. To this end, this paper proposes BioGraphSAGE, a Siamese graph neural network that uses structured databases as domain knowledge to extract biological entity relations from the literature. The model combines biological semantic features with positional features to improve the recognition of relations between distant entities in the same document. Experimental results show that BioGraphSAGE achieves the best F1 score among the compared relation extraction models when only a small number of annotated samples is available. Moreover, the proposed model still maintains an F1 score of 0.526 without using any annotated training samples.
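The abstract does not give architectural details, so the following is only an illustrative sketch of the general idea: node features built from a semantic vector plus a position, one GraphSAGE-style mean-aggregation layer, and a Siamese comparison in which both entities are encoded with the same weights. All names (`node_feature`, `sage_layer`, `siamese_score`), the toy graph, the feature values and the untrained random weights are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def node_feature(semantic, position, doc_length):
    """Concatenate a semantic vector with a normalized token position."""
    return list(semantic) + [position / doc_length]


def mean_vector(vectors, dim):
    """Element-wise mean; a zero vector when there are no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(features, neighbors, weight):
    """One GraphSAGE-style layer with a mean aggregator (illustrative)."""
    dim = len(next(iter(features.values())))
    embeddings = {}
    for node, own in features.items():
        pooled = mean_vector([features[n] for n in neighbors.get(node, [])], dim)
        combined = own + pooled  # self features concatenated with neighborhood mean
        # Linear map followed by ReLU, then L2 normalization.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weight]
        norm = math.sqrt(sum(h * h for h in hidden)) or 1.0
        embeddings[node] = [h / norm for h in hidden]
    return embeddings


def siamese_score(features, neighbors, weight, entity_a, entity_b):
    """Encode both entities with the same weights and compare the embeddings."""
    embeddings = sage_layer(features, neighbors, weight)
    return sum(x * y for x, y in zip(embeddings[entity_a], embeddings[entity_b]))


# Toy graph (hypothetical): a gene and a chemical mentioned far apart in one
# document, linked through a shared database node.
features = {
    "gene": node_feature([0.9, 0.1], position=3, doc_length=100),
    "chemical": node_feature([0.2, 0.8], position=87, doc_length=100),
    "db_entry": node_feature([0.5, 0.5], position=0, doc_length=100),
}
neighbors = {
    "gene": ["db_entry"],
    "chemical": ["db_entry"],
    "db_entry": ["gene", "chemical"],
}

rng = random.Random(0)
# Untrained weights: 4 output units, 6 inputs (3 self + 3 aggregated features).
weight = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(4)]

score = siamese_score(features, neighbors, weight, "gene", "chemical")
print(round(score, 3))
```

Because the embeddings are non-negative and L2-normalized, the score is a cosine similarity between 0 and 1. In a trained model the shared weights would be learned so that related entity pairs score higher than unrelated ones; here the weights are random, so the printed value carries no meaning.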