School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, 77030, USA.
BMC Bioinformatics. 2020 Mar 27;21(1):125. doi: 10.1186/s12859-020-3457-2.
Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either extract only intra-sentential relations and ignore inter-sentential ones, or fail to extract inter-sentential relations accurately. They also regard the instances containing entity relations as independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In this method, the sequence labeling framework is extended with multiple specified feature extractors to facilitate feature extraction at different levels, especially at the inter-sentential level. In addition, the sequence labeling framework enables Bio-Seq to take advantage of the interactions between relations, which further improves the precision of document-level relation extraction.
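To make the idea of casting relation extraction as sequence labeling concrete, the sketch below shows one common formulation: fix a source entity (for example a chemical), then tag every token of the document so that spans of related target entities (for example diseases) receive `B-REL`/`I-REL` tags and everything else receives `O`. This is an illustrative assumption, not necessarily the tagging scheme used by Bio-Seq; the tag names, the function names `encode_relations` and `decode_relations`, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

# (start, end) token offsets, end exclusive. Hypothetical representation.
Span = Tuple[int, int]


def encode_relations(n_tokens: int, related_spans: List[Span]) -> List[str]:
    """Tag a whole document relative to one fixed source entity.

    Tokens inside a related target entity get B-REL / I-REL, all others O.
    Because the full document is tagged at once, targets in other sentences
    are handled the same way as targets in the source entity's sentence.
    """
    tags = ["O"] * n_tokens
    for start, end in related_spans:
        tags[start] = "B-REL"
        for i in range(start + 1, end):
            tags[i] = "I-REL"
    return tags


def decode_relations(tags: List[str]) -> List[Span]:
    """Recover related target spans from a predicted tag sequence."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-REL":
            if start is not None:  # close a span that directly precedes
                spans.append((start, i))
            start = i
        elif tag == "I-REL" and start is not None:
            continue  # still inside the current span
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical two-sentence document; the source entity is "Lithium".
tokens = ("Lithium was given . Later the patient developed "
          "renal failure and tremor .").split()
gold = [(8, 10), (11, 12)]  # "renal failure", "tremor" (second sentence)
tags = encode_relations(len(tokens), gold)
recovered = decode_relations(tags)
```

In this formulation a single tag sequence carries all relations of the source entity, which is one way a model can exploit interactions between relations instead of classifying each entity pair independently.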
Our proposed method obtained an F1-score of 63.5% on the BioCreative V chemical disease relation corpus and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. Our method also achieved an F1-score of 85.1% on the n2c2-ADE sub-dataset.
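For readers less familiar with the metric, the F1-score reported above is the harmonic mean of precision and recall over extracted relations. The following is a minimal sketch of that computation; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts.

    tp: correctly extracted relations
    fp: extracted relations that are not in the gold standard
    fn: gold-standard relations that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example_f1 = f1_score(tp=50, fp=25, fn=50)
```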
Sequence labeling can be successfully used to extract document-level relations and is especially useful for boosting performance on inter-sentential relation extraction. Our work can facilitate research on document-level biomedical text mining.