Exploiting sequence labeling framework to extract document-level relations from biomedical texts.

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, 77030, USA.

Publication information

BMC Bioinformatics. 2020 Mar 27;21(1):125. doi: 10.1186/s12859-020-3457-2.

Abstract

BACKGROUND

Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either focus on extracting intra-sentential relations and ignore inter-sentential ones, or fail to extract inter-sentential relations accurately and treat the instances containing entity relations as independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In this method, the sequence labeling framework is extended with multiple specified feature extractors to facilitate feature extraction at different levels, especially at the inter-sentential level. In addition, the sequence labeling framework enables Bio-Seq to exploit the interactions between relations, which further improves the precision of document-level relation extraction.
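The abstract does not specify Bio-Seq's tagging scheme, so the following is only a minimal sketch of how document-level relation extraction can be cast as sequence labeling in general. It assumes one label sequence per source entity over the whole document, with hypothetical `B-REL`/`I-REL`/`O` tags marking the target mentions related to that source. The function name, the tags, the entity identifiers, and the toy text are all invented for illustration.

```python
from typing import List, Set, Tuple

# A mention is a token span [start, end) plus an entity identifier.
Mention = Tuple[Tuple[int, int], str]


def relation_tags(num_tokens: int,
                  target_mentions: List[Mention],
                  related_ids: Set[str]) -> List[str]:
    """Build one document-length tag sequence for a single source entity.

    Hypothetical scheme (not taken from the paper): tokens of target
    mentions that are related to the source entity get B-REL / I-REL,
    every other token gets O.
    """
    tags = ["O"] * num_tokens
    for (start, end), entity_id in target_mentions:
        if entity_id in related_ids:
            tags[start] = "B-REL"
            for i in range(start + 1, end):
                tags[i] = "I-REL"
    return tags


# Invented two-sentence toy document; the source entity is "lithium".
tokens = ("Patients received lithium . Two weeks later renal failure "
          "appeared , while pre-existing asthma was unchanged .").split()

# Candidate target (disease) mentions with made-up identifiers.
disease_mentions: List[Mention] = [((7, 9), "renal_failure"),
                                   ((13, 14), "asthma")]

# Toy gold annotation: only "renal_failure" is related to the source.
tags = relation_tags(len(tokens), disease_mentions, {"renal_failure"})

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because the sequence spans the whole document, a target in a different sentence from the source is labeled the same way as one in the same sentence, and all relations of one source entity are decoded together rather than as independent instances.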

RESULTS

Our proposed method obtained an F1-score of 63.5% on the BioCreative V chemical disease relation corpus and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. Our method also achieved an F1-score of 85.1% on the n2c2-ADE sub-dataset.
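For readers unfamiliar with the metric, the sketch below shows how an F1-score over document-level relations is commonly computed. The abstract does not describe the evaluation protocol, so this assumes that predicted and gold relations are compared as sets of (document, chemical, disease) triples. The function name and all identifiers are made up, and the numbers are unrelated to the reported results.

```python
from typing import Set, Tuple

Relation = Tuple[str, str, str]  # (document id, chemical id, disease id)


def precision_recall_f1(gold: Set[Relation],
                        pred: Set[Relation]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over two sets of relation triples."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up gold and predicted relations for two toy documents.
gold = {("doc1", "C1", "D1"), ("doc1", "C1", "D2"), ("doc2", "C2", "D3")}
pred = {("doc1", "C1", "D1"), ("doc2", "C2", "D3"), ("doc2", "C2", "D4")}

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```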

CONCLUSION

Sequence labeling can be used successfully to extract document-level relations, and it is especially effective at boosting performance on inter-sentential relation extraction. Our work can facilitate research on document-level biomedical text mining.
