Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.

Authors

Kilicoglu Halil, Demner-Fushman Dina

Affiliation

Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Publication

PLoS One. 2016 Mar 2;11(3):e0148538. doi: 10.1371/journal.pone.0148538. eCollection 2016.

DOI:10.1371/journal.pone.0148538

PMID:26934708

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4774913/

Abstract

Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of coreference resolution for biomedical text analysis applications has increasingly been acknowledged. One of the difficulties in coreference resolution stems from the fact that distinct types of coreference (e.g., anaphora, appositive) are expressed with a variety of lexical and syntactic means (e.g., personal pronouns, definite noun phrases), and that resolution of each combination often requires a different approach. In the biomedical domain, it is common for coreference annotation and resolution efforts to focus on specific subcategories of coreference deemed important for the downstream task. In the current work, we aim to address some of these concerns regarding coreference resolution in biomedical text. We propose a general, modular framework underpinned by a smorgasbord architecture (Bio-SCoRes), which incorporates a variety of coreference types, their mentions and allows fine-grained specification of resolution strategies to resolve coreference of distinct coreference type-mention pairs. For development and evaluation, we used a corpus of structured drug labels annotated with fine-grained coreference information. In addition, we evaluated our approach on two other corpora (i2b2/VA discharge summaries and protein coreference dataset) to investigate its generality and ease of adaptation to other biomedical text types. Our results demonstrate the usefulness of our novel smorgasbord architecture. The specific pipelines based on the architecture perform successfully in linking coreferential mention pairs, while we find that recognition of full mention clusters is more challenging. 
The corpus of structured drug labels (SPL) as well as the components of Bio-SCoRes and some of the pipelines based on it are publicly available at https://github.com/kilicogluh/Bio-SCoRes. We believe that Bio-SCoRes can serve as a strong and extensible baseline system for coreference resolution of biomedical text.
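The idea of choosing a resolution strategy per coreference type and mention type pair can be illustrated with a small sketch. This is not the Bio-SCoRes API (the released system is a separate code base). Every name below (`Mention`, `STRATEGIES`, `closest_preceding`, `head_match`, `resolve`) is hypothetical, and the two toy strategies are assumptions chosen only to show the dispatch pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; text is assumed to be non-empty."""
    text: str
    mention_type: str  # e.g. "personal_pronoun", "definite_np"
    position: int      # token offset in the document


# A strategy picks an antecedent for a mention from candidates, or returns None.
Strategy = Callable[[Mention, List[Mention]], Optional[Mention]]


def closest_preceding(mention: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Toy strategy: pick the nearest candidate that occurs before the mention."""
    earlier = [c for c in candidates if c.position < mention.position]
    return max(earlier, key=lambda c: c.position, default=None)


def head_match(mention: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Toy strategy: nearest earlier candidate whose last word matches the mention's."""
    head = mention.text.lower().split()[-1]
    matching = [c for c in candidates if c.text.lower().split()[-1] == head]
    return closest_preceding(mention, matching)


# Assumed registry: one strategy per (coreference type, mention type) pair.
STRATEGIES: Dict[Tuple[str, str], Strategy] = {
    ("anaphora", "personal_pronoun"): closest_preceding,
    ("anaphora", "definite_np"): head_match,
}


def resolve(coref_type: str, mention: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Dispatch to the strategy registered for this pair; None if none is registered."""
    strategy = STRATEGIES.get((coref_type, mention.mention_type))
    return strategy(mention, candidates) if strategy else None


candidates = [Mention("a new drug", "entity", 2), Mention("the patient", "entity", 5)]
antecedent = resolve("anaphora", Mention("the drug", "definite_np", 10), candidates)
print(antecedent.text if antecedent else None)
```

Keeping the registry keyed by the pair, rather than by mention type alone, means a new combination can be supported by adding one entry without touching the existing strategies.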

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7dc/4774913/510fdc642936/pone.0148538.g001.jpg

Similar articles

A categorical analysis of coreference resolution errors in biomedical texts.

J Biomed Inform. 2016 Apr;60:309-18. doi: 10.1016/j.jbi.2016.02.015. Epub 2016 Feb 27.

Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

BMC Bioinformatics. 2017 Aug 17;18(1):372. doi: 10.1186/s12859-017-1775-9.

Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives.

J Am Med Inform Assoc. 2013 Mar-Apr;20(2):356-62. doi: 10.1136/amiajnl-2011-000767. Epub 2012 Jul 10.

Distinguished representation of identical mentions in bio-entity coreference resolution.

BMC Med Inform Decis Mak. 2022 Apr 30;22(1):116. doi: 10.1186/s12911-022-01862-1.

A supervised framework for resolving coreference in clinical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):875-82. doi: 10.1136/amiajnl-2012-000810. Epub 2012 May 19.

MCORES: a system for noun phrase coreference resolution for clinical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):906-12. doi: 10.1136/amiajnl-2011-000591. Epub 2012 Mar 14.

A classification approach to coreference in discharge summaries: 2011 i2b2 challenge.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):897-905. doi: 10.1136/amiajnl-2011-000734. Epub 2012 Apr 13.

Evaluating the state of the art in coreference resolution for electronic medical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):786-91. doi: 10.1136/amiajnl-2011-000784. Epub 2012 Feb 24.

Coreference resolution of medical concepts in discharge summaries by exploiting contextual information.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):888-96. doi: 10.1136/amiajnl-2012-000808. Epub 2012 May 3.

Cited by

Distinguished representation of identical mentions in bio-entity coreference resolution.

BMC Med Inform Decis Mak. 2022 Apr 30;22(1):116. doi: 10.1186/s12911-022-01862-1.

Broad-coverage biomedical relation extraction with SemRep.

BMC Bioinformatics. 2020 May 14;21(1):188. doi: 10.1186/s12859-020-3517-7.

Automatic recognition of self-acknowledged limitations in clinical research literature.

J Am Med Inform Assoc. 2018 Jul 1;25(7):855-861. doi: 10.1093/jamia/ocy038.

Semantic annotation of consumer health questions.

BMC Bioinformatics. 2018 Feb 6;19(1):34. doi: 10.1186/s12859-018-2045-1.

Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing.

Yearb Med Inform. 2017 Aug;26(1):228-234. doi: 10.15265/IY-2017-027. Epub 2017 Sep 11.

Assigning factuality values to semantic relations extracted from biomedical research literature.

PLoS One. 2017 Jul 5;12(7):e0179926. doi: 10.1371/journal.pone.0179926. eCollection 2017.
