Towards generalizable entity-centric clinical coreference resolution.

Authors

Miller Timothy, Dligach Dmitriy, Bethard Steven, Lin Chen, Savova Guergana

Affiliations

Boston Children's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States.

Loyola University Chicago, Chicago, IL, United States.

Publication information

J Biomed Inform. 2017 May;69:251-258. doi: 10.1016/j.jbi.2017.04.015. Epub 2017 Apr 21.

DOI:10.1016/j.jbi.2017.04.015

PMID:28438706

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5508069/

Abstract

OBJECTIVE

This work investigates the problem of clinical coreference resolution in a model that explicitly tracks entities, and aims to measure the performance of that model in both traditional in-domain train/test splits and cross-domain experiments that measure the generalizability of learned models.

METHODS

The two methods we compare are a baseline mention-pair coreference system that operates over pairs of mentions with best-first conflict resolution and a mention-synchronous system that incrementally builds coreference chains. We develop new features that incorporate distributional semantics, discourse features, and entity attributes. We use two new coreference datasets with similar annotation guidelines - the THYME colon cancer dataset and the DeepPhe breast cancer dataset.
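The contrast between the two decoding strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold` parameter, and the toy head-word scorers are all assumptions made for illustration, standing in for the trained classifiers and features the paper describes. The first function links each mention to its single best-scoring earlier mention (best-first); the second compares each mention against whole chains built so far and extends the best one.

```python
from typing import Callable, List

PairScorer = Callable[[str, str], float]            # antecedent mention vs. new mention
EntityScorer = Callable[[List[str], str], float]    # partial chain vs. new mention


def mention_pair_best_first(mentions: List[str], pair_score: PairScorer,
                            threshold: float = 0.5) -> List[List[str]]:
    """Score every (earlier mention, current mention) pair and link the
    current mention to the single highest-scoring antecedent above threshold."""
    chains: List[List[str]] = []
    chain_of = {}  # mention index -> index of its chain
    for j, mention in enumerate(mentions):
        best_i, best = None, threshold
        for i in range(j):
            score = pair_score(mentions[i], mention)
            if score > best:
                best_i, best = i, score
        if best_i is None:
            chain_of[j] = len(chains)      # no antecedent: start a new chain
            chains.append([mention])
        else:
            chain_of[j] = chain_of[best_i]  # join the antecedent's chain
            chains[chain_of[j]].append(mention)
    return chains


def mention_synchronous(mentions: List[str], entity_score: EntityScorer,
                        threshold: float = 0.5) -> List[List[str]]:
    """Process mentions left to right, comparing each one against the
    chains built so far and extending the best-scoring chain."""
    chains: List[List[str]] = []
    for mention in mentions:
        best_chain, best = None, threshold
        for chain in chains:
            score = entity_score(chain, mention)
            if score > best:
                best_chain, best = chain, score
        if best_chain is None:
            chains.append([mention])       # start a new entity
        else:
            best_chain.append(mention)
    return chains


# Toy scorers (hypothetical): two mentions match if their last words agree.
def toy_pair_score(antecedent: str, mention: str) -> float:
    return 1.0 if antecedent.split()[-1].lower() == mention.split()[-1].lower() else 0.0


def toy_entity_score(chain: List[str], mention: str) -> float:
    return max(toy_pair_score(m, mention) for m in chain)


example = ["the tumor", "a colonoscopy", "this tumor", "the colonoscopy"]
pair_chains = mention_pair_best_first(example, toy_pair_score)
sync_chains = mention_synchronous(example, toy_entity_score)
```

With these toy scorers the two strategies produce the same chains. In practice the entity-level scorer can use information about the whole chain (for example, aggregated entity attributes), which a pairwise scorer cannot see.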

RESULTS

The mention-synchronous system performs similarly to the baseline on in-domain data but much better on new data. Part-of-speech tag features prove superior to other word representations in feature generalizability experiments. Our methods improve generalization, but a performance gap remains when testing in new domains.

DISCUSSION

Generalizability of clinical NLP systems is important and under-studied, so future work should attempt to perform cross-domain and cross-institution evaluations and explicitly develop features and training regimens that favor generalizability. A performance-optimized version of the mention-synchronous system will be included in the open source Apache cTAKES software.

Similar articles

Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives.

J Am Med Inform Assoc. 2013 Mar-Apr;20(2):356-62. doi: 10.1136/amiajnl-2011-000767. Epub 2012 Jul 10.

Distinguished representation of identical mentions in bio-entity coreference resolution.

BMC Med Inform Decis Mak. 2022 Apr 30;22(1):116. doi: 10.1186/s12911-022-01862-1.

Lexical patterns, features and knowledge resources for coreference resolution in clinical notes.

J Biomed Inform. 2012 Oct;45(5):901-12. doi: 10.1016/j.jbi.2012.02.012. Epub 2012 Mar 17.

A categorical analysis of coreference resolution errors in biomedical texts.

J Biomed Inform. 2016 Apr;60:309-18. doi: 10.1016/j.jbi.2016.02.015. Epub 2016 Feb 27.

A supervised framework for resolving coreference in clinical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):875-82. doi: 10.1136/amiajnl-2012-000810. Epub 2012 May 19.

Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.

PLoS One. 2016 Mar 2;11(3):e0148538. doi: 10.1371/journal.pone.0148538. eCollection 2016.

Minimalistic Approach to Coreference Resolution in Lithuanian Medical Records.

Comput Math Methods Med. 2019 Mar 20;2019:9079840. doi: 10.1155/2019/9079840. eCollection 2019.

Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

BMC Bioinformatics. 2017 Aug 17;18(1):372. doi: 10.1186/s12859-017-1775-9.

EUSKOR: End-to-end coreference resolution system for Basque.

PLoS One. 2019 Sep 12;14(9):e0221801. doi: 10.1371/journal.pone.0221801. eCollection 2019.

Cited by

Cumulus: a federated electronic health record-based learning system powered by Fast Healthcare Interoperability Resources and artificial intelligence.

J Am Med Inform Assoc. 2024 Aug 1;31(8):1638-1647. doi: 10.1093/jamia/ocae130.

Rethinking domain adaptation for machine learning over clinical language.

JAMIA Open. 2020 Apr 13;3(2):146-150. doi: 10.1093/jamiaopen/ooaa010. eCollection 2020 Jul.

Interactive Exploration of Longitudinal Cancer Patient Histories Extracted From Clinical Text.

JCO Clin Cancer Inform. 2020 May;4:412-420. doi: 10.1200/CCI.19.00115.

Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review.

JMIR Med Inform. 2019 Apr 27;7(2):e12239. doi: 10.2196/12239.

References

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation.

Proc Conf Assoc Comput Linguist Meet. 2014 Jun;2014:30-35. doi: 10.3115/v1/P14-2006.

Temporal Annotation in the Clinical Domain.

Trans Assoc Comput Linguist. 2014 Apr;2:143-154.

Negation's not solved: generalizability versus optimizability in clinical natural language processing.

PLoS One. 2014 Nov 13;9(11):e112774. doi: 10.1371/journal.pone.0112774. eCollection 2014.

Discovering body site and severity modifiers in clinical texts.

J Am Med Inform Assoc. 2014 May-Jun;21(3):448-54. doi: 10.1136/amiajnl-2013-001766. Epub 2013 Oct 3.

Towards comprehensive syntactic and semantic annotations of the clinical narrative.

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):922-30. doi: 10.1136/amiajnl-2012-001317. Epub 2013 Jan 25.

Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):867-74. doi: 10.1136/amiajnl-2011-000766. Epub 2012 Jun 16.

A supervised framework for resolving coreference in clinical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):875-82. doi: 10.1136/amiajnl-2012-000810. Epub 2012 May 19.

A classification approach to coreference in discharge summaries: 2011 i2b2 challenge.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):897-905. doi: 10.1136/amiajnl-2011-000734. Epub 2012 Apr 13.

Evaluating the state of the art in coreference resolution for electronic medical records.

J Am Med Inform Assoc. 2012 Sep-Oct;19(5):786-91. doi: 10.1136/amiajnl-2011-000784. Epub 2012 Feb 24.

A system for coreference resolution for the clinical narrative.

J Am Med Inform Assoc. 2012 Jul-Aug;19(4):660-7. doi: 10.1136/amiajnl-2011-000599. Epub 2012 Jan 31.
