Anaphoric relations in the clinical narrative: corpus creation.

Affiliation

Children's Hospital Boston Informatics Program and Harvard Medical School, Boston, Massachusetts 02114, USA.

Publication information

J Am Med Inform Assoc. 2011 Jul-Aug;18(4):459-65. doi: 10.1136/amiajnl-2011-000108. Epub 2011 Apr 1.

Abstract

OBJECTIVE

The long-term goal of this work is the automated discovery of anaphoric relations from the clinical narrative. The creation of a gold standard set from a cross-institutional corpus of clinical notes and high-level characteristics of that gold standard are described.

METHODS

A standard methodology for annotation guideline development, gold standard annotations, and inter-annotator agreement (IAA) was used.
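
The abstract does not name the agreement statistic behind the IAA figures. As a minimal sketch, assuming agreement is scored as an F-measure over the anaphoric pairs that two annotators marked (a common choice when there is no fixed set of negative cases), the computation could look like this. The function name `pairwise_f_measure` and the markable IDs are hypothetical and not taken from the paper.

```python
from typing import Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def pairwise_f_measure(annotator_a: Iterable[Pair], annotator_b: Iterable[Pair]) -> float:
    """F-measure between two annotators' sets of anaphoric pairs.

    Assumption: a pair counts as matched only if both annotators linked
    exactly the same two markables; the order within a pair is ignored.
    """
    a = {frozenset(p) for p in annotator_a}
    b = {frozenset(p) for p in annotator_b}
    if not a and not b:
        return 1.0  # nothing annotated by either side: trivially in agreement
    matched = len(a & b)
    # F1 = 2 * matched / (|A| + |B|), symmetric in the two annotators
    return 2 * matched / (len(a) + len(b))


# Hypothetical markable IDs for one report
ann_a = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
ann_b = [("m2", "m1"), ("m4", "m5"), ("m6", "m7")]
print(round(pairwise_f_measure(ann_a, ann_b), 4))  # prints 0.6667
```

Because the score is symmetric, neither annotator has to be treated as the reference. The same function could compare one annotator against the gold standard.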

RESULTS

The gold standard annotations resulted in 7214 markables, 5992 pairs, and 1304 chains. Each report averaged 40 anaphoric markables, 33 pairs, and seven chains. The overall IAA is high on the Mayo dataset (0.6607), and moderate on the University of Pittsburgh Medical Center (UPMC) dataset (0.4072). The IAA between each annotator and the gold standard is high (Mayo: 0.7669, 0.7697, and 0.9021; UPMC: 0.6753 and 0.7138). These results imply a quality corpus feasible for system development. They also suggest the complementary nature of the annotations performed by the experts and the importance of an annotator team with diverse knowledge backgrounds.
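
The abstract reports markables, pairs, and chains without defining how they relate. As a rough illustration only, assuming a chain is the set of markables connected to one another through annotated pairs, chains can be derived from pairs with a union-find structure. The function name `build_chains` and the markable IDs are hypothetical, and the paper's own definition of a chain may differ (for example, it may restrict chains to particular relation types).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def build_chains(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Group markables into chains: connected components of the pair links."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two chains

    groups = defaultdict(set)
    for markable in list(parent):
        groups[find(markable)].add(markable)
    return sorted(sorted(group) for group in groups.values())


# Three hypothetical pairs collapse into two chains
print(build_chains([("m1", "m2"), ("m2", "m3"), ("m4", "m5")]))
# prints [['m1', 'm2', 'm3'], ['m4', 'm5']]
```

Under this assumption the number of chains can never exceed the number of pairs, which is consistent with the reported counts of 5992 pairs and 1304 chains.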

LIMITATIONS

Only one of the annotators had the linguistic background necessary for annotation of the linguistic attributes. The overall generalizability of the guidelines will be further strengthened by annotations of data from additional sites. This will increase the overall corpus size and the representation of each relation type.

CONCLUSION

The first step toward the development of an anaphoric relation resolver as part of a comprehensive natural language processing system geared specifically for the clinical narrative in the electronic medical record is described. The deidentified annotated corpus will be available to researchers.
