The CLEF corpus: semantic annotation of clinical text.

Author information

Roberts Angus, Gaizauskas Robert, Hepple Mark, Davis Neil, Demetriou George, Guo Yikun, Kola Jay, Roberts Ian, Setzer Andrea, Tapuria Archana, Wheeldin Bill

Affiliation

Natural Language Processing Group, University of Sheffield, UK.

Publication information

AMIA Annu Symp Proc. 2007 Oct 11;2007:625-9.

PMID:18693911

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2655900/

Abstract

The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20 thousand patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.
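As an illustration of what inter-annotator agreement on entity annotations can look like in practice, the sketch below computes a pairwise F-measure between two annotators. The abstract does not say which agreement metric was used, so the choice of F-measure, the exact-match criterion, the `(start, end, type)` tuple representation, and the entity type names are all assumptions made for this example, not the authors' method.

```python
from typing import Set, Tuple

# Hypothetical representation: (start_offset, end_offset, entity_type).
Annotation = Tuple[int, int, str]


def pairwise_f1(a: Set[Annotation], b: Set[Annotation]) -> float:
    """F1 between two annotators' entity sets, using exact span+type matches.

    Annotator `a` is treated as the reference; F1 is symmetric, so the
    choice of reference does not change the score.
    """
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    matches = len(a & b)
    precision = matches / len(b) if b else 0.0
    recall = matches / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations over one document; the type labels are illustrative only.
annotator_a = {(0, 9, "Condition"), (15, 24, "Drug"), (30, 35, "Locus")}
annotator_b = {(0, 9, "Condition"), (15, 24, "Intervention"), (30, 35, "Locus")}

score = pairwise_f1(annotator_a, annotator_b)
print(round(score, 3))  # 0.667: two of three annotations match exactly
```

Exact matching is the strictest option; a relaxed variant that accepts overlapping spans would give higher scores on the same data.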

Similar articles

Mining clinical relationships from patient narratives.

BMC Bioinformatics. 2008 Nov 19;9 Suppl 11(Suppl 11):S3. doi: 10.1186/1471-2105-9-S11-S3.

SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes.

BMC Bioinformatics. 2018 Nov 6;19(1):405. doi: 10.1186/s12859-018-2429-2.

A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

J Am Med Inform Assoc. 2015 Sep;22(5):948-56. doi: 10.1093/jamia/ocv037. Epub 2015 May 6.

Building a semantically annotated corpus of clinical texts.

J Biomed Inform. 2009 Oct;42(5):950-66. doi: 10.1016/j.jbi.2008.12.013. Epub 2009 Jan 23.

Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.

J Biomed Inform. 2017 May;69:203-217. doi: 10.1016/j.jbi.2017.04.006. Epub 2017 Apr 9.

Semantator: semantic annotator for converting biomedical text to linked data.

J Biomed Inform. 2013 Oct;46(5):882-93. doi: 10.1016/j.jbi.2013.07.003. Epub 2013 Jul 15.

Parsing error correction of medical phrases for semantic annotation of clinical radiology reports.

AMIA Annu Symp Proc. 2008 Nov 6:1070.

Enriching a biomedical event corpus with meta-knowledge annotation.

BMC Bioinformatics. 2011 Oct 10;12:393. doi: 10.1186/1471-2105-12-393.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Cited by

Annotation of epilepsy clinic letters for natural language processing.

J Biomed Semantics. 2024 Sep 15;15(1):17. doi: 10.1186/s13326-024-00316-z.

Text mining for disease surveillance in veterinary clinical data: part one, the language of veterinary clinical records and searching for words.

Front Vet Sci. 2024 Jan 23;11:1352239. doi: 10.3389/fvets.2024.1352239. eCollection 2024.

The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings.

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022 Feb;5:189-200. doi: 10.5220/0010903300003123.

Reducing Physicians' Cognitive Load During Chart Review: A Problem-Oriented Summary of the Patient Electronic Record.

AMIA Annu Symp Proc. 2022 Feb 21;2021:763-772. eCollection 2021.

TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022 Feb;2022:162-169. doi: 10.5220/0010876100003123.

The OpenDeID corpus for patient de-identification.

Sci Rep. 2021 Oct 7;11(1):19973. doi: 10.1038/s41598-021-99554-9.

Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation.

JAMIA Open. 2020 Oct 14;3(3):431-438. doi: 10.1093/jamiaopen/ooaa029. eCollection 2020 Oct.

Automated Smart Home Assessment to Support Pain Management: Multiple Methods Analysis.

J Med Internet Res. 2020 Nov 6;22(11):e23943. doi: 10.2196/23943.

Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research.

JCO Clin Cancer Inform. 2019 Jul;3:1-8. doi: 10.1200/CCI.18.00084.

Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):56. doi: 10.1186/s12911-019-0759-2.
