Roberts Angus, Gaizauskas Robert, Hepple Mark, Davis Neil, Demetriou George, Guo Yikun, Kola Jay, Roberts Ian, Setzer Andrea, Tapuria Archana, Wheeldin Bill
Natural Language Processing Group, University of Sheffield, UK.
AMIA Annu Symp Proc. 2007 Oct 11;2007:625-9.
The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human-annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20,000 patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships, outline the annotation methodology, and report encouraging initial results for inter-annotator agreement. Comparisons are made between different text sub-genres and between annotators with different skills.
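As a hypothetical illustration (the abstract does not specify the paper's scoring method), span-level inter-annotator agreement for entity annotation is often reported as an F-measure, treating one annotator's entities as the reference set. A minimal Python sketch, assuming each annotator's output is a set of (start, end, entity_type) tuples:

    def agreement_f1(annotator_a, annotator_b):
        """Span-level agreement: annotator_a is treated as the reference.

        Each argument is an iterable of (start, end, entity_type) tuples;
        a match requires identical span offsets and entity type.
        """
        a, b = set(annotator_a), set(annotator_b)
        matches = len(a & b)  # entities both annotators marked identically
        precision = matches / len(b) if b else 0.0
        recall = matches / len(a) if a else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Example: two annotators marking clinical entities in the same narrative
    # (entity types here are illustrative, not the CLEF annotation schema).
    ann_a = {(0, 9, "Condition"), (15, 24, "Drug"), (30, 41, "Locus")}
    ann_b = {(0, 9, "Condition"), (15, 24, "Drug"), (50, 57, "Intervention")}
    print(f"IAA (F1): {agreement_f1(ann_a, ann_b):.2f}")  # prints 0.67

Because precision and recall swap when the reference annotator is swapped, the F-measure is symmetric between the two annotators, which makes it a convenient single agreement figure for span annotation tasks where chance-corrected measures such as kappa are hard to define.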