Liu Kaihong, Mitchell Kevin J, Chapman Wendy W, Crowley Rebecca S
Center for Biomedical Informatics, University of Pittsburgh, PA, USA.
AMIA Annu Symp Proc. 2005;2005:460-4.
Surgical pathology specimens are an important resource for medical research, particularly cancer research. Although research studies would benefit from information derived from surgical pathology reports, access to this information is limited by the use of unstructured free text in the reports. We have previously described a pipeline-based system for automated annotation of surgical pathology reports with UMLS concepts, which has been used to code over 450,000 surgical pathology reports at our institution. In addition to coding UMLS terms, it annotates the values of several key variables, such as TNM stage and cancer grade. The objective of this study was to evaluate automated extraction of these variables by measuring the performance of the system against a true gold standard: manually encoded data entered by expert tissue annotators. We categorized and analyzed errors to determine the potential and limitations of information extraction from pathology reports for the purpose of automated biospecimen annotation.