Savova Guergana K, Fan Jin, Ye Zi, Murphy Sean P, Zheng Jiaping, Chute Christopher G, Kullo Iftikhar J
Division of Biomedical Statistics and Informatics.
AMIA Annu Symp Proc. 2010 Nov 13;2010:722-6.
As part of the Electronic Medical Records and Genomics (eMERGE) Network, we applied, extended and evaluated an open-source clinical Natural Language Processing system, Mayo's Clinical Text Analysis and Knowledge Extraction System (cTAKES), for the discovery of peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable and 150 unknown cases. Overall accuracy (agreement between the system and the gold standard) was 0.93, compared with 0.46 for a named entity recognition baseline. Sensitivity was 0.93-0.96 for the positive, probable and unknown cases and 0.72 for the negative cases. Specificity and negative predictive value were in the 0.90s for all categories. The positive predictive value was in the high 0.90s for the positive and unknown categories, 0.84 for the negative category and 0.63 for the probable category. We outline the main sources of errors and suggest improvements.