Fiszman M, Chapman W W, Aronsky D, Evans R S, Haug P J
The University of Utah, Salt Lake City, Utah, USA.
J Am Med Inform Assoc. 2000 Nov-Dec;7(6):593-604. doi: 10.1136/jamia.2000.0070593.
To evaluate the performance of a natural language processing system in extracting pneumonia-related concepts from chest x-ray reports.
Four physicians, three lay persons, a natural language processing system, and two keyword searches (designated AAKS and KS) detected the presence or absence of three pneumonia-related concepts and inferred the presence or absence of acute bacterial pneumonia from 292 chest x-ray reports. Gold standard: Majority vote of three independent physicians. Reliability of the gold standard was measured.
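The majority-vote gold standard described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `majority_vote`, the report identifiers, and the votes are all made up for the example, and each rating is assumed to be a simple present/absent boolean.

```python
def majority_vote(votes):
    """Return True if more than half of the raters marked the concept present.

    Assumes binary (present/absent) ratings and an odd number of raters,
    so that ties cannot occur.
    """
    votes = [bool(v) for v in votes]
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of raters is needed to avoid ties")
    return sum(votes) > len(votes) // 2


# Hypothetical ratings from three independent physicians for one concept.
ratings = {
    "report_1": (True, True, False),
    "report_2": (False, False, True),
    "report_3": (True, True, True),
}

# Gold-standard label per report.
gold = {report_id: majority_vote(v) for report_id, v in ratings.items()}
```

With three raters, a concept counts as present when at least two of them mark it present.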
Recall, precision, specificity, and agreement (using Finn's R statistic) with respect to the gold standard. Differences between the physicians and the other subjects were tested using the McNemar test for each pneumonia concept and for the disease inference of acute bacterial pneumonia.
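As a rough sketch of how these measures can be computed for one subject against the gold standard, the Python below uses the standard definitions of recall, precision, and specificity. Two points are assumptions, since the abstract does not give the formulas: Finn's R is taken in its common form, one minus the ratio of observed within-report variance to the variance expected by chance, which for two raters and two categories reduces to 2 × (proportion of agreement) − 1; and the McNemar test is the uncorrected chi-square version with one degree of freedom. All function names are hypothetical.

```python
import math


def metrics(predicted, gold):
    """Recall, precision, specificity, and Finn's R for binary labels.

    Assumes each of the four confusion-matrix margins used below is non-zero.
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    agreement = (tp + tn) / len(pairs)
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        # Assumed form of Finn's R for two raters and two categories.
        "finn_r": 2 * agreement - 1,
    }


def discordant_counts(pred_a, pred_b, gold):
    """Count reports where exactly one of two subjects matches the gold standard."""
    b = c = 0
    for a, other, g in zip(pred_a, pred_b, gold):
        a_ok, other_ok = (a == g), (other == g)
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1
    return b, c


def mcnemar(b, c):
    """Uncorrected McNemar chi-square statistic and p-value (1 degree of freedom)."""
    if b + c == 0:
        return 0.0, 1.0
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up labels for ten reports (True = concept present).
gold = [True, True, True, True, False, False, False, False, False, False]
subject_a = [True, True, True, False, True, False, False, False, False, False]
subject_b = [True, True, False, False, False, False, False, False, False, False]

scores_a = metrics(subject_a, gold)
b, c = discordant_counts(subject_a, subject_b, gold)
statistic, p_value = mcnemar(b, c)
```

With few discordant pairs, an exact binomial version of the McNemar test or a continuity correction is often preferred; the abstract does not say which variant the authors used.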
Reliability of the reference standard ranged from 0.86 to 0.96. Recall, precision, specificity, and agreement (Finn's R) for the inference on acute bacterial pneumonia were, respectively, 0.94, 0.87, 0.91, and 0.84 for physicians; 0.95, 0.78, 0.85, and 0.75 for the natural language processing system; 0.46, 0.89, 0.95, and 0.54 for lay persons; 0.79, 0.63, 0.71, and 0.49 for AAKS; and 0.87, 0.70, 0.77, and 0.62 for KS. The McNemar pairwise comparisons showed differences between one physician and the natural language processing system for the infiltrate concept and between another physician and the natural language processing system for the inference on acute bacterial pneumonia. The comparisons also showed that most physicians were significantly different from the other subjects in all pneumonia concepts and the disease inference.
In extracting pneumonia-related concepts from chest x-ray reports, the performance of the natural language processing system was similar to that of physicians and better than that of lay persons and keyword searches. The encoded pneumonia information has the potential to support several pneumonia-related applications used in our institution. The applications include a decision support system called the antibiotic assistant, a computerized clinical protocol for pneumonia, and a quality assurance application in the radiology department.