A reliability study for evaluating information extraction from radiology reports.

Authors

Hripcsak G, Kuperman GJ, Friedman C, Heitjan DF

Affiliations

Columbia University, New York, New York, USA.

Publication Information

J Am Med Inform Assoc. 1999 Mar-Apr;6(2):143-50. doi: 10.1136/jamia.1999.0060143.

Abstract

GOAL

To assess the reliability of a reference standard for an information extraction task.

SETTING

Twenty-four physician raters from two sites and two specialties judged whether clinical conditions were present based on reading chest radiograph reports.

METHODS

Variance components, generalizability (reliability) coefficients, and the number of expert raters needed to generate a reliable reference standard were estimated.
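
For context, the per-rater generalizability coefficient in a fully crossed cases-by-raters design is usually written as the ratio of case variance to case variance plus rater-dependent error. This is the standard form from generalizability theory, offered only as a sketch; the abstract does not reproduce the paper's exact variance-component model.

E\rho^2 = \frac{\sigma^2_{\mathrm{case}}}{\sigma^2_{\mathrm{case}} + \sigma^2_{\mathrm{case \times rater},\,e} / n_{\mathrm{raters}}}

The numerator is the variance attributable to true differences among reports; the interaction/residual term captures rater disagreement. Averaging the judgments of more raters divides that error term by n_raters, which is why a multi-rater reference standard is more reliable than any single rater.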

RESULTS

Per-rater reliability averaged across conditions was 0.80 (95% CI, 0.79-0.81). Reliability for the nine individual conditions varied from 0.67 to 0.97, with central line presence and pneumothorax the most reliable, and pleural effusion (excluding CHF) and pneumonia the least reliable. One to two raters were needed to achieve a reliability of 0.70, and six raters, on average, were required to achieve a reliability of 0.95. This was far more reliable than a previously published per-rater reliability of 0.19 for a more complex task. Differences between sites were attributable to changes to the condition definitions.
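
As a rough check on these figures, the Spearman-Brown prophecy formula projects the reliability of k averaged raters from a single-rater reliability. The sketch below is an illustration under that assumption, not the paper's method; the paper's counts come from per-condition variance-component estimates, which is why its average of six raters exceeds the naive projection from 0.80.

import math

def spearman_brown(r_single, k):
    # Projected reliability of the mean of k raters with single-rater reliability r_single.
    return k * r_single / (1 + (k - 1) * r_single)

def raters_needed(r_single, r_target):
    # Smallest k whose averaged judgment is projected to reach r_target.
    return math.ceil(r_target * (1 - r_single) / (r_single * (1 - r_target)))

print(raters_needed(0.80, 0.70))  # 1  -- consistent with "one to two raters" for 0.70
print(raters_needed(0.80, 0.95))  # 5  -- from the across-condition average of 0.80
print(raters_needed(0.67, 0.95))  # 10 -- for the least reliable condition (0.67)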

CONCLUSION

In these evaluations, physician raters were able to judge very reliably the presence of clinical conditions based on text reports. Once the reliability of a specific rater is confirmed, it would be possible for that rater to create a reference standard reliable enough to assess aggregate measures on a system. Six raters would be needed to create a reference standard sufficient to assess a system on a case-by-case basis. These results should help evaluators design future information extraction studies for natural language processors and other knowledge-based systems.

Similar Articles

Impact of clinical history on chest radiograph interpretation.
J Hosp Med. 2013 Jul;8(7):359-64. doi: 10.1002/jhm.1991. Epub 2012 Nov 26.

Cited By

Clinical report classification using Natural Language Processing and Topic Modeling.
Proc Int Conf Mach Learn Appl. 2012 Dec;2012:204-209. doi: 10.1109/icmla.2012.173. Epub 2013 Jan 10.

References

Performance of four computer-based diagnostic systems.
N Engl J Med. 1994 Jun 23;330(25):1792-6. doi: 10.1056/NEJM199406233302506.
