Suppr超能文献

评估自动胸部X光放射学报告生成的进展。

Evaluating progress in automatic chest X-ray radiology report generation.

作者信息

Yu Feiyang, Endo Mark, Krishnan Rayan, Pan Ian, Tsai Andy, Reis Eduardo Pontes, Fonseca Eduardo Kaiser Ururahy Nunes, Lee Henrique Min Ho, Abad Zahra Shakeri Hossein, Ng Andrew Y, Langlotz Curtis P, Venugopal Vasantha Kumar, Rajpurkar Pranav

机构信息

Department of Computer Science, Stanford University, Stanford, CA 94305, USA.

Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115, USA.

出版信息

Patterns (N Y). 2023 Aug 3;4(9):100802. doi: 10.1016/j.patter.2023.100802. eCollection 2023 Sep 8.

Abstract

Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.

摘要

用于根据图像自动生成叙述性放射学报告的人工智能(AI)模型有提高效率和减轻放射科医生工作量的潜力。然而,评估这些报告的正确性需要能够捕捉临床相关差异的指标。在本研究中,我们调查了自动指标与放射科医生对报告生成错误的评分之间的一致性。我们通过提出新的指标RadGraph F1和RadCliQ来解决现有指标的局限性,这两个指标与放射科医生的评估显示出更强的相关性。此外,我们分析了指标的失败模式以了解其局限性,并为指标的选择和解释提供指导。本研究将RadGraph F1和RadCliQ确立为指导未来放射学报告生成研究的有意义的指标。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6a5/10499844/cebf43eb7b3b/fx1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验