

Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions.

Authors

Sarangi Pradosh Kumar, Datta Suvrankar, Panda Braja Behari, Panda Swaha, Mondal Himel

Affiliations

Department of Radiodiagnosis, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India.

Publication

Indian J Radiol Imaging. 2024 Nov 4;35(2):287-294. doi: 10.1055/s-0044-1792040. eCollection 2025 Apr.

DOI: 10.1055/s-0044-1792040
PMID: 40297110
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12034419/
Abstract

Radiology is critical for diagnosis and patient care, relying heavily on accurate image interpretation. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.

This study aimed to assess the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.

We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context regarding the examination instructions and question format. The main query posed was: "Identify the structure indicated by the arrow(s)." Responses were evaluated against correct answers, and two expert radiologists (>5 and 30 years of experience in radiology diagnostics and academics) rated the explanations of the answers. We calculated four scores: correctness, sidedness, modality identification, and approximation. The last considers a response partially correct if the identified structure is present in the image but is not the focus of the question.

ChatGPT-4 underperformed under both testing conditions, with correctness scores of 4% without context and 7.5% with context. However, it identified the imaging modality with 100% accuracy. The model scored over 50% on the approximation metric, where it identified structures present in the image but not indicated by the arrow. However, it struggled to identify the correct side of the structure, scoring approximately 42% and 40% in the no-context and with-context settings, respectively. Only 32% of the responses were similar across the two settings.

Despite its ability to correctly recognize the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates the necessity for enhanced training in normal anatomy to better interpret abnormal radiological images. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
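The four evaluation metrics described in the abstract (correctness, sidedness, modality identification, and approximation) can be sketched as a small scoring routine. This is a minimal sketch under stated assumptions: the `Response` fields and the rule that approximation subsumes fully correct answers are illustrative guesses at the study's design, not taken from its published materials.

```python
# Hypothetical sketch of the four-score evaluation described in the abstract.
# Field names and the scoring rules are illustrative assumptions, not the
# study's actual protocol.
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool           # named structure exactly matches the answer key
    side_correct: bool      # left/right laterality identified correctly
    modality_correct: bool  # imaging modality (e.g., CT, MRI) identified
    approximate: bool       # named structure is present, but not the one indicated


def score(responses: list[Response]) -> dict[str, float]:
    """Return each metric as a percentage of the question set."""
    n = len(responses)
    pct = lambda flag: 100.0 * sum(flag(r) for r in responses) / n
    return {
        "correctness": pct(lambda r: r.correct),
        "sidedness": pct(lambda r: r.side_correct),
        "modality": pct(lambda r: r.modality_correct),
        # Assumption: an exactly correct answer also counts toward approximation.
        "approximation": pct(lambda r: r.correct or r.approximate),
    }
```

Run over the 100 mock questions under each prompting condition (with and without exam context), this would reproduce the kind of per-condition percentages the study reports.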


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/3b2efb878d9f/10-1055-s-0044-1792040-i2473882-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/92546ea65eb2/10-1055-s-0044-1792040-i2473882-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/6cc61bec1779/10-1055-s-0044-1792040-i2473882-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/191ccb556869/10-1055-s-0044-1792040-i2473882-4.jpg

Similar Articles

1
Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions.
Indian J Radiol Imaging. 2024 Nov 4;35(2):287-294. doi: 10.1055/s-0044-1792040. eCollection 2025 Apr.
2
Could ChatGPT Pass the UK Radiology Fellowship Examinations?
Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.
3
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
4
Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes.
Indian J Radiol Imaging. 2023 Dec 29;34(2):276-282. doi: 10.1055/s-0043-1777746. eCollection 2024 Apr.
5
Accuracy of Information and References Using ChatGPT-3 for Retrieval of Clinical Radiological Information.
Can Assoc Radiol J. 2024 Feb;75(1):69-73. doi: 10.1177/08465371231171125. Epub 2023 Apr 20.
6
A retrospective evaluation of the potential of ChatGPT in the accurate diagnosis of acute stroke.
Diagn Interv Radiol. 2025 Apr 28;31(3):187-195. doi: 10.4274/dir.2024.242892. Epub 2024 Sep 2.
7
Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.
J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
8
Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions.
Surg Obes Relat Dis. 2024 Jul;20(7):609-613. doi: 10.1016/j.soard.2024.04.014. Epub 2024 May 8.
9
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
10
Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.
JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.

Cited By

1
Performance of Large Language Models in Recognizing Brain MRI Sequences: A Comparative Analysis of ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro.
Diagnostics (Basel). 2025 Jul 30;15(15):1919. doi: 10.3390/diagnostics15151919.
2
Performance of ChatGPT-3.5 and ChatGPT-4 in Solving Questions Based on Core Concepts in Cardiovascular Physiology.
Cureus. 2025 May 6;17(5):e83552. doi: 10.7759/cureus.83552. eCollection 2025 May.

References

1
ChatGPT in medical writing: A game-changer or a gimmick?
Perspect Clin Res. 2024 Oct-Dec;15(4):165-171. doi: 10.4103/picr.picr_167_23. Epub 2023 Nov 15.
2
Radiologic Decision-Making for Imaging in Pulmonary Embolism: Accuracy and Reliability of Large Language Models-Bing, Claude, ChatGPT, and Perplexity.
Indian J Radiol Imaging. 2024 Jul 4;34(4):653-660. doi: 10.1055/s-0044-1787974. eCollection 2024 Oct.
3
Assessing GPT-4 multimodal performance in radiological image analysis.
Eur Radiol. 2025 Apr;35(4):1959-1965. doi: 10.1007/s00330-024-11035-5. Epub 2024 Aug 30.
4
Seeing the Unseen: Advancing Generative AI Research in Radiology.
Radiology. 2024 May;311(2):e240935. doi: 10.1148/radiol.240935.
5
GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination.
Jpn J Radiol. 2024 Aug;42(8):918-926. doi: 10.1007/s11604-024-01561-z. Epub 2024 May 11.
6
Evaluating GPT-4V (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.
Radiology. 2024 May;311(2):e233270. doi: 10.1148/radiol.233270.
7
Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes.
Indian J Radiol Imaging. 2023 Dec 29;34(2):276-282. doi: 10.1055/s-0043-1777746. eCollection 2024 Apr.
8
Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models.
Indian J Radiol Imaging. 2023 Dec 28;34(2):269-275. doi: 10.1055/s-0043-1777289. eCollection 2024 Apr.
9
The Potential Applications and Challenges of ChatGPT in the Medical Field.
Int J Gen Med. 2024 Mar 5;17:817-826. doi: 10.2147/IJGM.S456659. eCollection 2024.
10
Collaborative Enhancement of Consistency and Accuracy in US Diagnosis of Thyroid Nodules Using Large Language Models.
Radiology. 2024 Mar;310(3):e232255. doi: 10.1148/radiol.232255.