

Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions.

Authors

Sarangi Pradosh Kumar, Datta Suvrankar, Panda Braja Behari, Panda Swaha, Mondal Himel

Affiliations

Department of Radiodiagnosis, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India.

Publication

Indian J Radiol Imaging. 2024 Nov 4;35(2):287-294. doi: 10.1055/s-0044-1792040. eCollection 2025 Apr.

DOI: 10.1055/s-0044-1792040
PMID: 40297110
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12034419/
Abstract

Radiology is critical for diagnosis and patient care, relying heavily on accurate image interpretation. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.

This study aimed to assess the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.

We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context regarding the examination instructions and question format. The main query posed was: "Identify the structure indicated by the arrow(s)." Responses were evaluated against correct answers, and two expert radiologists (>5 and 30 years of experience in radiology diagnostics and academics) rated the explanations of the answers. We calculated four scores: correctness, sidedness, modality identification, and approximation. The last considers a response partially correct if the identified structure is present in the image but is not the focus of the question.

ChatGPT-4 underperformed under both testing conditions, with correctness scores of 4% without context and 7.5% with context. However, it identified the imaging modality with 100% accuracy. The model scored over 50% on the approximation metric, where it identified structures present in the image but not indicated by the arrow. However, it struggled to identify the correct side of the structure, scoring approximately 42% and 40% in the no-context and with-context settings, respectively. Only 32% of the responses were similar across the two settings.

Despite its ability to correctly recognize the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates the necessity for enhanced training in normal anatomy to better interpret abnormal radiological images. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
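The four evaluation metrics described in the abstract (correctness, sidedness, modality identification, and approximation) can be sketched as a small scoring routine. This is a minimal sketch under stated assumptions: the `Response` fields and the rule that approximation subsumes fully correct answers are illustrative guesses at the study's design, not taken from its published materials.

```python
# Hypothetical sketch of the four-score evaluation described in the abstract.
# Field names and the scoring rules are illustrative assumptions, not the
# study's actual protocol.
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool           # named structure exactly matches the answer key
    side_correct: bool      # left/right laterality identified correctly
    modality_correct: bool  # imaging modality (e.g., CT, MRI) identified
    approximate: bool       # named structure is present, but not the one indicated


def score(responses: list[Response]) -> dict[str, float]:
    """Return each metric as a percentage of the question set."""
    n = len(responses)
    pct = lambda flag: 100.0 * sum(flag(r) for r in responses) / n
    return {
        "correctness": pct(lambda r: r.correct),
        "sidedness": pct(lambda r: r.side_correct),
        "modality": pct(lambda r: r.modality_correct),
        # Assumption: an exactly correct answer also counts toward approximation.
        "approximation": pct(lambda r: r.correct or r.approximate),
    }
```

Run over the 100 mock questions under each prompting condition (with and without exam context), this would reproduce the kind of per-condition percentages the study reports.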


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/3b2efb878d9f/10-1055-s-0044-1792040-i2473882-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/92546ea65eb2/10-1055-s-0044-1792040-i2473882-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/6cc61bec1779/10-1055-s-0044-1792040-i2473882-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7842/12034419/191ccb556869/10-1055-s-0044-1792040-i2473882-4.jpg

Similar Articles

1
Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions.
Indian J Radiol Imaging. 2024 Nov 4;35(2):287-294. doi: 10.1055/s-0044-1792040. eCollection 2025 Apr.
2
Could ChatGPT Pass the UK Radiology Fellowship Examinations?
Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.
3
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
4
Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes.
Indian J Radiol Imaging. 2023 Dec 29;34(2):276-282. doi: 10.1055/s-0043-1777746. eCollection 2024 Apr.
5
Accuracy of Information and References Using ChatGPT-3 for Retrieval of Clinical Radiological Information.
Can Assoc Radiol J. 2024 Feb;75(1):69-73. doi: 10.1177/08465371231171125. Epub 2023 Apr 20.
6
A retrospective evaluation of the potential of ChatGPT in the accurate diagnosis of acute stroke.
Diagn Interv Radiol. 2025 Apr 28;31(3):187-195. doi: 10.4274/dir.2024.242892. Epub 2024 Sep 2.
7
Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.
J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
8
Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions.
Surg Obes Relat Dis. 2024 Jul;20(7):609-613. doi: 10.1016/j.soard.2024.04.014. Epub 2024 May 8.
9
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
10
Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.
JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.

Cited By

1
Performance of Large Language Models in Recognizing Brain MRI Sequences: A Comparative Analysis of ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro.
Diagnostics (Basel). 2025 Jul 30;15(15):1919. doi: 10.3390/diagnostics15151919.
2
Performance of ChatGPT-3.5 and ChatGPT-4 in Solving Questions Based on Core Concepts in Cardiovascular Physiology.
Cureus. 2025 May 6;17(5):e83552. doi: 10.7759/cureus.83552. eCollection 2025 May.

References

1
ChatGPT in medical writing: A game-changer or a gimmick?
Perspect Clin Res. 2024 Oct-Dec;15(4):165-171. doi: 10.4103/picr.picr_167_23. Epub 2023 Nov 15.
2
Radiologic Decision-Making for Imaging in Pulmonary Embolism: Accuracy and Reliability of Large Language Models-Bing, Claude, ChatGPT, and Perplexity.
Indian J Radiol Imaging. 2024 Jul 4;34(4):653-660. doi: 10.1055/s-0044-1787974. eCollection 2024 Oct.
3
Assessing GPT-4 multimodal performance in radiological image analysis.
Eur Radiol. 2025 Apr;35(4):1959-1965. doi: 10.1007/s00330-024-11035-5. Epub 2024 Aug 30.
4
Seeing the Unseen: Advancing Generative AI Research in Radiology.
Radiology. 2024 May;311(2):e240935. doi: 10.1148/radiol.240935.
5
GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination.
Jpn J Radiol. 2024 Aug;42(8):918-926. doi: 10.1007/s11604-024-01561-z. Epub 2024 May 11.
6
Evaluating GPT-4V (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.
Radiology. 2024 May;311(2):e233270. doi: 10.1148/radiol.233270.
7
Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes.
Indian J Radiol Imaging. 2023 Dec 29;34(2):276-282. doi: 10.1055/s-0043-1777746. eCollection 2024 Apr.
8
Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models.
Indian J Radiol Imaging. 2023 Dec 28;34(2):269-275. doi: 10.1055/s-0043-1777289. eCollection 2024 Apr.
9
The Potential Applications and Challenges of ChatGPT in the Medical Field.
Int J Gen Med. 2024 Mar 5;17:817-826. doi: 10.2147/IJGM.S456659. eCollection 2024.
10
Collaborative Enhancement of Consistency and Accuracy in US Diagnosis of Thyroid Nodules Using Large Language Models.
Radiology. 2024 Mar;310(3):e232255. doi: 10.1148/radiol.232255.