Comparing ChatGPT 4.0's Performance in Interpreting Thyroid Nodule Ultrasound Reports Using ACR-TI-RADS 2017: Analysis Across Different Levels of Ultrasound User Experience.

Authors

Wakonig Katharina Margherita, Barisch Simon, Kozarzewski Leonard, Dommerich Steffen, Lerchbaumer Markus Herbert

Affiliations

Department of Otorhinolaryngology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Campus Virchow Klinikum and Campus Charité Mitte, Charitéplatz 1, 10117 Berlin, Germany.

Department of Endocrinology, Diabetes and Metabolism, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany.

Publication details

Diagnostics (Basel). 2025 Mar 6;15(5):635. doi: 10.3390/diagnostics15050635.

PMID: 40075883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11899695/

Abstract

This study evaluates ChatGPT 4.0's ability to interpret thyroid ultrasound (US) reports using the ACR-TI-RADS 2017 criteria and compares its performance with that of US users at different levels of experience. A team of medical experts, an inexperienced US user (a medical student), and ChatGPT 4.0 analyzed 100 fictitious thyroid US reports. ChatGPT's performance was assessed for accuracy, consistency, and diagnostic recommendations, including fine-needle aspiration (FNA) and follow-up. ChatGPT showed substantial agreement with the experts in assessing echogenic foci, but inconsistencies in other criteria, such as composition and margins, were evident in both of its analyses. Interrater reliability between ChatGPT and the experts ranged from moderate to almost perfect, reflecting AI's potential but also its limitations in reaching expert-level interpretation. The inexperienced US user outperformed ChatGPT, reaching nearly perfect agreement with the experts, which highlights the critical role of traditional medical training in applying standardized risk stratification tools such as TI-RADS. ChatGPT showed high specificity in recommending FNA but lower sensitivity and specificity for follow-up recommendations than the medical student. These findings point to ChatGPT's potential as a supportive diagnostic tool rather than a replacement for human expertise. Improved AI algorithms and training could increase ChatGPT's clinical utility, better supporting clinicians in managing thyroid nodules and improving patient care. This study highlights both the promise and the current limitations of AI in medical diagnostics and advocates refining it and integrating it into clinical workflows. However, traditional clinical training must not be compromised, as it is essential for identifying and correcting AI-driven errors.
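The abstract describes ChatGPT applying the ACR-TI-RADS 2017 criteria to written reports and deriving FNA and follow-up recommendations from them. For readers unfamiliar with that scheme, the following is a minimal Python sketch of the published ACR TI-RADS 2017 point system and its size thresholds. It is not code from this study. The feature keys (e.g. `"lobulated_irregular"`), the `Nodule` class and the function names are hypothetical, lexicon details such as "cannot be determined" entries are omitted, and it is not a clinical tool.

```python
from dataclasses import dataclass, field

# Point values summarized from the ACR TI-RADS 2017 chart.
# Key names are hypothetical labels chosen for this sketch.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed_cystic_solid": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_irregular": 2,
          "extrathyroidal_extension": 3}
# Echogenic foci: choose all that apply; their points are added together.
ECHOGENIC_FOCI = {"none": 0, "large_comet_tail": 0, "macrocalcifications": 1,
                  "peripheral_rim": 2, "punctate": 3}

# (FNA threshold, follow-up threshold) on maximum diameter in cm, for TR3-TR5.
# TR1 and TR2 nodules need no FNA.
SIZE_THRESHOLDS_CM = {3: (2.5, 1.5), 4: (1.5, 1.0), 5: (1.0, 0.5)}


@dataclass
class Nodule:
    """Hypothetical container for the features read off one US report."""
    composition: str
    echogenicity: str
    shape: str
    margin: str
    foci: list[str] = field(default_factory=lambda: ["none"])
    max_diameter_cm: float = 0.0


def total_points(n: Nodule) -> int:
    """Sum the points of all five feature categories."""
    if n.composition == "spongiform":
        # Chart note: spongiform nodules get no points from the other categories.
        return 0
    return (COMPOSITION[n.composition] + ECHOGENICITY[n.echogenicity]
            + SHAPE[n.shape] + MARGIN[n.margin]
            + sum(ECHOGENIC_FOCI[f] for f in set(n.foci)))


def tr_level(points: int) -> int:
    """Map a point total to a TR level using the chart's bands."""
    # Chart bands: 0 -> TR1, 2 -> TR2, 3 -> TR3, 4-6 -> TR4, >= 7 -> TR5.
    if points >= 7:
        return 5
    if points >= 4:
        return 4
    if points in (0, 2, 3):
        return {0: 1, 2: 2, 3: 3}[points]
    # A 1-point (or negative) total is not listed on the chart;
    # this sketch rejects it instead of guessing a level.
    raise ValueError(f"no TR level for a total of {points} points")


def recommendation(n: Nodule) -> str:
    """Return the TR level and the size-based FNA / follow-up advice."""
    level = tr_level(total_points(n))
    if level not in SIZE_THRESHOLDS_CM:
        return f"TR{level}: no FNA"
    fna_cm, follow_cm = SIZE_THRESHOLDS_CM[level]
    if n.max_diameter_cm >= fna_cm:
        return f"TR{level}: FNA"
    if n.max_diameter_cm >= follow_cm:
        return f"TR{level}: follow-up"
    return f"TR{level}: no FNA or follow-up"


if __name__ == "__main__":
    example = Nodule("solid", "hypoechoic", "wider_than_tall", "smooth",
                     max_diameter_cm=1.2)
    print(total_points(example), recommendation(example))
```

In the example, a solid (2 points), hypoechoic (2 points) nodule with no other suspicious features totals 4 points (TR4); at 1.2 cm it is below the 1.5 cm FNA threshold but above the 1.0 cm follow-up threshold. Errors in reading any single category from a report, such as composition or margins, shift the point total and can therefore change the TR level and the recommendation.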
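The abstract reports interrater reliability on a scale from "moderate" to "almost perfect" and gives sensitivity and specificity for FNA and follow-up recommendations. Those verbal labels conventionally come from the Landis and Koch bands for Cohen's kappa. The abstract does not say which kappa variant was used (for example, a weighted one), so the sketch below assumes unweighted Cohen's kappa between two raters. The function names and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labelled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa: float) -> str:
    """Verbal band for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def sensitivity_specificity(predicted: list[bool],
                            reference: list[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of yes/no calls (e.g. 'recommend FNA')
    against a reference standard (e.g. the expert team's calls)."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical TR levels assigned by two raters to five reports.
    experts = ["TR3", "TR4", "TR4", "TR5", "TR2"]
    model = ["TR3", "TR4", "TR5", "TR5", "TR2"]
    k = cohens_kappa(experts, model)
    print(round(k, 3), landis_koch(k))
```

For the made-up ratings, observed agreement is 0.80 and chance agreement is 0.24, so kappa = 0.56 / 0.76 ≈ 0.74, which falls in the "substantial" band. Kappa is used instead of raw percent agreement because it discounts the agreement two raters would reach by chance, given how often each one uses each TR level.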

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e75/11899695/8f7b4dfe7353/diagnostics-15-00635-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e75/11899695/e531f2ec7f89/diagnostics-15-00635-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e75/11899695/2874931906fc/diagnostics-15-00635-g003.jpg
