Assessing the role of GPT-4 in thyroid ultrasound diagnosis and treatment recommendations: enhancing interpretability with a chain of thought approach.

Authors

Wang Zhixiang, Zhang Zhen, Traverso Alberto, Dekker Andre, Qian Linxue, Sun Pengfei

Affiliations

Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China.

Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands.

Publication information

Quant Imaging Med Surg. 2024 Feb 1;14(2):1602-1615. doi: 10.21037/qims-23-1180. Epub 2024 Jan 11.

DOI: 10.21037/qims-23-1180
PMID: 38415150
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10895085/
Abstract

BACKGROUND

As artificial intelligence (AI) becomes increasingly prevalent in the medical field, the effectiveness of AI-generated medical reports in disease diagnosis remains to be evaluated. ChatGPT is a large language model developed by OpenAI with a notable capacity for text abstraction and comprehension. This study aimed to explore the capabilities, limitations, and potential of Generative Pre-trained Transformer (GPT)-4 in analyzing thyroid cancer ultrasound reports, providing diagnoses, and recommending treatment plans.

METHODS

Using 109 diverse thyroid cancer cases, we evaluated GPT-4's performance by comparing its generated reports to those from doctors with various levels of experience. We also conducted a Turing Test and a consistency analysis. To enhance the interpretability of the model, we applied the Chain of Thought (CoT) method to deconstruct the decision-making chain of the GPT model.
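
As a rough illustration of what a chain-of-thought prompt and a consistency check over repeated model outputs could look like, the sketch below wraps a report in a step-by-step instruction and computes, for each case, the share of runs that agree with the most common answer. The function names, the prompt wording, the data layout, and the example labels are all hypothetical; the abstract does not specify how the study worded its CoT prompts or measured consistency.

```python
from collections import Counter


def build_cot_prompt(report_text: str) -> str:
    """Wrap an ultrasound report in a step-by-step (chain-of-thought style) instruction.

    The wording is an illustrative assumption, not the prompt used in the study.
    """
    return (
        "You are assisting with a thyroid ultrasound report.\n"
        f"Report:\n{report_text}\n\n"
        "Reason step by step: list the key findings, explain how each one "
        "affects the assessment, then state a diagnosis and a recommendation."
    )


def case_agreement(answers: list[str]) -> float:
    """Fraction of repeated runs that match the most common answer for one case."""
    if not answers:
        raise ValueError("need at least one answer per case")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def mean_agreement(runs_by_case: dict[str, list[str]]) -> float:
    """Average per-case agreement across all cases."""
    if not runs_by_case:
        raise ValueError("need at least one case")
    scores = [case_agreement(answers) for answers in runs_by_case.values()]
    return sum(scores) / len(scores)


# Hypothetical repeated outputs for two cases (labels are made up for illustration).
runs = {
    "case_001": ["malignant", "malignant", "malignant"],
    "case_002": ["benign", "malignant", "benign", "benign"],
}
print(case_agreement(runs["case_001"]))  # all three runs agree
print(case_agreement(runs["case_002"]))  # three of four runs agree
print(mean_agreement(runs))
```

A simple majority-agreement score like this is only one possible choice; a published analysis might instead use a formal agreement statistic or expert review of each output.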

RESULTS

GPT-4 demonstrated proficiency in report structuring, professional terminology, and clarity of expression, but showed limitations in diagnostic accuracy. In addition, our consistency analysis highlighted certain discrepancies in the AI's performance. The CoT method effectively enhanced the interpretability of the AI's decision-making process.

CONCLUSIONS

GPT-4 exhibits potential as a supplementary tool in healthcare, especially for generating thyroid gland diagnostic reports. Our proposed online platform, "ThyroAIGuide", alongside the CoT method, underscores the potential of AI to augment diagnostic processes, elevate healthcare accessibility, and advance patient education. However, the journey towards fully integrating AI into healthcare is ongoing, requiring continuous research, development, and careful monitoring by medical professionals to ensure patient safety and quality of care.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/c729654cbd2b/qims-14-02-1602-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/52da0f20d7db/qims-14-02-1602-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/4e176e41ad52/qims-14-02-1602-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/07f9b0f5fcb2/qims-14-02-1602-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/f1c1e542be0f/qims-14-02-1602-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/f466ce26f73a/qims-14-02-1602-f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/e9cbe06571c3/qims-14-02-1602-f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/10895085/21497b21bed9/qims-14-02-1602-f8.jpg

Similar articles

1
Assessing the role of GPT-4 in thyroid ultrasound diagnosis and treatment recommendations: enhancing interpretability with a chain of thought approach.
Quant Imaging Med Surg. 2024 Feb 1;14(2):1602-1615. doi: 10.21037/qims-23-1180. Epub 2024 Jan 11.
2
Preliminary experiments on interpretable ChatGPT-assisted diagnosis for breast ultrasound radiologists.
Quant Imaging Med Surg. 2024 Sep 1;14(9):6601-6612. doi: 10.21037/qims-24-141. Epub 2024 Aug 28.
3
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
4
Arthrosis diagnosis and treatment recommendations in clinical practice: an exploratory investigation with the generative AI model GPT-4.
J Orthop Traumatol. 2023 Nov 28;24(1):61. doi: 10.1186/s10195-023-00740-4.
5
Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases.
JMIR Form Res. 2024 Jun 26;8:e59267. doi: 10.2196/59267.
6
Integrating AI in Lipedema Management: Assessing the Efficacy of GPT-4 as a Consultation Assistant.
Life (Basel). 2024 May 20;14(5):646. doi: 10.3390/life14050646.
7
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.
World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.
8
Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports.
Eur Radiol. 2024 Jun;34(6):3566-3574. doi: 10.1007/s00330-023-10384-x. Epub 2023 Nov 8.
9
Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine.
Clin Pract. 2023 Nov 20;13(6):1460-1487. doi: 10.3390/clinpract13060130.
10
Diagnostic accuracy of large language models in psychiatry.
Asian J Psychiatr. 2024 Oct;100:104168. doi: 10.1016/j.ajp.2024.104168. Epub 2024 Jul 25.

Cited by

1
Large Language Models for CAD-RADS 2.0 Extraction From Semi-Structured Coronary CT Angiography Reports: A Multi-Institutional Study.
Korean J Radiol. 2025 Sep;26(9):817-831. doi: 10.3348/kjr.2025.0293.
2
Large language model integrations in cancer decision-making: a systematic review and meta-analysis.
NPJ Digit Med. 2025 Jul 17;8(1):450. doi: 10.1038/s41746-025-01824-7.
3
Can AI-Based ChatGPT Models Accurately Analyze Hand-Wrist Radiographs? A Comparative Study.
Diagnostics (Basel). 2025 Jun 14;15(12):1513. doi: 10.3390/diagnostics15121513.
4
Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study.
J Med Internet Res. 2025 Jun 18;27:e69929. doi: 10.2196/69929.
5
Large language models in oncology: a review.
BMJ Oncol. 2025 May 15;4(1):e000759. doi: 10.1136/bmjonc-2025-000759. eCollection 2025.
6
Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages.
Sci Rep. 2025 May 30;15(1):19028. doi: 10.1038/s41598-025-04309-5.
7
Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.
Insights Imaging. 2025 May 14;16(1):100. doi: 10.1186/s13244-025-01976-w.
8
Applications of Natural Language Processing in Otolaryngology: A Scoping Review.
Laryngoscope. 2025 Sep;135(9):3049-3063. doi: 10.1002/lary.32198. Epub 2025 May 1.
9
Medical accuracy of artificial intelligence chatbots in oncology: a scoping review.
Oncologist. 2025 Apr 4;30(4). doi: 10.1093/oncolo/oyaf038.
10
Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis.
JMIR Med Inform. 2025 Apr 25;13:e64963. doi: 10.2196/64963.

References

1
Enhancing Triage Efficiency and Accuracy in Emergency Rooms for Patients with Metastatic Prostate Cancer: A Retrospective Analysis of Artificial Intelligence-Assisted Triage Using ChatGPT 4.0.
Cancers (Basel). 2023 Jul 22;15(14):3717. doi: 10.3390/cancers15143717.
2
Can ChatGPT, an Artificial Intelligence Language Model, Provide Accurate and High-quality Patient Information on Prostate Cancer?
Urology. 2023 Oct;180:35-58. doi: 10.1016/j.urology.2023.05.040. Epub 2023 Jul 4.
3
Using AI-generated suggestions from ChatGPT to optimize clinical decision support.
J Am Med Inform Assoc. 2023 Jun 20;30(7):1237-1245. doi: 10.1093/jamia/ocad072.
4
Evaluating diagnostic content of AI-generated chest radiography: A multi-center visual Turing test.
PLoS One. 2023 Apr 12;18(4):e0279349. doi: 10.1371/journal.pone.0279349. eCollection 2023.
5
Explainable artificial intelligence for mental health through transparency and interpretability for understandability.
NPJ Digit Med. 2023 Jan 18;6(1):6. doi: 10.1038/s41746-023-00751-9.
6
The reproducibility issues that haunt health-care AI.
Nature. 2023 Jan;613(7943):402-403. doi: 10.1038/d41586-023-00023-2.
7
Artificial Intelligence Applications in Health Care Practice: Scoping Review.
J Med Internet Res. 2022 Oct 5;24(10):e40238. doi: 10.2196/40238.
8
Early detection of suspicious lymph nodes in differentiated thyroid cancer.
Expert Rev Endocrinol Metab. 2022 Sep;17(5):447-454. doi: 10.1080/17446651.2022.2112176. Epub 2022 Aug 21.
9
Fast Healthcare Interoperability Resources (FHIR) for Interoperability in Health Research: Systematic Review.
JMIR Med Inform. 2022 Jul 19;10(7):e35724. doi: 10.2196/35724.
10
Opening the Black Box: The Promise and Limitations of Explainable Machine Learning in Cardiology.
Can J Cardiol. 2022 Feb;38(2):204-213. doi: 10.1016/j.cjca.2021.09.004. Epub 2021 Sep 14.