
Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.

Affiliations

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan.

Publication Information

Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.

DOI: 10.3390/ijerph20043378
PMID: 36834073
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9967747/
Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26ea/9967747/85c9cde85349/ijerph-20-03378-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26ea/9967747/0219d924c284/ijerph-20-03378-g002.jpg

Similar Articles

1. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.
Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.
2. ChatGPT-Generated Differential Diagnosis Lists for Complex Case-Derived Clinical Vignettes: Diagnostic Accuracy Evaluation.
JMIR Med Inform. 2023 Oct 9;11:e48808. doi: 10.2196/48808.
3. Can ChatGPT-4 evaluate whether a differential diagnosis list contains the correct diagnosis as accurately as a physician?
Diagnosis (Berl). 2024 Mar 12;11(3):321-324. doi: 10.1515/dx-2024-0027. eCollection 2024 Aug 1.
4. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659.
5. Diagnostic performance of generative artificial intelligences for a series of complex case reports.
Digit Health. 2024 Jul 21;10:20552076241265215. doi: 10.1177/20552076241265215. eCollection 2024 Jan-Dec.
6. Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports.
Sci Rep. 2024 Apr 23;14(1):9330. doi: 10.1038/s41598-024-58760-x.
7. Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases.
JMIR Form Res. 2024 Jun 26;8:e59267. doi: 10.2196/59267.
8. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research.
JMIR Med Educ. 2024 Jun 21;10:e58758. doi: 10.2196/58758.
9. Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
10. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study.
J Med Internet Res. 2023 Sep 15;25:e47621. doi: 10.2196/47621.

Cited By

1. Utilizing Artificial Intelligence for the Diagnosis, Assessment, and Management of Chronic Pain.
J Biomed Phys Eng. 2025 Aug 1;15(4):311-322. doi: 10.31661/jbpe.v0i0.2306-1629. eCollection 2025 Aug.
2. Comparative Accuracy Assessment of Large Language Models in Cardiothoracic Anesthesia: A Performance Analysis of Claude and ChatGPT-4 on Subspecialty Board-Style Questions.
Cureus. 2025 Jul 23;17(7):e88591. doi: 10.7759/cureus.88591. eCollection 2025 Jul.
3. Performance of Microsoft Copilot in the Diagnostic Process of Pulmonary Embolism.
West J Emerg Med. 2025 Jul 13;26(4):1030-1039. doi: 10.5811/westjem.24995.
4. Evaluation of the accuracy of ChatGPT-4 and Gemini's responses to the World Dental Federation's frequently asked questions on oral health.
BMC Oral Health. 2025 Aug 2;25(1):1293. doi: 10.1186/s12903-025-06624-9.
5. "Digital Clinicians" Performing Obesity Medication Self-Injection Education: Feasibility Randomized Controlled Trial.
JMIR Diabetes. 2025 Jul 30;10:e63503. doi: 10.2196/63503.
6. ChatGpt's accuracy in the diagnosis of oral lesions.
BMC Oral Health. 2025 Jul 21;25(1):1229. doi: 10.1186/s12903-025-06582-2.
7. Utilizing ChatGPT-3.5 to Assist Ophthalmologists in Clinical Decision-making.
J Ophthalmic Vis Res. 2025 May 5;20. doi: 10.18502/jovr.v20.14692. eCollection 2025.
8. Diagnostic efficacy of large language models in the pediatric emergency department: a pilot study.
Front Digit Health. 2025 Jul 1;7:1624786. doi: 10.3389/fdgth.2025.1624786. eCollection 2025.
9. Evaluation of ChatGPT's performance in providing treatment recommendations for pediatric diseases.
Pediatr Discov. 2023 Nov 20;1(3):e42. doi: 10.1002/pdi3.42. eCollection 2023 Dec.
10. Comparison of physician and large language model chatbot responses to online ear, nose, and throat inquiries.
Sci Rep. 2025 Jul 1;15(1):21346. doi: 10.1038/s41598-025-06769-1.

References

1. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.
Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.
2. Predicting dementia from spontaneous speech using large language models.
PLOS Digit Health. 2022 Dec 22;1(12):e0000168. doi: 10.1371/journal.pdig.0000168. eCollection 2022 Dec.
3. The Future of AI in Medicine: A Perspective from a Chatbot.
Ann Biomed Eng. 2023 Feb;51(2):291-295. doi: 10.1007/s10439-022-03121-w. Epub 2022 Dec 26.
4. AI bot ChatGPT writes smart essays - should professors worry?
Nature. 2022 Dec 9. doi: 10.1038/d41586-022-04397-7.
5. Natural Language Processing for Smart Healthcare.
IEEE Rev Biomed Eng. 2024;17:4-18. doi: 10.1109/RBME.2022.3210270. Epub 2024 Jan 12.
6. Decoding Artificial Intelligence to Achieve Diagnostic Excellence: Learning From Experts, Examples, and Experience.
JAMA. 2022 Aug 23;328(8):709-710. doi: 10.1001/jama.2022.13735.
7. Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation.
J Med Internet Res. 2022 May 10;24(5):e31810. doi: 10.2196/31810.
8. New meaning for NLP: the trials and tribulations of natural language processing with GPT-3 in ophthalmology.
Br J Ophthalmol. 2022 Jul;106(7):889-892. doi: 10.1136/bjophthalmol-2022-321141. Epub 2022 May 6.
9. Uncovering interpretable potential confounders in electronic medical records.
Nat Commun. 2022 Feb 23;13(1):1014. doi: 10.1038/s41467-022-28546-8.
10. Operationalizing and Implementing Pretrained, Large Artificial Intelligence Linguistic Models in the US Health Care System: Outlook of Generative Pretrained Transformer 3 (GPT-3) as a Service Model.
JMIR Med Inform. 2022 Feb 10;10(2):e32875. doi: 10.2196/32875.