
Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.

Affiliations

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan.

Publication Information

Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.

DOI: 10.3390/ijerph20043378
PMID: 36834073
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9967747/
Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26ea/9967747/85c9cde85349/ijerph-20-03378-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26ea/9967747/0219d924c284/ijerph-20-03378-g002.jpg

Similar Articles

1. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.
Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.
2. ChatGPT-Generated Differential Diagnosis Lists for Complex Case-Derived Clinical Vignettes: Diagnostic Accuracy Evaluation.
JMIR Med Inform. 2023 Oct 9;11:e48808. doi: 10.2196/48808.
3. Can ChatGPT-4 evaluate whether a differential diagnosis list contains the correct diagnosis as accurately as a physician?
Diagnosis (Berl). 2024 Mar 12;11(3):321-324. doi: 10.1515/dx-2024-0027. eCollection 2024 Aug 1.
4. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659.
5. Diagnostic performance of generative artificial intelligences for a series of complex case reports.
Digit Health. 2024 Jul 21;10:20552076241265215. doi: 10.1177/20552076241265215. eCollection 2024 Jan-Dec.
6. Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports.
Sci Rep. 2024 Apr 23;14(1):9330. doi: 10.1038/s41598-024-58760-x.
7. Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases.
JMIR Form Res. 2024 Jun 26;8:e59267. doi: 10.2196/59267.
8. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research.
JMIR Med Educ. 2024 Jun 21;10:e58758. doi: 10.2196/58758.
9. Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
10. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study.
J Med Internet Res. 2023 Sep 15;25:e47621. doi: 10.2196/47621.

Cited By

1. Utilizing Artificial Intelligence for the Diagnosis, Assessment, and Management of Chronic Pain.
J Biomed Phys Eng. 2025 Aug 1;15(4):311-322. doi: 10.31661/jbpe.v0i0.2306-1629. eCollection 2025 Aug.
2. Comparative Accuracy Assessment of Large Language Models in Cardiothoracic Anesthesia: A Performance Analysis of Claude and ChatGPT-4 on Subspecialty Board-Style Questions.
Cureus. 2025 Jul 23;17(7):e88591. doi: 10.7759/cureus.88591. eCollection 2025 Jul.
3. Performance of Microsoft Copilot in the Diagnostic Process of Pulmonary Embolism.
West J Emerg Med. 2025 Jul 13;26(4):1030-1039. doi: 10.5811/westjem.24995.
4. Evaluation of the accuracy of ChatGPT-4 and Gemini's responses to the World Dental Federation's frequently asked questions on oral health.
BMC Oral Health. 2025 Aug 2;25(1):1293. doi: 10.1186/s12903-025-06624-9.
5. "Digital Clinicians" Performing Obesity Medication Self-Injection Education: Feasibility Randomized Controlled Trial.
JMIR Diabetes. 2025 Jul 30;10:e63503. doi: 10.2196/63503.
6. ChatGpt's accuracy in the diagnosis of oral lesions.
BMC Oral Health. 2025 Jul 21;25(1):1229. doi: 10.1186/s12903-025-06582-2.
7. Utilizing ChatGPT-3.5 to Assist Ophthalmologists in Clinical Decision-making.
J Ophthalmic Vis Res. 2025 May 5;20. doi: 10.18502/jovr.v20.14692. eCollection 2025.
8. Diagnostic efficacy of large language models in the pediatric emergency department: a pilot study.
Front Digit Health. 2025 Jul 1;7:1624786. doi: 10.3389/fdgth.2025.1624786. eCollection 2025.
9. Evaluation of ChatGPT's performance in providing treatment recommendations for pediatric diseases.
Pediatr Discov. 2023 Nov 20;1(3):e42. doi: 10.1002/pdi3.42. eCollection 2023 Dec.
10. Comparison of physician and large language model chatbot responses to online ear, nose, and throat inquiries.
Sci Rep. 2025 Jul 1;15(1):21346. doi: 10.1038/s41598-025-06769-1.

References

1. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.
Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.
2. Predicting dementia from spontaneous speech using large language models.
PLOS Digit Health. 2022 Dec 22;1(12):e0000168. doi: 10.1371/journal.pdig.0000168. eCollection 2022 Dec.
3. The Future of AI in Medicine: A Perspective from a Chatbot.
Ann Biomed Eng. 2023 Feb;51(2):291-295. doi: 10.1007/s10439-022-03121-w. Epub 2022 Dec 26.
4. AI bot ChatGPT writes smart essays - should professors worry?
Nature. 2022 Dec 9. doi: 10.1038/d41586-022-04397-7.
5. Natural Language Processing for Smart Healthcare.
IEEE Rev Biomed Eng. 2024;17:4-18. doi: 10.1109/RBME.2022.3210270. Epub 2024 Jan 12.
6. Decoding Artificial Intelligence to Achieve Diagnostic Excellence: Learning From Experts, Examples, and Experience.
JAMA. 2022 Aug 23;328(8):709-710. doi: 10.1001/jama.2022.13735.
7. Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation.
J Med Internet Res. 2022 May 10;24(5):e31810. doi: 10.2196/31810.
8. New meaning for NLP: the trials and tribulations of natural language processing with GPT-3 in ophthalmology.
Br J Ophthalmol. 2022 Jul;106(7):889-892. doi: 10.1136/bjophthalmol-2022-321141. Epub 2022 May 6.
9. Uncovering interpretable potential confounders in electronic medical records.
Nat Commun. 2022 Feb 23;13(1):1014. doi: 10.1038/s41467-022-28546-8.
10. Operationalizing and Implementing Pretrained, Large Artificial Intelligence Linguistic Models in the US Health Care System: Outlook of Generative Pretrained Transformer 3 (GPT-3) as a Service Model.
JMIR Med Inform. 2022 Feb 10;10(2):e32875. doi: 10.2196/32875.