Diagnostic Performance of GPT-4o and Claude 3 Opus in Determining Causes of Death From Medical Histories and Postmortem CT Findings.

Authors

Ishida Masanori, Gonoi Wataru, Nyunoya Keisuke, Abe Hiroyuki, Shirota Go, Okimoto Naomasa, Fujimoto Kotaro, Kurokawa Mariko, Nakai Motoki, Saito Kazuhiro, Ushiku Tetsuo, Abe Osamu

Affiliations

Department of Radiology, Tokyo Medical University Hospital, Tokyo, JPN.

Department of Radiology, The University of Tokyo Hospital, Tokyo, JPN.

Publication information

Cureus. 2024 Aug 20;16(8):e67306. doi: 10.7759/cureus.67306. eCollection 2024 Aug.

DOI:10.7759/cureus.67306
PMID:39301343
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11412385/
Abstract

INTRODUCTION

This study evaluates the diagnostic performance of the latest large language models (LLMs), GPT-4o (OpenAI, San Francisco, CA, USA) and Claude 3 Opus (Anthropic, San Francisco, CA, USA), in determining causes of death from medical histories and postmortem CT findings.

METHODS

We included 100 adult cases in which the cause of death could be diagnosed on postmortem CT, with autopsy results as the gold standard. Their medical histories and postmortem CT findings were compiled, and the clinical and imaging diagnoses of both the underlying and immediate causes of death, as well as personal information, were carefully removed from the data before it was shown to the LLMs. Both GPT-4o and Claude 3 Opus generated the top three differential diagnoses for both the underlying and immediate causes of death based on the following three prompts: 1) medical history only; 2) postmortem CT findings only; and 3) both medical history and postmortem CT findings. The diagnostic performance of the LLMs was compared using McNemar's test.
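As an illustration of the paired comparison named above, the following is a minimal sketch of a two-sided exact McNemar test in Python. The abstract does not say whether the exact or the chi-square form was used, so the exact form is an assumption here. The function name `mcnemar_exact` and the ten paired outcomes are hypothetical and are not taken from the study data.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Two-sided exact McNemar test on paired boolean outcomes.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("paired outcome lists must have equal length")
    # Only discordant pairs (exactly one model correct) carry information.
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical per-case correctness for two models on the same 10 cases.
model_a = [True, True, True, True, True, True, True, True, False, False]
model_b = [True, True, False, False, False, False, False, False, True, False]

b, c, p_value = mcnemar_exact(model_a, model_b)
print(b, c, p_value)
```

The test uses only the discordant cases, which is why it suits a design in which both models answer the same 100 cases: agreement on a case says nothing about which model is better.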

RESULTS

For the underlying cause of death, GPT-4o achieved primary diagnostic accuracy rates of 78%, 72%, and 78%, while Claude 3 Opus achieved 72%, 56%, and 75% for prompts 1, 2, and 3, respectively. Including any of the top three differential diagnoses, GPT-4o's accuracy rates were 92%, 90%, and 92%, while Claude 3 Opus's rates were 93%, 69%, and 93% for prompts 1, 2, and 3, respectively. For the immediate cause of death, GPT-4o's primary diagnostic accuracy rates were 55%, 58%, and 62%, while Claude 3 Opus's rates were 60%, 62%, and 63% for prompts 1, 2, and 3, respectively. For any of the top three differential diagnoses, GPT-4o's accuracy rates were 88% for prompt 1 and 91% for prompts 2 and 3, whereas Claude 3 Opus's rates were 92% for all three prompts. Significant differences between the models were observed for prompt 2 in diagnosing the underlying cause of death (p = 0.03 and <0.01 for the primary and top three differential diagnoses, respectively).
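The two accuracy figures reported above (primary diagnosis versus any of the top three differentials) correspond to top-1 and top-3 accuracy. The sketch below shows how such rates could be computed. It assumes exact string matching against the reference diagnosis, which is a simplification, since the abstract does not describe how a match was judged. The function name `top_k_accuracy` and the four example cases are hypothetical.

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases whose reference diagnosis is among the first k predictions."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("need one ranked prediction list per case")
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, truths) if truth in preds[:k]
    )
    return hits / len(truths)


# Hypothetical reference diagnoses and ranked model outputs for 4 cases.
truths = ["pneumonia", "aortic dissection", "lung cancer", "sepsis"]
ranked_predictions = [
    ["pneumonia", "heart failure", "sepsis"],                              # hit at rank 1
    ["myocardial infarction", "aortic dissection", "pulmonary embolism"],  # hit at rank 2
    ["lung cancer", "pneumonia", "COPD"],                                  # hit at rank 1
    ["heart failure", "pneumonia", "stroke"],                              # miss
]

top1 = top_k_accuracy(ranked_predictions, truths, k=1)
top3 = top_k_accuracy(ranked_predictions, truths, k=3)
print(top1, top3)
```

Top-3 accuracy can never be lower than top-1 accuracy for the same predictions, which matches the pattern in the reported rates.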

CONCLUSION

Both GPT-4o and Claude 3 Opus demonstrated relatively high performance in diagnosing both the underlying and immediate causes of death using medical histories and postmortem CT findings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2be/11412385/639b1734d156/cureus-0016-00000067306-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2be/11412385/ce4cf8b9dadf/cureus-0016-00000067306-i01.jpg
