
Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports.

Authors

Young Cameron C, Enichen Ellie, Rivera Christian, Auger Corinne A, Grant Nathan, Rao Arya, Succi Marc D

Affiliations

Harvard Medical School, Boston, Massachusetts, USA.

Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, Massachusetts, USA.

Publication information

Am J Med Genet A. 2025 Feb;197(2):e63878. doi: 10.1002/ajmg.a.63878. Epub 2024 Sep 13.

DOI:10.1002/ajmg.a.63878
PMID:39268988
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12123583/
Abstract

Accurately diagnosing rare pediatric diseases is frequently a clinical challenge because of their complex and unusual clinical presentations. Here, we explore the capabilities of three large language models (LLMs), GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO]), by evaluating their diagnostic performance on 61 rare pediatric disease case reports. The performance of the LLMs was assessed for accuracy in identifying the specific diagnosis, listing the correct diagnosis in a differential list, and identifying the broad disease category. In addition, GPT-4 HPO was tested on 100 general pediatrics case reports previously assessed on other LLMs to further validate its performance. GPT-4 predicted the correct diagnosis with a diagnostic accuracy of 13.1%, whereas both GPT-4 HPO and Gemini Pro had diagnostic accuracies of 8.2%. Further, GPT-4 HPO outperformed the other two LLMs in including the correct diagnosis in its differential list and in identifying the broad disease category. Although these findings underscore the potential of LLMs for diagnostic support, particularly when enhanced with domain-specific ontologies, they also stress the need for further improvement prior to integration into clinical practice.
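
The abstract reports accuracies only as percentages of the 61 rare-disease case reports. As a rough illustration of what those percentages imply, the sketch below searches for the whole number of correct cases whose rate rounds to the reported figure. The function name `implied_correct_cases` is hypothetical, and the resulting counts are inferred from the rounded percentages; they are not stated in the abstract.

```python
def implied_correct_cases(accuracy_percent: float, n_cases: int) -> int:
    """Return the unique count k such that 100 * k / n_cases rounds
    (to one decimal place) to accuracy_percent.

    Assumption: the reported accuracy is correct cases divided by total
    cases, rounded to one decimal. Raises ValueError if no count, or more
    than one count, matches.
    """
    matches = [
        k for k in range(n_cases + 1)
        if round(100 * k / n_cases, 1) == accuracy_percent
    ]
    if len(matches) != 1:
        raise ValueError(
            f"{accuracy_percent}% of {n_cases} cases matches {len(matches)} counts"
        )
    return matches[0]


if __name__ == "__main__":
    n_cases = 61  # rare pediatric disease case reports, per the abstract
    for model, pct in [("GPT-4", 13.1), ("GPT-4 HPO", 8.2), ("Gemini Pro", 8.2)]:
        k = implied_correct_cases(pct, n_cases)
        print(f"{model}: {pct}% of {n_cases} implies {k} correct diagnoses")
```

Under that rounding assumption, 13.1% corresponds to 8 of 61 cases and 8.2% to 5 of 61, so the gap in specific-diagnosis accuracy between GPT-4 and the other two models amounts to three case reports.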

