Comparative analysis of large language models on rare disease identification.

Author information

Ao Guangyu, Chen Min, Li Jing, Nie Huibing, Zhang Lei, Chen Zejun

Affiliations

Department of Nephrology, Chengdu First People's Hospital, No.18 Wanxiang North Road, High-tech District, Chengdu, 610095, Sichuan, China.

Sichuan Provincial Geriatrics Clinical Medical Research Center, Chengdu, China.

Publication information

Orphanet J Rare Dis. 2025 Apr 1;20(1):150. doi: 10.1186/s13023-025-03656-w.

DOI: 10.1186/s13023-025-03656-w
PMID: 40165285
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11959745/
Abstract

Diagnosing rare diseases is challenging because of their low prevalence, diverse presentations, and limited clinical recognition, which often lead to diagnostic delays and errors. This study evaluates how effectively multiple large language models (LLMs) identify rare diseases, comparing their performance with that of human physicians on real clinical cases. We analyzed 152 rare disease cases from the Chinese Medical Case Repository using four LLMs: ChatGPT-4o, Claude 3.5 Sonnet, Gemini Advanced, and Llama 3.1 405B. Overall, the LLMs outperformed human physicians; Claude 3.5 Sonnet achieved the highest accuracy (78.9%), significantly surpassing that of human physicians (26.3%). These findings suggest that LLMs can improve rare disease diagnosis and serve as valuable clinical tools, particularly in resource-limited regions. However, further validation and careful attention to ethical and privacy issues are needed before they can be effectively integrated into medical practice.
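As a quick consistency check on the headline figures, the short Python sketch below back-calculates which whole-number case counts out of 152 round to the reported accuracies. It assumes, which the abstract does not state explicitly, that both percentages are computed over all 152 cases and rounded to one decimal place; `implied_counts` is a hypothetical helper written for this illustration, not part of the study.

```python
N_CASES = 152  # number of rare disease cases reported in the abstract


def implied_counts(accuracy_pct: float, n: int, decimals: int = 1) -> list[int]:
    """Return every integer count k in [0, n] whose percentage 100*k/n,
    rounded to `decimals` places, equals the reported accuracy_pct.

    Assumption (not stated in the abstract): the reported accuracy is
    k correct diagnoses out of all n cases, rounded to one decimal place.
    """
    return [k for k in range(n + 1)
            if round(100 * k / n, decimals) == accuracy_pct]


# Reported accuracies from the abstract
claude = implied_counts(78.9, N_CASES)      # Claude 3.5 Sonnet
physicians = implied_counts(26.3, N_CASES)  # human physicians

print("Claude 3.5 Sonnet:", claude)
print("Human physicians:", physicians)
```

Under that assumption, each percentage maps to exactly one count (120 of 152 cases for Claude 3.5 Sonnet and 40 of 152 for the physicians), so the two reported figures are internally consistent with a 152-case denominator.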

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82f6/11959745/034b5537eb8d/13023_2025_3656_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82f6/11959745/b36e300e4cd1/13023_2025_3656_Fig2_HTML.jpg
