Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.

Authors

Su Hankun, Sun Yuanyuan, Li Ruiting, Zhang Aozhe, Yang Yuemeng, Xiao Fen, Duan Zhiying, Chen Jingjing, Hu Qin, Yang Tianli, Xu Bin, Zhang Qiong, Zhao Jing, Li Yanping, Li Hui

Affiliations

Department of Reproductive Medicine, Xiangya Hospital Central South University, Changsha, China.

Clinical Research Center for Women's Reproductive Health in Hunan Province, Changsha, China.

Publication details

J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062.

DOI: 10.2196/72062
PMID: 40489764

Abstract

BACKGROUND

The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions.

OBJECTIVE

This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs?

METHODS

This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics.

RESULTS

The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption.
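
The proportions reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `share` and the dictionary layout are assumptions, and it takes both counts to be studies out of the same 95 included studies, as the abstract states. Under that reading, the two shares sum to more than 100% because a single study can evaluate more than one model.

```python
def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


TOTAL_STUDIES = 95  # denominator given in the abstract

# Counts taken from the abstract; the labels are shorthand, not the authors' categories.
model_counts = {"GPT-4 and variants": 70, "GPT-3.5": 34}

for model, n in model_counts.items():
    print(f"{model}: {n}/{TOTAL_STUDIES} = {share(n, TOTAL_STUDIES)}%")

# Inclusion-exclusion: 70 + 34 exceeds 95, so at least this many studies
# must have evaluated both model families (assuming both counts are per study).
overlap_lower_bound = sum(model_counts.values()) - TOTAL_STUDIES
print(f"Studies using both (lower bound): {overlap_lower_bound}")
```

The lower bound is only what the two counts imply; the abstract does not report the actual number of studies that compared both models.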

CONCLUSIONS

LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.
