Li Zhi-Qiang, Xu Runbing, Gong Xin-Ran, Wang Cheng-Lu, Liu Jian-Ping
Centre for Evidence-based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China.
Department of Hematology and Oncology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China.
Digit Health. 2025 Sep 12;11:20552076251365059. doi: 10.1177/20552076251365059. eCollection 2025 Jan-Dec.
Large language models (LLMs) are revolutionizing medical research. However, bibliometric analyses that identify the citation trends shaping this field are lacking. This study analyzes the top 100 (T100) most-cited articles on LLMs in medicine to assess their impact and characteristics.
We conducted a bibliometric analysis of top-cited articles in the Web of Science database, using search terms such as "LLMs, generative artificial intelligence, GPT" and covering 2022 to 2025. Two reviewers identified the T100 papers and extracted publication details, citation counts, and research themes, adhering to the BIBLIO reporting guidelines.
The T100 articles were contributed by 655 authors, and 92 of them were published in 2023. Original research constituted the majority of publications (60 articles). Collectively, these works accumulated 14,847 citations, with individual citation counts ranging from 50 to 1057 (average 148.47). The U.S. led global contributions with 56 articles, and Stanford University was the most prolific institution (8 articles). The T100 appeared in 70 peer-reviewed journals; the top seven journals accounted for 31% of the articles, with the largest single share being 8 articles. The most-cited article is "Evolutionary-scale prediction of atomic-level protein structure with a language model" (Lin et al., Science 2023; 1057 citations). The research themes centered on evaluating LLMs' performance in exam-style evaluations, medical knowledge synthesis, and question-answering tasks in medicine.
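The summary figures above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python that uses only the aggregates reported here; the function and constant names are hypothetical, the count of 31 articles for the top seven journals is inferred from the stated 31% of 100 articles, and no per-article data is assumed.

```python
# Aggregates as reported in the abstract; no per-article data is assumed.
T100_SIZE = 100                  # number of articles analyzed
TOTAL_CITATIONS = 14_847         # citations accumulated by the T100
ARTICLES_FROM_2023 = 92          # articles published in 2023
TOP_SEVEN_JOURNAL_ARTICLES = 31  # inferred: 31% of 100 articles


def mean_citations(total: int, n_articles: int) -> float:
    """Average citations per article."""
    return total / n_articles


def share_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


if __name__ == "__main__":
    mean = mean_citations(TOTAL_CITATIONS, T100_SIZE)
    print(f"Mean citations per article: {mean:.2f}")
    print(f"Published in 2023: {share_percent(ARTICLES_FROM_2023, T100_SIZE):.0f}%")
    print(f"Top seven journals: {share_percent(TOP_SEVEN_JOURNAL_ARTICLES, T100_SIZE):.0f}%")
```

The computed mean agrees with the reported average of 148.47 citations per article.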
This analysis provides a core overview of high-impact LLM research in medicine and can guide future applications. The findings highlight notable progress in clinical decision support, drug discovery, multimodal medical imaging analysis, and personalized medical question answering. They also underscore the need for prospective trials to assess real-world clinical impact, for more reliable LLM-generated medical information, for consensus-driven solutions to ethical challenges, and for global initiatives to democratize LLM tools.