Sarker Abeed, Zhang Rui, Wang Yanshan, Xiao Yunyu, Das Sudeshna, Schutte Dalton, Oniani David, Xie Qianqian, Xu Hua
Emory University, Atlanta, GA, USA.
University of Minnesota, Minneapolis, MN, USA.
Yearb Med Inform. 2024 Aug;33(1):229-240. doi: 10.1055/s-0044-1800750. Epub 2025 Apr 8.
Large language models (LLMs) are revolutionizing the natural language processing (NLP) landscape within healthcare, prompting the need to synthesize the latest advancements and their diverse medical applications. We attempt to summarize the current state of research in this rapidly evolving space.
We conducted a review of the most recent studies on biomedical NLP facilitated by LLMs, sourcing literature from PubMed, the Association for Computational Linguistics Anthology, IEEE Xplore, and Google Scholar (the latter particularly for preprints). Given the ongoing exponential growth in LLM-related publications, our survey was inherently selective. We attempted to abstract key findings in terms of (i) LLMs customized for medical texts, and (ii) the type of medical text being leveraged by LLMs, namely medical literature, electronic health records (EHRs), and social media. In addition to technical details, we touch upon topics such as privacy, bias, interpretability, and equitability.
We observed that while general-purpose LLMs (e.g., GPT-4) are most popular, there is a growing trend in training or customizing open-source LLMs for specific biomedical texts and tasks. Several promising open-source LLMs are currently available, and applications involving EHRs and biomedical literature are more prominent relative to noisier data sources such as social media. For supervised classification and named entity recognition tasks, traditional (encoder-only) transformer-based models still outperform new-age LLMs, and the latter are typically suited for few-shot settings and generative tasks such as summarization. There is still a paucity of research on evaluation, bias, privacy, reproducibility, and equitability of LLMs.
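To make the few-shot setting mentioned above concrete, the following is a minimal illustrative sketch of how a few-shot prompt for a clinical classification task is typically assembled before being sent to a general-purpose LLM. The notes, labels, and task framing are hypothetical examples, not drawn from any of the surveyed studies, and the actual model call is omitted.

```python
# Illustrative sketch (hypothetical data): building a few-shot prompt
# for clinical note classification. In practice this string would be
# sent to a general-purpose LLM; no model call is made here.

FEW_SHOT_EXAMPLES = [
    ("Patient denies chest pain or shortness of breath.", "negative"),
    ("CT confirms pulmonary embolism in the right lower lobe.", "positive"),
]

def build_few_shot_prompt(examples, query):
    """Concatenate labeled demonstrations, then the unlabeled query."""
    parts = ["Classify each clinical note as 'positive' or 'negative' "
             "for the finding of interest."]
    for text, label in examples:
        parts.append(f"Note: {text}\nLabel: {label}")
    parts.append(f"Note: {query}\nLabel:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    FEW_SHOT_EXAMPLES, "No acute findings on chest radiograph."
)
print(prompt)
```

This contrasts with the supervised encoder-only setting, where a model such as a BERT variant is fine-tuned on hundreds or thousands of labeled notes rather than conditioned on a handful of in-context demonstrations.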
LLMs have the potential to transform NLP tasks within the broader medical domain. While technical progress continues, biomedical application focused research must prioritize aspects not necessarily related to performance such as task-oriented evaluation, bias, and equitable use.