School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1865-1874. doi: 10.1093/jamia/ocae037.
Most existing fine-tuned biomedical large language models (LLMs) focus on improving performance on monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of fine-tuned LLMs on diverse biomedical natural language processing (NLP) tasks across languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical NLP tasks.
We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese) spanning more than 10 task types. These corpora were then converted into instruction data for fine-tuning the general-purpose LLM. For the supervised fine-tuning phase, we propose a two-stage strategy to optimize model performance across the various tasks.
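The dataset-to-instruction conversion described above can be sketched as follows. This is a minimal illustrative example only: the instruction template, record schema, and function names are assumptions for exposition, not Taiyi's actual format.

```python
# Hypothetical sketch of converting a NER annotation into an
# instruction-tuning record. The template and JSON-like schema
# are illustrative assumptions, not the authors' actual pipeline.

def to_instruction(text, entities):
    """Turn a (text, entity list) pair into one instruction record."""
    # Serialize gold entities as "span (type)" pairs for the target output.
    answer = "; ".join(f"{span} ({label})" for span, label in entities)
    return {
        "instruction": "Extract all biomedical entities from the text "
                       "and give each entity with its type.",
        "input": text,
        "output": answer,
    }

record = to_instruction(
    "Aspirin reduces the risk of myocardial infarction.",
    [("Aspirin", "Chemical"), ("myocardial infarction", "Disease")],
)
print(record["output"])
# → Aspirin (Chemical); myocardial infarction (Disease)
```

Framing discriminative tasks such as NER this way recasts them as text generation, which is what allows a single generative LLM to be fine-tuned on all task types with one unified format.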
Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering demonstrate that Taiyi outperforms general-purpose LLMs. A case study on additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multitasking.
Leveraging rich, high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs in the biomedical domain. Taiyi demonstrates bilingual multitasking capability through supervised fine-tuning. However, tasks that are not inherently generative, such as information extraction, remain challenging for LLM-based generative approaches, which still underperform conventional discriminative approaches built on smaller language models.