Arabzadeh Negar, Bagheri Ebrahim
University of Waterloo, Waterloo, ON, Canada.
Toronto Metropolitan University, Toronto, ON, Canada.
J Biomed Inform. 2023 Oct;146:104486. doi: 10.1016/j.jbi.2023.104486. Epub 2023 Sep 16.
Large neural-based Pre-trained Language Models (PLMs) have recently gained much attention due to their noteworthy performance on many downstream Information Retrieval (IR) and Natural Language Processing (NLP) tasks. PLMs can be categorized as either general-purpose, trained on resources such as large-scale Web corpora, or domain-specific, trained on in-domain or mixed-domain corpora. While domain-specific PLMs have shown promising performance on domain-specific tasks, they are significantly more computationally expensive than general-purpose PLMs because they must be either retrained or trained from scratch. The objective of our work in this paper is to explore whether general-purpose PLMs can achieve performance competitive with domain-specific PLMs without the need for expensive domain-specific retraining. Focusing specifically on the recent BioASQ Biomedical Question Answering task, we show how different general-purpose PLMs exhibit synergistic behaviour in terms of performance, which can lead to notable overall performance improvements when they are used in tandem. More concretely, given a set of general-purpose PLMs, we propose a self-supervised method for training a classifier that systematically selects, on a per-input basis, the PLM most likely to answer the question correctly. We show that through such a selection strategy, the performance of general-purpose PLMs can become competitive with domain-specific PLMs while remaining computationally light, since there is no need to retrain the large language model itself. We run experiments on the BioASQ dataset, a large-scale biomedical question-answering benchmark.
We show that our proposed selection strategy yields statistically significant performance improvements for general-purpose language models: an average of 16.7% when using only lighter models such as DistilBERT and DistilRoBERTa, and 14.2% when using relatively larger models such as BERT and RoBERTa. As a result, their performance becomes competitive with domain-specific large language models such as PubMedBERT.
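The core idea of the abstract, routing each question to the general-purpose PLM most likely to answer it correctly, with labels derived self-supervised from which PLM succeeded on past questions, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the featurizer, the classifier, and all names (`toy_featurize`, `SelectorClassifier`, the model ids) are assumptions standing in for real PLM encoders and a learned classifier.

```python
from collections import defaultdict

def toy_featurize(question):
    # Stand-in for a real text encoder: a bag of lowercased tokens.
    return set(question.lower().split())

class SelectorClassifier:
    """Illustrative nearest-profile router over token overlap."""
    def __init__(self):
        self.profiles = defaultdict(set)

    def fit(self, questions, correct_model_ids):
        # Self-supervised labels: for each training question, the id of
        # the general-purpose PLM that answered it correctly.
        for q, model_id in zip(questions, correct_model_ids):
            self.profiles[model_id] |= toy_featurize(q)

    def select(self, question):
        # Route to the PLM whose past successes overlap most with this input.
        feats = toy_featurize(question)
        return max(self.profiles,
                   key=lambda model_id: len(self.profiles[model_id] & feats))

# Toy training data: which model answered each question correctly.
train_qs = ["what gene causes cystic fibrosis",
            "which drug treats hypertension",
            "what gene is linked to huntington disease"]
train_labels = ["bert", "roberta", "bert"]

router = SelectorClassifier()
router.fit(train_qs, train_labels)
print(router.select("what gene causes sickle cell disease"))  # → bert
```

The routing step is cheap relative to retraining a PLM: at inference time only the selector runs before a single general-purpose model is queried, which is what keeps the approach computationally light.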