
Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction.

Affiliation

Department of Computer and Information Science, Biomedical Text Mining Lab, University of Delaware, Newark, USA.

Publication Information

BMC Bioinformatics. 2022 Apr 4;23(1):120. doi: 10.1186/s12859-022-04642-w.

DOI: 10.1186/s12859-022-04642-w
PMID: 35379166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8978438/
Abstract

BACKGROUND

With the rapid growth of the biomedical literature, automatically extracting biomedical relations has become a significant subject of biomedical research. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we explore approaches to improving the BERT model for relation extraction in both the pre-training and fine-tuning stages. In the pre-training stage, we add a further level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. We also propose methods that incorporate knowledge in the last layer of BERT that standard fine-tuning ignores.
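The extra sub-domain adaptation step is concrete enough to sketch. The snippet below continues masked-language-model pre-training of a biomedical BERT on raw sub-domain text with Hugging Face Transformers; the checkpoint name, corpus file, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of an extra sub-domain pre-training step, assuming a
# Hugging Face Transformers setup. Checkpoint, file name, and
# hyperparameters are placeholders, not the paper's configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from a biomedical BERT and continue masked-language-model
# pre-training on task-adjacent sub-domain text before fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

# "subdomain.txt" is a hypothetical file of raw sub-domain sentences
# (e.g., drug-interaction abstracts for a DDI-style task).
dataset = load_dataset("text", data_files={"train": "subdomain.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# 15% token masking is the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="subdomain-bert",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```

The resulting checkpoint would then be fine-tuned on the labeled relation extraction data in the usual way.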

RESULTS

The experimental results demonstrate that our pre-training and fine-tuning approaches both improve BERT model performance. Combining the two proposed techniques, our approach outperforms the original BERT models on relation extraction tasks with an average F1-score improvement of 2.1%. Moreover, it achieves state-of-the-art performance on three relation extraction benchmark datasets.

CONCLUSIONS

The extra pre-training step on sub-domain data helps the BERT model generalize to specific tasks, and our proposed fine-tuning mechanism exploits the knowledge in the last layer of BERT to boost model performance. Combining the two approaches yields a further improvement on relation extraction tasks.
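The fine-tuning mechanism is only hinted at in the abstract. One common reading of "knowledge ignored in the last layer" is that standard fine-tuning classifies from the [CLS] vector alone and discards the other last-layer token states. The sketch below (PyTorch; the module name and pooling choice are hypothetical) mean-pools all non-padding token states of the last layer before classification — one plausible instantiation under that assumption, not the paper's exact architecture.

```python
# Sketch of a fine-tuning head that uses all last-layer token states
# instead of only the [CLS] vector. An illustrative assumption about
# the paper's mechanism, not its actual architecture.
import torch
import torch.nn as nn
from transformers import AutoModel


class LastLayerPoolingClassifier(nn.Module):
    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state               # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).float()  # (batch, seq, 1)
        # Mean-pool every non-padding token of the last layer, so information
        # outside the [CLS] position also reaches the relation classifier.
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)               # (batch, num_labels)
```

Training on top of this pooled representation proceeds exactly as with the standard [CLS]-based setup, e.g. with a cross-entropy loss over relation labels.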


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b1/8978438/23f7e9305c92/12859_2022_4642_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b1/8978438/93743f2eae17/12859_2022_4642_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b1/8978438/99c5a13a83a5/12859_2022_4642_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b1/8978438/dcd486d06738/12859_2022_4642_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b1/8978438/4496b17f38f5/12859_2022_4642_Fig5_HTML.jpg

Similar Articles

1. Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction.
BMC Bioinformatics. 2022 Apr 4;23(1):120. doi: 10.1186/s12859-022-04642-w.
2. BioBERT and Similar Approaches for Relation Extraction.
Methods Mol Biol. 2022;2496:221-235. doi: 10.1007/978-1-0716-2305-3_12.
3. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
4. BertSRC: transformer-based semantic relation classification.
BMC Med Inform Decis Mak. 2022 Sep 6;22(1):234. doi: 10.1186/s12911-022-01977-5.
5. Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
6. CACER: Clinical concept Annotations for Cancer Events and Relations.
J Am Med Inform Assoc. 2024 Nov 1;31(11):2583-2594. doi: 10.1093/jamia/ocae231.
7. Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT.
BMC Bioinformatics. 2022 Apr 21;23(1):144. doi: 10.1186/s12859-022-04688-w.
8. A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz116.
9. BERT-GT: cross-sentence n-ary relation extraction with BERT and Graph Transformer.
Bioinformatics. 2021 Apr 5;36(24):5678-5685. doi: 10.1093/bioinformatics/btaa1087.
10. Biomedical relation extraction method based on ensemble learning and attention mechanism.
BMC Bioinformatics. 2024 Oct 18;25(1):333. doi: 10.1186/s12859-024-05951-y.

Cited By

1. Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study.
J Med Internet Res. 2025 Jun 3;27:e75052. doi: 10.2196/75052.
2. Iterative refinement and goal articulation to optimize large language models for clinical information extraction.
NPJ Digit Med. 2025 May 23;8(1):301. doi: 10.1038/s41746-025-01686-z.
3. GlycoSiteMiner: an ML/AI-assisted literature mining-based pipeline for extracting glycosylation sites from PubMed abstracts.
Glycobiology. 2025 Jun 2;35(7). doi: 10.1093/glycob/cwaf030.
4. Prospects and perils of ChatGPT in diabetes.
World J Diabetes. 2025 Mar 15;16(3):98408. doi: 10.4239/wjd.v16.i3.98408.
5. Prompts to Table: Specification and Iterative Refinement for Clinical Information Extraction with Large Language Models.
medRxiv. 2025 Apr 1:2025.02.11.25322107. doi: 10.1101/2025.02.11.25322107.
6. Biomedical relation extraction method based on ensemble learning and attention mechanism.
BMC Bioinformatics. 2024 Oct 18;25(1):333. doi: 10.1186/s12859-024-05951-y.
7. LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models.
medRxiv. 2024 Sep 3:2024.09.02.24312917. doi: 10.1101/2024.09.02.24312917.
8. Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach.
Database (Oxford). 2024 Aug 30;2024. doi: 10.1093/database/baae094.
9. A deep learning-driven discovery of berberine derivatives as novel antibacterial against multidrug-resistant Helicobacter pylori.
Signal Transduct Target Ther. 2024 Jul 8;9(1):183. doi: 10.1038/s41392-024-01895-0.
10. Leveraging Unlabeled Clinical Data to Boost Performance of Risk Stratification Models for Suspected Acute Coronary Syndrome.
AMIA Annu Symp Proc. 2024 Jan 11;2023:744-753. eCollection 2023.

References

1. Fine-tuning large neural language models for biomedical natural language processing.
Patterns (N Y). 2023 Apr 14;4(4):100729. doi: 10.1016/j.patter.2023.100729.
2. ProteinBERT: a universal deep-learning model of protein sequence and function.
Bioinformatics. 2022 Apr 12;38(8):2102-2110. doi: 10.1093/bioinformatics/btac020.
3. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
4. Using distant supervision to augment manually annotated data for relation extraction.
PLoS One. 2019 Jul 30;14(7):e0216913. doi: 10.1371/journal.pone.0216913. eCollection 2019.
5. Drug-drug interaction extraction from biomedical texts using long short-term memory network.
J Biomed Inform. 2018 Oct;86:15-24. doi: 10.1016/j.jbi.2018.08.005. Epub 2018 Aug 21.
6. MIMIC-III, a freely accessible critical care database.
Sci Data. 2016 May 24;3:160035. doi: 10.1038/sdata.2016.35.
7. The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions.
J Biomed Inform. 2013 Oct;46(5):914-20. doi: 10.1016/j.jbi.2013.07.011. Epub 2013 Jul 29.
8. PubTator: a web-based text mining tool for assisting biocuration.
Nucleic Acids Res. 2013 Jul;41(Web Server issue):W518-22. doi: 10.1093/nar/gkt441. Epub 2013 May 22.
9. Overview of the protein-protein interaction annotation extraction task of BioCreative II.
Genome Biol. 2008;9 Suppl 2(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. Epub 2008 Sep 1.
10. Comparative experiments on learning information extractors for proteins and their interactions.
Artif Intell Med. 2005 Feb;33(2):139-55. doi: 10.1016/j.artmed.2004.07.016.