
Improving Model Transferability for Clinical Note Section Classification Models Using Continued Pretraining.

Authors

Zhou Weipeng, Yetisgen Meliha, Afshar Majid, Gao Yanjun, Savova Guergana, Miller Timothy A

Affiliations

Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington-Seattle, Seattle, WA, USA.

Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA.

Publication

medRxiv. 2023 Apr 24:2023.04.15.23288628. doi: 10.1101/2023.04.15.23288628.

Abstract

OBJECTIVE

The classification of clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for one institution experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Objective", "Assessment", and "Plan") framework with improved transferability.

MATERIALS AND METHODS

We trained baseline models by fine-tuning BERT-based models and enhanced their transferability with continued pretraining, including domain-adaptive pretraining (DAPT) and task-adaptive pretraining (TAPT). We added out-of-domain annotated samples during fine-tuning and observed model performance across varying annotated sample sizes. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added.
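
Continued pretraining (both DAPT and TAPT) resumes the masked-language-modeling objective on unlabeled in-domain or task text before fine-tuning. A minimal sketch of the BERT-style input masking this relies on is below; `MASK_ID` and `VOCAB_SIZE` are assumptions matching the standard BERT vocabulary, and the function is illustrative, not the paper's implementation.

```python
import random

MASK_ID = 103       # [MASK] token id in the standard BERT vocabulary (assumption)
VOCAB_SIZE = 30522  # standard BERT vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style masking for continued pretraining (DAPT/TAPT).

    Each position is selected with probability mask_prob; a selected
    token is replaced by [MASK] 80% of the time, by a random token 10%,
    and left unchanged 10%. Labels hold the original token at selected
    positions and -100 (ignored by the loss) everywhere else.
    """
    rng = rng or random.Random(0)
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token
    return inputs, labels
```

Feeding such (inputs, labels) pairs built from unlabeled clinical notes to the same BERT model, then fine-tuning on labeled sections, is the general DAPT/TAPT recipe the abstract refers to.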

RESULTS

We found that continued pretraining improved models only when combined with in-domain annotated samples, improving the average F1 score across three datasets from 0.756 to 0.808. This improvement was equivalent to adding 50.2 in-domain annotated samples.
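
A sample-equivalence figure like 50.2 can be obtained by reading the continued-pretraining F1 off the baseline model's learning curve (F1 versus number of in-domain annotated samples). A minimal sketch with hypothetical curve values, not numbers from the paper:

```python
# Hypothetical learning curve of the baseline model:
# (in-domain annotated samples added, F1). Illustrative values only.
curve = [(0, 0.700), (25, 0.756), (100, 0.820), (200, 0.850)]

def equivalent_samples(target_f1, curve):
    """Linearly interpolate the learning curve to find how many
    in-domain samples would match a given F1 score."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 <= target_f1 <= y1:
            return x0 + (target_f1 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target F1 lies outside the measured curve")
```

For example, `equivalent_samples(0.788, curve)` falls halfway between the 25- and 100-sample points on this hypothetical curve, giving 62.5 samples.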

DISCUSSION

Although considered a straightforward task in-domain, section classification remains considerably difficult cross-domain, even with highly sophisticated neural network-based methods.

CONCLUSION

Continued pretraining improved model transferability for cross-domain clinical note section classification when a small number of in-domain labeled samples was available.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebcb/10168403/d17dfac4a639/nihpp-2023.04.15.23288628v1-f0001.jpg
