Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, PA.
Department of Health Information Management, University of Pittsburgh, PA.
AMIA Annu Symp Proc. 2023 Apr 29;2022:972-981. eCollection 2022.
Developing clinical natural language processing (NLP) systems based on machine learning and deep learning depends on the availability of large-scale annotated clinical text datasets, most of which are time-consuming to create and not publicly available. The lack of such annotated datasets is the biggest bottleneck for the development of clinical NLP systems. Zero-Shot Learning (ZSL) refers to the use of deep learning models to classify instances from new classes for which no training data have been seen. Prompt-based learning is an emerging ZSL technique in NLP in which task-specific templates are defined for different tasks. In this study, we developed a novel prompt-based clinical NLP framework called HealthPrompt and applied the paradigm of prompt-based learning to clinical texts. In this technique, rather than fine-tuning a Pre-trained Language Model (PLM), the task definitions are tuned by defining a prompt template. We performed an in-depth analysis of HealthPrompt on six different PLMs in a no-training-data setting. Our experiments show that HealthPrompt could effectively capture the context of clinical texts and perform well on clinical NLP tasks without any training data.
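The idea of tuning the task definition through a prompt template, rather than fine-tuning the PLM, can be illustrated with a minimal sketch. This is not the HealthPrompt implementation: the smoking-status task, the template wording, the label words, and every function name below are hypothetical, and `toy_score_mask` is a keyword-counting stand-in for the mask-token scores a real PLM would supply.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"

# Hypothetical cue lexicon. It only exists so the sketch runs without a model;
# a real PLM would score the label words at the mask position instead.
TOY_CUES: Dict[str, List[str]] = {
    "smoker": ["cigarettes", "packs", "tobacco"],
    "nonsmoker": ["denies", "never"],
}


def build_prompt(template: str, text: str) -> str:
    """Wrap the clinical text in a cloze-style template with one mask slot."""
    return template.format(text=text, mask=MASK)


def toy_score_mask(prompt: str, candidates: List[str]) -> Dict[str, float]:
    """Stand-in for a PLM: score each candidate word by counting its cue words."""
    tokens = prompt.lower().replace(".", " ").replace(",", " ").split()
    return {
        word: float(sum(tokens.count(cue) for cue in TOY_CUES.get(word, [])))
        for word in candidates
    }


def classify(
    text: str,
    template: str,
    verbalizer: Dict[str, List[str]],
    score_mask: Callable[[str, List[str]], Dict[str, float]],
) -> str:
    """Zero-shot classification: no parameters are trained or updated."""
    prompt = build_prompt(template, text)
    class_scores: Dict[str, float] = {}
    for label, words in verbalizer.items():
        word_scores = score_mask(prompt, words)
        # Average the scores of the label words mapped to this class.
        class_scores[label] = sum(word_scores[w] for w in words) / len(words)
    return max(class_scores, key=class_scores.get)


# Hypothetical task definition: a template plus a label-word mapping (verbalizer).
TEMPLATE = "{text} The patient is a {mask}."
VERBALIZER = {"SMOKER": ["smoker"], "NON-SMOKER": ["nonsmoker"]}

if __name__ == "__main__":
    notes = [
        "Patient smokes two packs of cigarettes daily.",
        "Patient denies tobacco use and has never smoked.",
    ]
    for note in notes:
        print(note, "->", classify(note, TEMPLATE, VERBALIZER, toy_score_mask))
```

Under these assumptions, changing the task means editing only `TEMPLATE` and `VERBALIZER`; the scoring function, which stands in for the frozen PLM, is left untouched.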