School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.
Department of Periodontics and Dental Hygiene, The University of Texas Health Science Center at Houston School of Dentistry, Houston, Texas, USA.
AMIA Annu Symp Proc. 2024 Jan 11;2023:904-912. eCollection 2023.
This study explored the usability of prompt generation for named entity recognition (NER) tasks and how performance varies across prompt settings. Prompt generation with GPT-J models was used both to test directly against the gold standard and to generate seeds, which were then fed to a RoBERTa model using the spaCy package. In the direct test, a lower ratio of negative examples combined with a higher number of examples in the prompt achieved the best results, with an F1 score of 0.72. After training with the RoBERTa model, performance was consistent across all settings, with F1 scores of 0.92-0.97. The study highlights the importance of seed quality rather than quantity when feeding NER models. This research reports an efficient and accurate way to mine clinical notes for periodontal diagnoses, allowing researchers to build a NER model easily and quickly with the prompt generation approach.
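The two prompt settings named in the abstract, the number of examples and the ratio of negative examples, can be illustrated with a minimal sketch. The abstract does not give the prompt template, the example sentences, or the matching criterion behind the F1 score, so the `Sentence:`/`Diagnosis:` layout, the function names `build_prompt` and `f1_score`, the `none` label for negative examples, and exact-match scoring are all assumptions made for illustration only.

```python
import random


def build_prompt(positives, negatives, n_examples, negative_ratio, query, seed=0):
    """Assemble a few-shot NER prompt (hypothetical template, not the study's).

    positives: list of (sentence, entity_text) pairs that contain a diagnosis.
    negatives: list of sentences that contain no diagnosis.
    negative_ratio: fraction of the n_examples shots drawn from negatives.
    """
    rng = random.Random(seed)
    n_neg = round(n_examples * negative_ratio)
    n_pos = n_examples - n_neg
    shots = list(rng.sample(positives, n_pos))
    shots += [(s, "none") for s in rng.sample(negatives, n_neg)]
    rng.shuffle(shots)
    blocks = [f"Sentence: {s}\nDiagnosis: {e}" for s, e in shots]
    # The final block is left open for the language model to complete.
    blocks.append(f"Sentence: {query}\nDiagnosis:")
    return "\n\n".join(blocks)


def f1_score(gold, predicted):
    """Exact-match F1 over sets of (doc_id, entity_text) pairs (assumed criterion)."""
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences, for illustration only.
positives = [
    ("Dx: generalized moderate chronic periodontitis.", "generalized moderate chronic periodontitis"),
    ("Patient presents with localized aggressive periodontitis.", "localized aggressive periodontitis"),
    ("Assessment: plaque-induced gingivitis.", "plaque-induced gingivitis"),
]
negatives = [
    "Patient tolerated the procedure well.",
    "Return in six months for recall.",
]

prompt = build_prompt(positives, negatives, n_examples=4, negative_ratio=0.25,
                      query="Diagnosis of severe chronic periodontitis confirmed.")
print(prompt)
```

Varying `n_examples` and `negative_ratio` in a sketch like this corresponds to the prompt settings compared in the direct test; completions collected from the language model could then serve as seeds for a downstream NER model.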