

Beyond Tokens: Fair Evaluation of French Large Language Models for Clinical Named Entity Recognition.

Affiliations

Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.

Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.

Publication details

Stud Health Technol Inform. 2024 Aug 22;316:666-670. doi: 10.3233/SHTI240502.


PMID:39176830

Abstract

Named Entity Recognition (NER) models based on Transformers have gained prominence for their strong performance across many languages and domains. This work examines the often-overlooked question of entity-level metrics and exposes significant discrepancies between token-level and entity-level evaluations. The study uses a corpus of synthetic French oncology reports annotated with entities representing oncological morphologies. Four French BERT-based models are fine-tuned for token classification, and their performance is rigorously assessed at both the token and entity levels. In addition to fine-tuning, we evaluate ChatGPT's ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when moving from token-level to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies for NER tasks. Furthermore, compared with BERT, ChatGPT remains limited in detecting advanced entities in French.
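To illustrate why token-level and entity-level scores can diverge, here is a minimal Python sketch. It is not the authors' evaluation code: it assumes BIO tagging, micro-averaged F1, and strict entity matching (same boundaries and same label), and the abstract does not state which scheme the study used. The `MORPHOLOGY` label and the example sentence are hypothetical. A prediction that truncates one morphology span loses only one token, but the whole entity is counted as wrong.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def f1(tp: int, fp: int, fn: int) -> float:
    """Micro-averaged F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def entity_type(tag: str) -> str:
    """Drop the B-/I- prefix so token-level scoring compares entity types only (an assumption)."""
    return "O" if tag == "O" else tag[2:]


def token_level_f1(gold: List[str], pred: List[str]) -> float:
    """Score every non-O token independently of its neighbours."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_type, p_type = entity_type(g), entity_type(p)
        if g_type != "O" and g_type == p_type:
            tp += 1
        else:
            if p_type != "O":
                fp += 1
            if g_type != "O":
                fn += 1
    return f1(tp, fp, fn)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Group a BIO tag sequence into (start, end, label) entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open entity
        if tag.startswith("I-") and tag[2:] == label:
            continue  # token extends the currently open entity
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            # B- opens an entity; a stray I- with no matching open entity is treated as B-
            start, label = i, tag[2:]
    return spans


def entity_level_f1(gold: List[str], pred: List[str]) -> float:
    """Strict matching: an entity counts only if boundaries and label both match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    return f1(tp, len(pred_spans) - tp, len(gold_spans) - tp)


# Hypothetical example: "Adénocarcinome canalaire infiltrant et carcinome épidermoïde"
gold = ["B-MORPHOLOGY", "I-MORPHOLOGY", "I-MORPHOLOGY", "O", "B-MORPHOLOGY", "I-MORPHOLOGY"]
pred = ["B-MORPHOLOGY", "I-MORPHOLOGY", "O", "O", "B-MORPHOLOGY", "I-MORPHOLOGY"]  # misses "infiltrant"

token_score = token_level_f1(gold, pred)
entity_score = entity_level_f1(gold, pred)
print(f"token-level F1:  {token_score:.3f}")   # token-level F1:  0.889
print(f"entity-level F1: {entity_score:.3f}")  # entity-level F1: 0.500
```

On this toy pair, missing the single token "infiltrant" costs one of five entity tokens (token-level F1 of about 0.889), but it turns one of the two entities into both a false positive and a false negative (entity-level F1 of 0.500). A lenient, partial-overlap matching rule would narrow this gap; the strict variant is shown because it is the more demanding of the two.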
