

Natural language processing for automatic evaluation of free-text answers - a feasibility study based on the European Diploma in Radiology examination.

Author information

Stoehr Fabian, Kämpgen Benedikt, Müller Lukas, Zufiría Laura Oleaga, Junquero Vanesa, Merino Cristina, Mildenberger Peter, Kloeckner Roman

Affiliations

Department of Diagnostic and Interventional Radiology, University Medical Center, Johannes Gutenberg-University Mainz, Langenbeckstr. 1, 55131, Mainz, Germany.

Empolis Information Management GmbH, Leightonstraße 2, 97074, Würzburg, Germany.

Publication information

Insights Imaging. 2023 Sep 19;14(1):150. doi: 10.1186/s13244-023-01507-5.

Abstract

BACKGROUND

Written medical examinations consist of multiple-choice questions and/or free-text answers. The latter require manual evaluation and rating, which is time-consuming and potentially error-prone. We tested whether natural language processing (NLP) can be used to automatically analyze free-text answers to support the review process.

METHODS

The European Board of Radiology of the European Society of Radiology provided representative datasets comprising sample questions, answer keys, participant answers, and reviewer markings from European Diploma in Radiology (EDiR) examinations. The three free-text questions with the highest number of corresponding answers were selected: questions 1 and 2 were "unstructured" and required a typical free-text answer, whereas question 3 was "structured" and offered a selection of predefined wordings/phrases for participants to use in their free-text answer. The NLP engine was designed using word lists, rule-based synonyms, and decision tree learning based on the answer keys, and its performance was tested against the gold standard of reviewer markings.
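To make the word-list and synonym idea concrete, the following is a minimal sketch in Python, not the authors' engine. The answer key, the synonym lists, and the function names `normalize` and `mark_answer` are hypothetical and chosen only for illustration; the decision tree learning step mentioned above is not covered.

```python
import re

# Hypothetical answer key (not taken from the EDiR data): each scoring item
# maps to the phrasings accepted for it, i.e. the item plus hand-written synonyms.
ANSWER_KEY = {
    "pneumothorax": ["pneumothorax", "collapsed lung"],
    "rib fracture": ["rib fracture", "fractured rib", "broken rib"],
}


def normalize(text):
    """Lowercase the text and reduce it to alphanumeric tokens joined by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def mark_answer(answer, answer_key=ANSWER_KEY):
    """Return the set of answer-key items whose phrasing occurs in the answer."""
    # Pad with spaces so that phrases only match on whole-token boundaries.
    padded = f" {normalize(answer)} "
    found = set()
    for item, phrasings in answer_key.items():
        if any(f" {normalize(p)} " in padded for p in phrasings):
            found.add(item)
    return found


if __name__ == "__main__":
    print(mark_answer("Left-sided collapsed lung with a broken rib."))
```

A matcher of this kind only credits items whose wording is anticipated in the lists, which is one plausible reason a format with predefined phrases would be easier to mark automatically.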

RESULTS

After implementing the NLP approach in Python, F1 scores were calculated as a measure of NLP performance: 0.26 (unstructured question 1, n = 96), 0.33 (unstructured question 2, n = 327), and 0.5 (more structured question, n = 111). The respective precision/recall values were 0.26/0.27, 0.4/0.32, and 0.62/0.55.
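For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1 are commonly computed when automatic markings are compared against reviewer markings. It assumes binary labels per answer item (1 = item credited, 0 = not credited); the toy data and the function name `precision_recall_f1` are illustrative and do not reproduce the study's figures or its exact averaging scheme.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two equal-length lists of 0/1 labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # credited by both
    fp = sum(1 for p, g in pairs if p and not g)    # credited by NLP only
    fn = sum(1 for p, g in pairs if not p and g)    # credited by reviewer only
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example (assumed data): automatic markings vs. reviewer gold standard.
    nlp_marks = [1, 1, 0, 1, 0, 0]
    reviewer_marks = [1, 0, 0, 1, 1, 0]
    p, r, f1 = precision_recall_f1(nlp_marks, reviewer_marks)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```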

CONCLUSION

This study showed the successful design of an NLP-based approach for automatic evaluation of free-text answers in the EDiR examination. As a future field of application, NLP could serve as a decision-support system for reviewers and could inform the design of examinations adapted to the requirements of an automated, NLP-based review process.

CLINICAL RELEVANCE STATEMENT

Natural language processing can be successfully used to automatically evaluate free-text answers, performing better with more structured question-answer formats. Furthermore, this study provides a baseline for further work applying, e.g., more elaborate NLP approaches or large language models.

KEY POINTS

• Free-text answers require manual evaluation, which is time-consuming and potentially error-prone.
• We developed a simple NLP-based approach - requiring only minimal effort/modeling - to automatically analyze and mark free-text answers.
• Our NLP engine has the potential to support the manual evaluation process.
• NLP performance is better on a more structured question-answer format.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a2/10509084/bf8cbcf88210/13244_2023_1507_Fig1_HTML.jpg
