Natural language processing for automatic evaluation of free-text answers - a feasibility study based on the European Diploma in Radiology examination.

Author information

Stoehr Fabian, Kämpgen Benedikt, Müller Lukas, Zufiría Laura Oleaga, Junquero Vanesa, Merino Cristina, Mildenberger Peter, Kloeckner Roman

Affiliations

Department of Diagnostic and Interventional Radiology, University Medical Center, Johannes Gutenberg-University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany.

Empolis Information Management GmbH, Leightonstraße 2, 97074, Würzburg, Germany.

Publication information

Insights Imaging. 2023 Sep 19;14(1):150. doi: 10.1186/s13244-023-01507-5.

PMID: 37726485
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10509084/
Abstract

BACKGROUND

Written medical examinations consist of multiple-choice questions and/or free-text answers. The latter require manual evaluation and rating, which is time-consuming and potentially error-prone. We tested whether natural language processing (NLP) can be used to automatically analyze free-text answers to support the review process.

METHODS

The European Board of Radiology of the European Society of Radiology provided representative datasets comprising sample questions, answer keys, participant answers, and reviewer markings from European Diploma in Radiology (EDiR) examinations. The three free-text questions with the highest number of corresponding answers were selected: questions 1 and 2 were "unstructured" and required a typical free-text answer, whereas question 3 was "structured" and offered a selection of predefined wordings/phrases for participants to use in their free-text answer. The NLP engine was designed from the answer keys using word lists, rule-based synonyms, and decision tree learning, and its performance was tested against the gold standard of reviewer markings.
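The methods name three ingredients: word lists, rule-based synonyms, and decision tree learning. The following minimal sketch shows one way they could fit together; it is not the authors' engine. It assumes a hypothetical answer key (`ANSWER_KEY`) that maps each key concept to accepted wordings, turns each answer into binary concept-match features, and fits a one-level decision tree (a decision stump) against reviewer marks. All concept names, phrases, answers, and helper functions are invented for illustration; a real implementation would more likely use a deeper tree, for example from a machine learning library.

```python
import re

# Hypothetical answer key: each key concept maps to accepted wordings
# (a word list plus rule-based synonyms). Invented for illustration only.
ANSWER_KEY: dict[str, list[str]] = {
    "pneumothorax": ["pneumothorax", "collapsed lung"],
    "chest_drain": ["chest drain", "chest tube", "thoracostomy"],
}


def normalize(text: str) -> str:
    """Lower-case, collapse non-alphanumeric runs to spaces, and pad with spaces
    so that phrases only match on whole-word boundaries."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def concept_features(answer: str, key: dict[str, list[str]]) -> dict[str, int]:
    """Return 1 for every key concept with at least one matching wording, else 0."""
    text = normalize(answer)
    return {
        concept: int(any(normalize(phrase) in text for phrase in phrases))
        for concept, phrases in key.items()
    }


def fit_stump(features: list[dict[str, int]], labels: list[int]) -> tuple[str, int, int]:
    """Fit a one-level decision tree on binary concept features.

    Returns (concept, mark_if_absent, mark_if_present) for the split with the
    highest training accuracy; each branch predicts its majority label
    (ties go to 0, i.e. no credit).
    """
    best: tuple[str, int, int] = ("", 0, 0)
    best_correct = -1
    for concept in features[0]:
        branches: dict[int, list[int]] = {0: [], 1: []}
        for feats, label in zip(features, labels):
            branches[feats[concept]].append(label)
        majority = {v: int(2 * sum(ys) > len(ys)) for v, ys in branches.items()}
        correct = sum(ys.count(majority[v]) for v, ys in branches.items())
        if correct > best_correct:
            best_correct = correct
            best = (concept, majority[0], majority[1])
    return best


def predict(stump: tuple[str, int, int], feats: dict[str, int]) -> int:
    """Mark a new answer with the fitted stump (1 = credit given)."""
    concept, if_absent, if_present = stump
    return if_present if feats[concept] else if_absent


# Usage with hypothetical answers and reviewer marks (1 = credit given).
answers = [
    "Right-sided pneumothorax; insert a chest tube.",
    "Collapsed lung, needs chest drain",
    "Normal chest radiograph",
    "Pleural effusion, drain it",
]
marks = [1, 1, 0, 0]
X = [concept_features(a, ANSWER_KEY) for a in answers]
stump = fit_stump(X, marks)
print(stump)  # ('pneumothorax', 0, 1)
```

Padding the normalized text with spaces keeps matching on whole words, so a synonym such as "chest tube" is found while "drain" alone does not count as "chest drain".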

RESULTS

The NLP approach was implemented in Python, and F1 scores were calculated as a measure of its performance: 0.26 (unstructured question 1, n = 96), 0.33 (unstructured question 2, n = 327), and 0.5 (structured question 3, n = 111). The corresponding precision/recall values were 0.26/0.27, 0.4/0.32, and 0.62/0.55.
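F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), where precision is the share of NLP-awarded marks that reviewers also awarded and recall is the share of reviewer-awarded marks that the NLP engine found. A minimal sketch of the computation, assuming binary per-item marks (1 = credit given) with reviewer markings as the gold standard; the function names and the example mark vectors are hypothetical.

```python
def harmonic_mean(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R); defined as 0 when both are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(predicted: list[int], gold: list[int]) -> tuple[float, float, float]:
    """Score binary NLP marks against reviewer marks (gold standard); 1 = credit given."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_mean(precision, recall)


# Hypothetical marks for six answer items: 2 true positives, 1 false positive,
# 1 false negative, so precision = recall = F1 = 2/3.
nlp_marks = [1, 1, 0, 1, 0, 0]
reviewer_marks = [1, 0, 0, 1, 1, 0]
p, r, f1 = precision_recall_f1(nlp_marks, reviewer_marks)

# Harmonic mean of the precision/recall pairs reported in the abstract.
reported = {"Q1": (0.26, 0.27), "Q2": (0.4, 0.32), "Q3": (0.62, 0.55)}
for label, (p_rep, r_rep) in reported.items():
    print(label, round(harmonic_mean(p_rep, r_rep), 2))  # Q1 0.26, Q2 0.36, Q3 0.58
```

Only the first of these values matches the reported F1 score; the reported 0.33 and 0.5 for questions 2 and 3 were therefore presumably not computed from the pooled precision and recall shown, and the abstract does not state how the scores were aggregated.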

CONCLUSION

This study demonstrated the successful design of an NLP-based approach for the automatic evaluation of free-text answers in the EDiR examination. In future, NLP could therefore serve as a decision-support system for reviewers and inform the design of examinations adapted to the requirements of an automated, NLP-based review process.

CLINICAL RELEVANCE STATEMENT

Natural language processing can be used successfully to evaluate free-text answers automatically, and it performs better with more structured question-answer formats. Furthermore, this study provides a baseline for further work applying, e.g., more elaborate NLP approaches or large language models.

KEY POINTS

• Free-text answers require manual evaluation, which is time-consuming and potentially error-prone.
• We developed a simple NLP-based approach, requiring only minimal effort and modeling, to automatically analyze and mark free-text answers.
• Our NLP engine has the potential to support the manual evaluation process.
• NLP performance is better on a more structured question-answer format.

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a2/10509084/bf8cbcf88210/13244_2023_1507_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a2/10509084/551265d83453/13244_2023_1507_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a2/10509084/37b2e2262adc/13244_2023_1507_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a2/10509084/d4d4e12a7ce5/13244_2023_1507_Fig4_HTML.jpg

Similar articles

1. Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.
2. Reshaping free-text radiology notes into structured reports with generative question answering transformers.
Artif Intell Med. 2024 Aug;154:102924. doi: 10.1016/j.artmed.2024.102924. Epub 2024 Jun 26.
3. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.
J Med Internet Res. 2015 Aug 31;17(8):e212. doi: 10.2196/jmir.4612.
5. Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing.
Chin Med J (Engl). 2019 Jul 20;132(14):1673-1680. doi: 10.1097/CM9.0000000000000301.
6. A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.
J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.
7. Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data.
JMIR Form Res. 2023 Mar 7;7:e43014. doi: 10.2196/43014.
8. Natural Language Processing in Radiology: Update on Clinical Applications.
J Am Coll Radiol. 2022 Nov;19(11):1271-1285. doi: 10.1016/j.jacr.2022.06.016. Epub 2022 Aug 25.
9. Ascle - A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study.
J Med Internet Res. 2024 Oct 3;26:e60601. doi: 10.2196/60601.

Cited by

1. Five advanced chatbots solving European Diploma in Radiology (EDiR) text-based questions: differences in performance and consistency.
Eur Radiol Exp. 2025 Aug 19;9(1):79. doi: 10.1186/s41747-025-00591-0.