MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain.

Authors

Jiang Chao, Xu Wei

Affiliation

College of Computing, Georgia Institute of Technology.

Publication information

Proc Conf Empir Methods Nat Lang Process. 2024 Nov;2024:17293-17319. doi: 10.18653/v1/2024.emnlp-main.958.

Abstract

Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study of fine-grained readability measurement in the medical domain at both the sentence level and the span level. We introduce a new dataset, MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotations for 4,520 sentences, featuring two novel categories, "Google-Easy" and "Google-Hard". The dataset supports our quantitative analysis, which covers 650 linguistic features and automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics specifically for the medical domain, including unsupervised, supervised, and prompting-based methods using recently developed large language models (LLMs). Informed by our fine-grained complex span annotation, we find that adding a single feature, the number of jargon spans, to existing readability formulas can significantly improve their correlation with human judgments. We will publicly release the dataset and code.
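The idea of adding a jargon-count feature to an existing readability formula can be illustrated with a minimal sketch. The abstract does not state which formulas were extended or how the feature was weighted, so everything below is an assumption made for illustration: the base formula is the standard Flesch-Kincaid Grade Level, the syllable counter is a rough vowel-group heuristic, and the function name `fkgl_with_jargon` and the weight `alpha` are hypothetical. In practice, such a weight would presumably be fit against human readability ratings.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(sentence: str) -> float:
    """Flesch-Kincaid Grade Level, treating the input as exactly one sentence."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    # Standard FKGL: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59,
    # with the sentence count fixed at 1.
    return 0.39 * len(words) + 11.8 * syllables / len(words) - 15.59


def fkgl_with_jargon(sentence: str, num_jargon_spans: int, alpha: float = 1.0) -> float:
    """Hypothetical extension: add a weighted count of jargon spans to the base score.

    `alpha` is an illustrative weight, not a value taken from the paper.
    """
    return fkgl(sentence) + alpha * num_jargon_spans


if __name__ == "__main__":
    text = "The patient presented with idiopathic thrombocytopenic purpura."
    # The jargon count would come from span-level annotation or an automatic
    # jargon identifier; here it is supplied by hand.
    print(fkgl(text))
    print(fkgl_with_jargon(text, num_jargon_spans=1, alpha=1.5))
```

The additive form keeps the base formula untouched, so the effect of the jargon feature on correlation with human judgments can be measured by comparing the two scores on the same annotated sentences.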
