Jiang Chao, Xu Wei
College of Computing, Georgia Institute of Technology.
Proc Conf Empir Methods Nat Lang Process. 2024 Nov;2024:17293-17319. doi: 10.18653/v1/2024.emnlp-main.958.
Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study of fine-grained readability measurement in the medical domain at both the sentence and span levels. We introduce a new dataset, MedReadMe, which consists of manually annotated readability ratings and fine-grained complex-span annotations for 4,520 sentences, featuring two novel categories, "Google-Easy" and "Google-Hard". It supports our quantitative analysis, which covers 650 linguistic features as well as automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics specifically for the medical domain, including unsupervised, supervised, and prompting-based methods built on recently developed large language models (LLMs). Informed by our fine-grained complex-span annotations, we find that adding a single feature capturing the number of jargon spans to existing readability formulas can significantly improve their correlation with human judgments. We will publicly release the dataset and code.
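To make the last point concrete, below is a minimal sketch (not the paper's implementation) of how a jargon-span count could be added as a single extra feature to a classic readability formula such as Flesch-Kincaid and checked against human ratings. The sentences, span counts, ratings, and fitted weights are invented purely for illustration and do not come from MedReadMe.

# Minimal sketch: augment a classic readability formula (Flesch-Kincaid grade)
# with a jargon-span count feature and compare correlations with human ratings.
# All data below is toy/illustrative; the paper's actual features, weights,
# and annotated dataset (MedReadMe) differ.

import re
import numpy as np


def count_syllables(word: str) -> int:
    """Rough syllable count via vowel groups (heuristic, not exact)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(sentence: str) -> float:
    """Flesch-Kincaid grade level computed for a single sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * (syllables / len(words)) - 15.59


# Toy examples: (sentence, number of annotated jargon spans, human rating,
# where a higher rating means harder to read). Hypothetical values only.
data = [
    ("The patient felt tired and slept a lot.", 0, 1.0),
    ("Take the tablet twice a day with food.", 0, 1.5),
    ("Myocardial infarction requires immediate reperfusion therapy.", 2, 4.5),
    ("Idiopathic thrombocytopenic purpura responds to corticosteroids.", 3, 5.0),
    ("The biopsy showed benign tissue with no abnormal cells.", 1, 3.0),
]

fk = np.array([fk_grade(s) for s, _, _ in data])
jargon = np.array([j for _, j, _ in data], dtype=float)
human = np.array([h for _, _, h in data])

# Baseline: correlation of the plain formula with human judgments.
baseline_r = np.corrcoef(fk, human)[0, 1]

# Augmented: fit a linear model  rating ~ FKGL + jargon_count  by least
# squares, then correlate its predictions with the human ratings.
X = np.column_stack([fk, jargon, np.ones_like(fk)])
coef, *_ = np.linalg.lstsq(X, human, rcond=None)
augmented_r = np.corrcoef(X @ coef, human)[0, 1]

print(f"FKGL alone     r = {baseline_r:.2f}")
print(f"FKGL + jargon  r = {augmented_r:.2f}")

In practice, the feature weight would be fit on the annotated MedReadMe sentences and evaluated on held-out data; here the correlation is computed on the same toy examples only to show the mechanics.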