Leroy Gondy, Endicott James E, Kauchak David, Mouradi Obay, Just Melissa
Information Systems and Technology, Claremont Graduate University, Claremont, CA 91711, United States.
J Med Internet Res. 2013 Jul 31;15(7):e144. doi: 10.2196/jmir.2569.
Adequate health literacy is important for people to maintain good health and manage diseases and injuries. Educational text, either retrieved from the Internet or provided by a doctor's office, is a popular method to communicate health-related information. Unfortunately, it is difficult to write text that is easy to understand, and existing approaches, mostly the application of readability formulas, have not convincingly been shown to reduce the difficulty of text.
Our objective is to develop an evidence-based writer support tool that reduces both perceived and actual text difficulty. To this end, we are developing and testing algorithms that automatically identify difficult sections in text and suggest appropriate, easier alternatives; algorithms that effectively reduce text difficulty will be included in the support tool. This paper describes a user evaluation, conducted with an independent writer, of an automated simplification algorithm based on term familiarity.
Term familiarity indicates how easy words are for readers and is estimated using term frequencies in the Google Web Corpus. Unfamiliar words are algorithmically identified and tagged for potential replacement. Easier alternatives consisting of synonyms, hypernyms, definitions, and semantic types are extracted from WordNet, the Unified Medical Language System (UMLS), and Wiktionary and ranked for a writer to choose from to simplify the text. We conducted a controlled user study with a representative writer who used our simplification algorithm to simplify texts. We tested the impact with representative consumers. The key independent variable of our study is lexical simplification, and we measured its effect on both perceived and actual text difficulty. Participants were recruited from Amazon's Mechanical Turk website. Perceived difficulty was measured with 1 metric, a 5-point Likert scale. Actual difficulty was measured with 3 metrics: 5 multiple-choice questions alongside each text to measure understanding, 7 multiple-choice questions without the text for learning, and 2 free recall questions for information retention.
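The identification and ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the counts in `TOY_COUNTS`, the threshold value, the function names, and the rule that scores a multiword phrase by its least frequent word are all assumptions made for illustration. In the study, frequencies come from the Google Web Corpus and candidate alternatives from WordNet, UMLS, and Wiktionary.

```python
import re

# Hypothetical frequency counts, for illustration only.
# These are NOT real Google Web Corpus values.
TOY_COUNTS = {
    "high": 900_000,
    "blood": 500_000,
    "pressure": 400_000,
    "hypertension": 8_000,
    "can": 2_000_000,
    "damage": 300_000,
    "the": 9_000_000,
    "kidneys": 60_000,
}


def familiarity(term, counts):
    """Estimate familiarity as a corpus frequency count.

    Assumption: a multiword phrase is scored by its least frequent word,
    and words missing from the table count as 0 (least familiar).
    """
    words = term.lower().split()
    if not words:
        return 0
    return min(counts.get(word, 0) for word in words)


def flag_unfamiliar(text, counts, threshold):
    """Return distinct lowercase words whose count falls below the threshold."""
    flagged = []
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if familiarity(word, counts) < threshold and word not in flagged:
            flagged.append(word)
    return flagged


def rank_alternatives(candidates, counts):
    """Order candidate replacements from most to least familiar."""
    return sorted(candidates, key=lambda c: familiarity(c, counts), reverse=True)


sentence = "Hypertension can damage the kidneys."
flagged_words = flag_unfamiliar(sentence, TOY_COUNTS, threshold=100_000)
ranked = rank_alternatives(["hypertension", "high blood pressure"], TOY_COUNTS)
```

In this sketch the writer would be shown the flagged words together with the ranked candidates and would decide which replacements to accept, mirroring the writer-in-the-loop design described above.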
Ninety-nine participants completed the study. We found strong beneficial effects on both perceived and actual difficulty. After simplification, the text was perceived as simpler (P<.001), with simplified text scoring 2.3 and original text 3.2 on the 5-point Likert scale (score 1: easiest). Simplification also led to better understanding of the text (P<.001): 63% of answers were correct with simplified text compared to 52% with the original, a difference of 11 percentage points. There was also more learning, with 18% more correct answers after reading simplified text compared to 9% more correct answers after reading the original text (P=.003). There was no significant effect on free recall.
Term familiarity is a valuable feature in simplifying text. Although the topic of the text influences the effect size, the results were convincing and consistent.