Measuring Text Difficulty Using Parse-Tree Frequency.

Authors

Kauchak David, Leroy Gondy, Hogue Alan

Affiliations

Computer Science Department, Pomona College, Claremont, CA.

Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, AZ.

Publication information

J Assoc Inf Sci Technol. 2017 Sep;68(9):2088-2100. doi: 10.1002/asi.23855. Epub 2017 Jun 20.

DOI: 10.1002/asi.23855
PMID: 29057293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5644354/
Abstract

Text simplification often relies on dated, unproven readability formulas. As an alternative and motivated by the success of term familiarity, we test a complementary measure: grammar familiarity. Grammar familiarity is measured as the frequency of the 3 level sentence parse tree and is useful for evaluating individual sentences. We created a database of 140K unique 3 level parse structures by parsing and binning all 5.4M sentences in English Wikipedia. We then calculated the grammar frequencies across the corpus and created 11 frequency bins. We evaluate the measure with a user study and corpus analysis. For the user study, we selected 20 sentences randomly from each bin, controlling for sentence length and term frequency, and recruited 30 readers per sentence (N=6,600) on Amazon Mechanical Turk. We measured actual difficulty (comprehension) using a Cloze test, perceived difficulty using a 5-point Likert scale, and time taken. Sentences with more frequent grammatical structures, even with very different surface presentations, were easier to understand, perceived as easier and took less time to read. Outcomes from readability formulas correlated with perceived but not with actual difficulty. Our corpus analysis shows how the metric can be used to understand grammar regularity in a broad range of corpora.
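
The grammar-familiarity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that sentences are already parsed into Penn-Treebank-style bracketed strings, and that a "3 level" structure means the root label plus the two levels of constituent labels below it, with the words discarded. The function names `parse_bracketed` and `top_levels` and the three-sentence toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Leaf words are stored as (word, None) so they can be told apart
    from constituents, whose children are always a list.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]                # a leaf word
        pos += 1
        return (word, None)

    return read()


def top_levels(tree, depth=3):
    """Return the top `depth` levels of constituent labels as a string key."""
    label, children = tree
    # Keep only constituents; drop the words themselves.
    kids = [c for c in (children or []) if c[1] is not None]
    if depth == 1 or not kids:
        return label
    inner = " ".join(top_levels(k, depth - 1) for k in kids)
    return f"({label} {inner})"


# Hypothetical toy corpus standing in for a large parsed collection.
corpus = [
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))",
    "(S (NP (DT A) (NN dog)) (VP (VBD slept) (PP (IN under) (NP (DT a) (NN tree)))))",
    "(S (NP (PRP She)) (VP (VBD left)))",
]

# Count how often each truncated structure occurs across the corpus.
counts = Counter(top_levels(parse_bracketed(s)) for s in corpus)
total = sum(counts.values())
for structure, n in counts.most_common():
    print(f"{n / total:.2f}  {structure}")
```

The first two sentences use different words but reduce to the same key, `(S (NP DT NN) (VP VBD PP))`, so they count toward one structure. This mirrors the abstract's point that sentences with very different surface presentations can share a grammatical structure. How the frequencies are then divided into bins is not specified in the abstract and is not shown here.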

Similar articles

1. Measuring Text Difficulty Using Parse-Tree Frequency.
   J Assoc Inf Sci Technol. 2017 Sep;68(9):2088-2100. doi: 10.1002/asi.23855. Epub 2017 Jun 20.
2. A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty.
   Int J Med Inform. 2013 Aug;82(8):717-30. doi: 10.1016/j.ijmedinf.2013.03.001. Epub 2013 Apr 29.
3. User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention.
   J Med Internet Res. 2013 Jul 31;15(7):e144. doi: 10.2196/jmir.2569.
4. Improving perceived and actual text difficulty for health information consumers using semi-automated methods.
   AMIA Annu Symp Proc. 2012;2012:522-31. Epub 2012 Nov 3.
5. The effect of word familiarity on actual and perceived text difficulty.
   J Am Med Inform Assoc. 2014 Feb;21(e1):e169-72. doi: 10.1136/amiajnl-2013-002172. Epub 2013 Oct 7.
6. Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study.
   J Med Internet Res. 2017 Mar 2;19(3):e59. doi: 10.2196/jmir.6962.
7. The influence of text characteristics on perceived and actual difficulty of health information.
   Int J Med Inform. 2010 Jun;79(6):438-49. doi: 10.1016/j.ijmedinf.2010.02.002. Epub 2010 Mar 4.
8. The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study.
   AMIA Annu Symp Proc. 2018 Apr 16;2017:1322-1331. eCollection 2017.
9. Moving Beyond Readability Metrics for Health-Related Text Simplification.
   IT Prof. 2016 May-Jun;18(3):45-51. doi: 10.1109/MITP.2016.50. Epub 2016 May 25.
10. NegAIT: A new parser for medical text simplification using morphological, sentential and double negation.
    J Biomed Inform. 2017 May;69:55-62. doi: 10.1016/j.jbi.2017.03.014. Epub 2017 Mar 22.

Cited by

1. Can deepfakes manipulate us? Assessing the evidence via a critical scoping review.
   PLoS One. 2025 May 2;20(5):e0320124. doi: 10.1371/journal.pone.0320124. eCollection 2025.
2. The Impact of Medical Explainable Artificial Intelligence on Nurses' Innovation Behaviour: A Structural Equation Modelling Approach.
   J Nurs Manag. 2024 Sep 26;2024:8885760. doi: 10.1155/2024/8885760. eCollection 2024.
3. APPLS: Evaluating Evaluation Metrics for Plain Language Summarization.
   Proc Conf Empir Methods Nat Lang Process. 2024 Nov;2024:9194-9211. doi: 10.18653/v1/2024.emnlp-main.519.
4. Text and Audio Simplification: Human vs. ChatGPT.
   AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:295-304. eCollection 2024.
5. Assessing the Readability of Online Patient Education Materials in Obstetrics and Gynecology Using Traditional Measures: Comparative Analysis and Limitations.
   J Med Internet Res. 2023 Aug 30;25:e46346. doi: 10.2196/46346.
6. Toward Improving Health Literacy in Patient Education Materials with Neural Machine Translation Models.
   AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:418-426. eCollection 2023.
7. Readability of English, German, and Russian Disease-Related Wikipedia Pages: Automated Computational Analysis.
   J Med Internet Res. 2022 May 16;24(5):e36835. doi: 10.2196/36835.
8. The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study.
   AMIA Annu Symp Proc. 2018 Apr 16;2017:1322-1331. eCollection 2017.
9. NegAIT: A new parser for medical text simplification using morphological, sentential and double negation.
   J Biomed Inform. 2017 May;69:55-62. doi: 10.1016/j.jbi.2017.03.014. Epub 2017 Mar 22.