Productivity and Predictability for Measuring Morphological Complexity.

Authors

Gutierrez-Vasques Ximena, Mijangos Victor

Affiliations

Language and Space Lab, URPP Language and Space, University of Zurich, 8006 Zurich, Switzerland.

Institute of Philological Research, National Autonomous University of Mexico, 04510 Mexico City, Mexico.

Publication information

Entropy (Basel). 2019 Dec 30;22(1):48. doi: 10.3390/e22010048.

DOI: 10.3390/e22010048
PMID: 33285823
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516478/

Abstract

We propose a quantitative approach for quantifying morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also the predictability of those morphological processes. We use a language model that predicts the probability of sub-word sequences within a word; we calculate the entropy rate of this model and use it as a measure of predictability of the internal structure of words. Our results show that it is important to integrate these two dimensions when measuring morphological complexity, since languages can be complex under one measure but simpler under another one. We calculated the complexity measures in two different parallel corpora for a typologically diverse set of languages. Our approach is corpus-based and it does not require the use of linguistic annotated data.
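
The two dimensions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a specific productivity statistic or language-model architecture, so the type-token ratio used for productivity and the add-one smoothed character bigram model used for predictability are assumptions chosen for brevity, and the function names and the `BOUNDARY` symbol are hypothetical.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol, assumed absent from the text


def type_token_ratio(tokens):
    """Productivity proxy (assumption): distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def bigram_entropy_rate(tokens):
    """Predictability proxy (assumption): average bits per character transition.

    Fits an add-one smoothed character bigram model on the tokens and returns
    the mean negative log2-probability of each within-word transition,
    evaluated on the same tokens.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for word in tokens:
        symbols = BOUNDARY + word + BOUNDARY
        for prev, nxt in zip(symbols, symbols[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1

    # Symbols that can follow a context (characters plus the boundary marker).
    vocab_size = len({nxt for _, nxt in pair_counts})

    total_bits = 0.0
    total_transitions = 0
    for (prev, nxt), count in pair_counts.items():
        prob = (count + 1) / (context_counts[prev] + vocab_size)
        total_bits -= count * math.log2(prob)
        total_transitions += count
    return total_bits / total_transitions


if __name__ == "__main__":
    sample = "the cat saw the cat".split()
    print("type-token ratio:", type_token_ratio(sample))
    print("bits per transition:", bigram_entropy_rate(sample))
```

Under these assumptions, a corpus can score high on one measure and low on the other: many distinct word forms raise the type-token ratio, while regular internal structure keeps the per-transition entropy low. A faithful reproduction would replace the bigram model with the sub-word language model used in the paper and evaluate it on held-out text.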

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/752e/7516478/13d47ccdb074/entropy-22-00048-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/752e/7516478/c89857e2d2c8/entropy-22-00048-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/752e/7516478/ecf8d3a5dbd3/entropy-22-00048-g003.jpg

Similar articles

1. Productivity and Predictability for Measuring Morphological Complexity. Entropy (Basel). 2019 Dec 30;22(1):48. doi: 10.3390/e22010048.
2. Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora. Entropy (Basel). 2022 Feb 16;24(2):280. doi: 10.3390/e24020280.
3. Quantifying the information in the long-range order of words: semantic structures and universal linguistic constraints. Cortex. 2014 Jun;55:5-16. doi: 10.1016/j.cortex.2013.08.008. Epub 2013 Aug 29.
4. Syllable Complexity and Morphological Synthesis: A Well-Motivated Positive Complexity Correlation Across Subdomains. Front Psychol. 2021 Mar 17;12:638659. doi: 10.3389/fpsyg.2021.638659. eCollection 2021.
5. A large quantitative analysis of written language challenges the idea that all languages are equally complex. Sci Rep. 2023 Sep 16;13(1):15351. doi: 10.1038/s41598-023-42327-3.
6. Non-Arbitrariness in Mapping Word Form to Meaning: Cross-Linguistic Formal Markers of Word Concreteness. Cogn Sci. 2017 May;41(4):1071-1089. doi: 10.1111/cogs.12361. Epub 2016 Mar 14.
7. Contextual predictability influences word and morpheme duration in a morphologically complex language (Kaqchikel Mayan). J Acoust Soc Am. 2018 Aug;144(2):997. doi: 10.1121/1.5046095.
8. The statistical signature of morphosyntax: a study of Hungarian and Italian infant-directed speech. Cognition. 2012 Nov;125(2):263-87. doi: 10.1016/j.cognition.2012.06.010. Epub 2012 Aug 6.
9. Constructing Complexity in a Young Sign Language. Front Psychol. 2018 Dec 13;9:2202. doi: 10.3389/fpsyg.2018.02202. eCollection 2018.
10. Measuring orthographic transparency and morphological-syllabic complexity in alphabetic orthographies: a narrative review. Read Writ. 2017;30(8):1617-1638. doi: 10.1007/s11145-017-9741-5. Epub 2017 Apr 17.

Cited by

1. Information Theory and Language. Entropy (Basel). 2020 Apr 11;22(4):435. doi: 10.3390/e22040435.

References

1. Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Sci Adv. 2019 Sep 4;5(9):eaaw2594. doi: 10.1126/sciadv.aaw2594. eCollection 2019 Sep.
2. The proof and measurement of association between two things. By C. Spearman, 1904. Am J Psychol. 1987 Fall-Winter;100(3-4):441-71.