Imada Mizuho
Institute of Humanities and Social Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan.
F1000Res. 2023 Apr 11;12:379. doi: 10.12688/f1000research.132383.1. eCollection 2023.
This study analyzed the distributions of sentence length and mean dependency distance (MDD) in Japanese sentences, comparing data from random sources with data from children's compositions and identifying how the distributions change with grade level. In the random data, sentence length is well fitted by a geometric distribution, whereas MDD is well fitted by a lognormal distribution. In the children's compositions, by contrast, the distribution of the number of clauses shifts from a lognormal to a gamma distribution depending on the school year, and MDD is well fitted by a gamma distribution. Mean MDD increases exponentially with the logarithm of the number of clauses in the random data but linearly in the composition data, which generally supports previous findings that dependency distances are optimized in natural language. However, MDD changes non-monotonically across grades, suggesting that children's language development is complex.
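For readers unfamiliar with the measure, the following is a minimal sketch of how the MDD of a single sentence is commonly computed: the mean absolute difference between the position of each dependent unit and the position of its head. It assumes a hypothetical input format (a list of 1-based head positions, with 0 marking the root) and is not taken from the study's own pipeline; the function name `mean_dependency_distance` is illustrative only.

```python
from statistics import mean


def mean_dependency_distance(heads):
    """Return the mean dependency distance (MDD) of one sentence.

    heads[i] is the 1-based position of the head of the unit at
    position i + 1; a value of 0 marks the root, which has no head
    and is skipped. This input format is an assumption made for
    illustration.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no dependency links")
    return mean(distances)


# Hypothetical four-unit, head-final sentence:
# unit 1 -> 2, unit 2 -> 4, unit 3 -> 4, unit 4 is the root.
example_heads = [2, 4, 4, 0]
example_mdd = mean_dependency_distance(example_heads)
```

In this example the three links have distances 1, 2, and 1, so the MDD is 4/3. A sentence consisting of a single unit has no links, and the sketch raises an error rather than returning a value.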