A practical approach to language complexity: a Wikipedia case study.

Affiliation

Department of Theoretical Physics, Budapest University of Technology and Economics, Budapest, Hungary.

Publication information

PLoS One. 2012;7(11):e48386. doi: 10.1371/journal.pone.0048386. Epub 2012 Nov 7.

Abstract

In this paper we present a statistical analysis of English texts from Wikipedia. We address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) with comparable samples of the main English Wikipedia (Main). Simple is supposed to use simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of the two samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily because of its shorter sentences, rather than drastically simplified syntax or vocabulary. A comparison of the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity: the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, and conclude that controversy reduces language complexity.
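For readers unfamiliar with the Gunning readability index mentioned in the abstract, the following is a minimal sketch of its commonly cited form: 0.4 × (average words per sentence + percentage of words with three or more syllables). The function names `count_syllables` and `gunning_fog` are illustrative, and the vowel-run syllable counter and regex-based sentence splitter are simplifying assumptions; the sketch also omits the usual exclusions for proper nouns and common suffixes, so it is not necessarily the procedure the authors used.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel runs (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Approximate Gunning fog index of an English text.

    Assumptions (not taken from the paper): sentences end at '.', '!' or '?',
    words are runs of ASCII letters, and a "complex" word is any word with
    three or more estimated syllables.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


if __name__ == "__main__":
    simple_sample = "The cat sat. The dog ran."
    main_sample = (
        "The extraordinarily complicated investigation "
        "continued without interruption."
    )
    print(gunning_fog(simple_sample))
    print(gunning_fog(main_sample))
```

Because the index adds sentence length to the share of long words, two samples with similar vocabulary can still differ in score through sentence length alone, which is the kind of effect the abstract describes for Simple versus Main.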

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77a6/3492358/503c23c37477/pone.0048386.g001.jpg
