Dodds Peter Sheridan, Clark Eric M, Desu Suma, Frank Morgan R, Reagan Andrew J, Williams Jake Ryland, Mitchell Lewis, Harris Kameron Decker, Kloumann Isabel M, Bagrow James P, Megerdoomian Karine, McMahon Matthew T, Tivnan Brian F, Danforth Christopher M
Computational Story Lab, Vermont Advanced Computing Core, and the Department of Mathematics and Statistics, University of Vermont, Burlington, VT 05401; Vermont Complex Systems Center, University of Vermont, Burlington, VT 05401;
Center for Computational Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139;
Proc Natl Acad Sci U S A. 2015 Feb 24;112(8):2389-94. doi: 10.1073/pnas.1411678112. Epub 2015 Feb 9.
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
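The measurement idea in the last sentence can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that a text's score is the frequency-weighted average of the ratings of its rated words. The names `WORD_SCORES`, `tokenize`, and `average_score` are hypothetical, and the ratings are made-up placeholders on an assumed 1-9 scale, not values from the study.

```python
import re
from collections import Counter

# Hypothetical ratings on an assumed 1-9 scale; placeholder values, not study data.
WORD_SCORES = {
    "love": 8.0,
    "happy": 8.0,
    "the": 5.0,
    "war": 2.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def average_score(text, scores=WORD_SCORES):
    """Return the frequency-weighted mean rating of the rated words in text.

    Words without a rating are ignored. Returns None if no word is rated.
    """
    counts = Counter(w for w in tokenize(text) if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total
```

Under these placeholder ratings, `average_score("the war")` averages 5.0 and 2.0. A real instrument would replace the dictionary with the human-evaluated word list for the language in question and might filter or weight words differently.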