Bleaman Isaac L
Department of Linguistics, University of California, Berkeley, Berkeley, CA, United States.
Front Artif Intell. 2020 May 29;3:35. doi: 10.3389/frai.2020.00035. eCollection 2020.
The recent turn to "big data" from social media corpora has enabled sociolinguists to investigate patterns of language variation and change at unprecedented scales. However, research in this paradigm has been slow to address variable phenomena in minority languages, where data scarcity and the absence of computational tools (e.g., taggers, parsers) often present significant barriers to entry. This article analyzes socio-syntactic variation in one minority language variety, Hasidic Yiddish, focusing on a variable for which tokens can be identified in raw text using purely morphological criteria. In non-finite particle verbs, the overt tense marker (cf. English "to", German "zu") is variably realized either between the preverbal particle and verb (e.g., up-to-eat-INF 'to eat up'; the conservative variant) or before both elements (to up-eat-INF; the innovative variant). Nearly 38,000 tokens of non-finite particle verbs were extracted from the popular Hasidic Yiddish discussion forum (the 'coffee room'; kaveshtiebel.com). A mixed-effects regression analysis reveals that despite a forum-wide favoring effect for the innovative variant, users favor the conservative variant the longer their accounts remain open and active. This process of rapid implicit standardization is supported by ethnographic evidence highlighting the spread of language norms among Hasidic writers on the internet, most of whom did not have the opportunity to express themselves in written Yiddish prior to the advent of social media.