Tamminga Meredith
Department of Linguistics, University of Pennsylvania, Philadelphia, PA, United States.
Front Artif Intell. 2019 Jun 20;2:10. doi: 10.3389/frai.2019.00010. eCollection 2019.
Persistence is the tendency of speakers to repeat the choice of sociolinguistic variant they have recently made in conversational speech. A longstanding debate is whether this tendency toward repetitiveness reflects the direct influence of one outcome on the next instance of the variable, which I call sequential dependence, or the shared influence of shifting contextual factors on proximal instances of the variable, which I call baseline deflection. I propose that these distinct types of clustering make different predictions for sequences of variable observations that are longer than the prime-target pairs typical of corpus persistence studies. In corpus ING data from conversational speech, I show that there are two effects to be accounted for: an effect of how many times the /ing/ variant occurs in the 2-, 3-, or 4-token sequence prior to the target (regardless of order), and an effect of whether the immediately prior (1-back) token was /ing/. I then build a series of simulations involving Bernoulli trials over sequences of different probabilities that incorporate either a sequential dependence mechanism, a baseline deflection mechanism, or both. I argue that the model incorporating both baseline deflection and sequential dependence is best able to produce simulated data that share the relevant properties of the corpus data, which is an encouraging outcome because we have independent reasons to expect both baseline deflection and sequential dependence to exist. I conclude that this exploratory analysis of longer sociolinguistic sequences reflects a promising direction for future research on the mechanisms involved in the production of sociolinguistic variation.
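The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the author's simulation: the two-state baseline, the function names (`simulate`, `rate_by_history`), and every parameter value (`base_probs`, `switch_prob`, `boost`) are assumptions chosen only to make the idea concrete. Baseline deflection is modeled as an occasional switch between two underlying /ing/ probabilities, and sequential dependence as a fixed nudge toward the previous outcome.

```python
import random


def simulate(n_tokens, base_probs=(0.3, 0.7), switch_prob=0.05, boost=0.0, seed=0):
    """Return a list of 0/1 outcomes, where 1 stands for the /ing/ variant.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    state = 0      # index of the current baseline probability
    prev = None    # outcome of the previous trial, if any
    outcomes = []
    for _ in range(n_tokens):
        # Baseline deflection: the context occasionally shifts, moving
        # the underlying probability shared by nearby tokens.
        if rng.random() < switch_prob:
            state = 1 - state
        p = base_probs[state]
        # Sequential dependence: the previous outcome directly raises
        # or lowers the probability of /ing/ on this trial.
        if prev is not None:
            p += boost if prev == 1 else -boost
        p = min(1.0, max(0.0, p))
        prev = 1 if rng.random() < p else 0
        outcomes.append(prev)
    return outcomes


def rate_by_history(outcomes, window=3):
    """Map (count of /ing/ in the prior window, 1-back outcome) to the
    proportion of /ing/ among the targets that follow that history."""
    totals = {}
    for i in range(window, len(outcomes)):
        prior = outcomes[i - window:i]
        key = (sum(prior), prior[-1])
        hits, n = totals.get(key, (0, 0))
        totals[key] = (hits + outcomes[i], n + 1)
    return {key: hits / n for key, (hits, n) in totals.items()}
```

Setting `boost=0.0` leaves only baseline deflection, setting `switch_prob=0.0` leaves only sequential dependence, and nonzero values for both combine the two. Comparing `rate_by_history(simulate(...))` across these settings gives a rough view of how the count of prior /ing/ tokens and the 1-back outcome each relate to the target.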