Trye David, Calude Andreea S, Bravo-Marquez Felipe, Keegan Te Taka
Department of Computer Science, University of Waikato, Hamilton, New Zealand.
School of General and Applied Linguistics, University of Waikato, Hamilton, New Zealand.
Front Artif Intell. 2020 Apr 9;3:15. doi: 10.3389/frai.2020.00015. eCollection 2020.
Twitter constitutes a rich resource for investigating language contact phenomena. In this paper, we report findings from the analysis of a large-scale diachronic corpus of over one million tweets, containing loanwords from te reo Māori, the indigenous language spoken in New Zealand, into (primarily New Zealand) English. Our analysis focuses on hashtags comprising mixed-language resources (which we term hybrid hashtags), bringing together descriptive linguistic tools (investigating length, word class, and semantic domains of the hashtags) and quantitative methods (Random Forests and regression analysis). Our work has implications for language change and the study of loanwords (we argue that hybrid hashtags can be linked to loanword entrenchment), and for the study of language on social media (we challenge proposals of hashtags as "words," and show that hashtags have a dual discourse role: a micro-function within the immediate linguistic context in which they occur and a macro-function within the tweet as a whole).