Department of Mathematics and Statistics, University of Vermont, Burlington, VT, United States of America.
Vermont Complex Systems Center, University of Vermont, Burlington, VT, United States of America.
PLoS One. 2020 May 27;15(5):e0232938. doi: 10.1371/journal.pone.0232938. eCollection 2020.
Stretched words like 'heellllp' or 'heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of 'stretchable words' found in roughly 100 billion tweets authored over an 8-year period. We introduce two central parameters, 'balance' and 'stretch', that capture their main characteristics, and explore their dynamics by creating visual tools we call 'balance plots' and 'spelling trees'. We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings and serve as a basis for other linguistic research involving stretchable words, as well as potential applications in augmenting dictionaries, improving language processing, and any area where sequence construction matters, such as genetics.
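As a rough illustration of the kind of frequency data involved, the sketch below groups surface spellings by a collapsed form, in which every run of a repeated character is reduced to one character (so 'heellllp' and 'help' fall in the same group). It then tallies how many extra letters appear at each position. The collapsing rule and the helper names (`collapse`, `group_variants`, `extra_letters_by_position`) are hypothetical choices for this sketch. They are not the paper's definitions of 'balance' or 'stretch', which the abstract does not spell out.

```python
from collections import Counter, defaultdict
from itertools import groupby


def run_lengths(token: str) -> list[tuple[str, int]]:
    """Split a token into runs of identical characters: 'heyyy' -> [('h', 1), ('e', 1), ('y', 3)]."""
    return [(ch, len(list(run))) for ch, run in groupby(token)]


def collapse(token: str) -> str:
    """Hypothetical grouping key: reduce every run of a repeated character to one character."""
    return "".join(ch for ch, _ in run_lengths(token))


def group_variants(tokens):
    """Map each collapsed form to a Counter of the (lowercased) spellings observed for it."""
    groups = defaultdict(Counter)
    for tok in tokens:
        low = tok.lower()
        groups[collapse(low)][low] += 1
    return groups


def extra_letters_by_position(variants: Counter) -> list[int]:
    """Frequency-weighted count of letters beyond the first at each position of the collapsed form.

    All spellings in one group share the same collapsed form, so they have the
    same number of runs and can be summed position by position.
    """
    totals = None
    for spelling, count in variants.items():
        extras = [(n - 1) * count for _, n in run_lengths(spelling)]
        totals = extras if totals is None else [a + b for a, b in zip(totals, extras)]
    return totals or []


if __name__ == "__main__":
    sample = ["help", "heellllp", "help", "heyyyyy", "hey", "hellllo", "hello"]
    groups = group_variants(sample)
    for key, variants in groups.items():
        print(key, dict(variants), extra_letters_by_position(variants))
```

Collapsing every run also merges legitimately doubled letters ('hello' and 'helo' share the key 'helo'). That keeps the grouping simple, but it would need refinement before it could distinguish distinct words whose collapsed forms coincide.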