Crowdsourcing dialect characterization through Twitter.

Authors

Bruno Gonçalves, David Sánchez

Affiliations

Aix-Marseille Université, CNRS, CPT, UMR 7332, 13288 Marseille, France; Université de Toulon, CNRS, CPT, UMR 7332, 83957 La Garde, France.

Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (UIB-CSIC), E-07122 Palma de Mallorca, Spain.

Publication details

PLoS One. 2014 Nov 19;9(11):e112074. doi: 10.1371/journal.pone.0112074. eCollection 2014.

DOI:10.1371/journal.pone.0112074

PMID:25409174

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4237322/

Abstract

We perform a large-scale analysis of diatopic language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
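The abstract only outlines the pipeline: count which competing word is used for each concept in each location, then cluster locations by those usage patterns. The sketch below illustrates that idea under loudly stated assumptions. The concept list, the variant words, the tweet counts and the cell names are hypothetical toy data, and plain k-means with k = 2 is swapped in as a generic stand-in for the authors' cluster analysis. It is not their actual method or data.

```python
"""Hypothetical sketch: cluster geographic cells by lexical-variant usage.

Assumptions (not taken from the paper): each cell is described by the
relative frequency of each variant within each concept, and plain
k-means with a fixed seed stands in for the authors' cluster analysis.
"""
import math
import random
from collections import Counter


def feature_vector(counts, concepts):
    """Turn raw variant counts for one cell into per-concept frequencies.

    counts: Counter mapping variant word -> number of tweets using it.
    concepts: dict mapping concept name -> list of competing variants.
    """
    vec = []
    for variants in concepts.values():
        total = sum(counts[v] for v in variants)
        for v in variants:
            # Share of this variant among all variants of the concept;
            # 0.0 if the concept never appears in the cell.
            vec.append(counts[v] / total if total else 0.0)
    return vec


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Hypothetical concepts and toy counts, for illustration only.
concepts = {
    "car": ["coche", "carro", "auto"],
    "computer": ["ordenador", "computadora"],
}
cells = {
    "cell_A": Counter(coche=90, carro=5, auto=5, ordenador=80, computadora=20),
    "cell_B": Counter(coche=85, carro=10, auto=5, ordenador=75, computadora=25),
    "cell_C": Counter(coche=5, carro=90, auto=5, ordenador=5, computadora=95),
    "cell_D": Counter(coche=10, carro=80, auto=10, ordenador=10, computadora=90),
}

names = list(cells)
points = [feature_vector(cells[n], concepts) for n in names]
labels = kmeans(points, k=2)
for name, label in zip(names, labels):
    print(name, label)
```

With this toy input, cell_A and cell_B end up in one cluster and cell_C and cell_D in the other. Normalizing within each concept keeps a frequently tweeted concept from dominating the distance; a real analysis would also have to choose k and deal with sparsely populated cells, which this sketch ignores.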

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/19dd/4237322/1d3b0a787b5e/pone.0112074.g001.jpg

Similar articles

Crowdsourcing dialect characterization through Twitter.

PLoS One. 2014 Nov 19;9(11):e112074. doi: 10.1371/journal.pone.0112074. eCollection 2014.

Social media in radiology: early trends in Twitter microblogging at radiology's largest international meeting.

J Am Coll Radiol. 2014 Apr;11(4):387-90. doi: 10.1016/j.jacr.2013.07.015. Epub 2013 Oct 17.

Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use.

J Biomed Inform. 2015 Dec;58:280-287. doi: 10.1016/j.jbi.2015.11.004. Epub 2015 Nov 7.

Mapping the Americanization of English in space and time.

PLoS One. 2018 May 25;13(5):e0197741. doi: 10.1371/journal.pone.0197741. eCollection 2018.

Methodological considerations in analyzing Twitter data.

J Natl Cancer Inst Monogr. 2013 Dec;2013(47):140-6. doi: 10.1093/jncimonographs/lgt026.

The Twitter of Babel: mapping world languages through microblogging platforms.

PLoS One. 2013 Apr 18;8(4):e61981. doi: 10.1371/journal.pone.0061981. Print 2013.

Mapping Lexical Dialect Variation in British English Using Twitter.

Front Artif Intell. 2019 Jul 12;2:11. doi: 10.3389/frai.2019.00011. eCollection 2019.

Crowdsourcing Language Change with Smartphone Applications.

PLoS One. 2016 Jan 4;11(1):e0143060. doi: 10.1371/journal.pone.0143060. eCollection 2016.

Are Twitter and Blogs Important Tools for the Modern Psychological Scientist?

Perspect Psychol Sci. 2017 Nov;12(6):1171-1175. doi: 10.1177/1745691617712266.

Following the crowd: patterns of crowdsourcing on Twitter among urologists.

World J Urol. 2019 Mar;37(3):567-572. doi: 10.1007/s00345-018-2405-5. Epub 2018 Jul 16.

Cited by

When dialects collide: how socioeconomic mixing affects language use.

EPJ Data Sci. 2025;14(1):47. doi: 10.1140/epjds/s13688-025-00563-9. Epub 2025 Jul 10.

Using Twitter to collect a multi-dialectal corpus of Albanian using advanced geotagging and dialect modeling.

PLoS One. 2023 Nov 27;18(11):e0294284. doi: 10.1371/journal.pone.0294284. eCollection 2023.

Probing sociodemographic influence on code-switching and language choice in Quebec with geolocation of tweets.

Front Psychol. 2023 May 2;14:1137038. doi: 10.3389/fpsyg.2023.1137038. eCollection 2023.

Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings.

PLoS One. 2020 May 27;15(5):e0232938. doi: 10.1371/journal.pone.0232938. eCollection 2020.

Mapping the Americanization of English in space and time.

PLoS One. 2018 May 25;13(5):e0197741. doi: 10.1371/journal.pone.0197741. eCollection 2018.

Immigrant community integration in world cities.

PLoS One. 2018 Mar 14;13(3):e0191612. doi: 10.1371/journal.pone.0191612. eCollection 2018.

Comparing and modelling land use organization in cities.

R Soc Open Sci. 2015 Dec 2;2(12):150449. doi: 10.1098/rsos.150449. eCollection 2015 Dec.

Crowdsourcing Language Change with Smartphone Applications.

PLoS One. 2016 Jan 4;11(1):e0143060. doi: 10.1371/journal.pone.0143060. eCollection 2016.

Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation.

PLoS One. 2015 Sep 29;10(9):e0139085. doi: 10.1371/journal.pone.0139085. eCollection 2015.

You Are What You Tweet: Connecting the Geographic Variation in America's Obesity Rate to Twitter Content.

PLoS One. 2015 Sep 2;10(9):e0133505. doi: 10.1371/journal.pone.0133505. eCollection 2015.
