Llorente Alejandro, Garcia-Herranz Manuel, Cebrian Manuel, Moro Esteban
Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid 28049, Spain; Departamento de Matemáticas & GISC, Universidad Carlos III de Madrid, Leganés 28911, Spain.
UNICEF Innovation Unit, New York, NY 10017, USA.
PLoS One. 2015 May 28;10(5):e0128692. doi: 10.1371/journal.pone.0128692. eCollection 2015.
Recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and interpersonal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socioeconomic status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. To this end, we examine a country-scale publicly articulated social media dataset, in which we quantify individual behavioral features from over 19 million geo-located messages distributed among more than 340 Spanish economic regions, which we infer by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates. As a result, we provide a simple model able to produce accurate, easily interpretable reconstructions of regional unemployment incidence from the regions' social-media digital fingerprints alone. Our results show that cost-effective economic indicators can be built from publicly available social media datasets.
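As a rough illustration of the kind of analysis summarized above, the following Python sketch scores the diversity of a region's mobility fluxes and relates it to unemployment with a one-variable least-squares fit. It is a minimal sketch under stated assumptions, not the authors' model: the abstract does not say how diversity is measured, so Shannon entropy is used here as one plausible choice, and the function names, region labels, and all numbers are hypothetical toy values.

```python
import math


def flux_entropy(fluxes):
    """Shannon entropy (in nats) of a region's outgoing mobility fluxes.

    `fluxes` maps a destination region to a trip count. Entropy is an
    assumed diversity measure; the abstract does not specify one.
    """
    total = sum(fluxes.values())
    if total <= 0:
        return 0.0
    probs = [count / total for count in fluxes.values() if count > 0]
    return -sum(p * math.log(p) for p in probs)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data: trip counts between regions and unemployment rates.
regions = {
    "A": {"B": 50, "C": 50, "D": 50},  # evenly spread fluxes
    "B": {"A": 90, "C": 10},           # concentrated fluxes
    "C": {"A": 100},                   # a single destination
}
unemployment = {"A": 0.12, "B": 0.20, "C": 0.27}

names = sorted(regions)
xs = [flux_entropy(regions[name]) for name in names]
ys = [unemployment[name] for name in names]
intercept, slope = fit_line(xs, ys)
```

With these toy values the fitted slope is negative, which mirrors the direction of the reported finding (more diverse fluxes, lower unemployment) only because the data were constructed that way. The study itself combines several behavioral features, so a single-feature fit like this one is only a starting point.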