Computational Story Lab, Department of Mathematics and Statistics, Vermont Complex Systems Center, The University of Vermont, Burlington, Vermont, USA.
PLoS One. 2013 May 29;8(5):e64417. doi: 10.1371/journal.pone.0064417. Print 2013.
We conduct a detailed investigation of correlations between real-time expressions made by individuals across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated in 2011 on the social network service Twitter and (2) annually surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may be used to estimate real-time levels of, and changes in, population-scale measures such as obesity rates.