Department of Sociology, San Diego State University, San Diego, California, United States of America.
Department of Linguistics and Asian/Middle Eastern Languages, San Diego State University, San Diego, California, United States of America.
PLoS One. 2019 Jul 11;14(7):e0219550. doi: 10.1371/journal.pone.0219550. eCollection 2019.
Several recent studies have applied sentiment lexicons to Twitter data to gauge local sentiment and, through it, to understand health behaviors and outcomes in local areas. While this research has demonstrated the potential of the approach, questions remain about the validity of Twitter mining and surveillance in local health research. First, how well does this approach predict health outcomes at very local scales, such as neighborhoods? Second, how robust are findings derived from sentiment signals once spatial effects are accounted for? To evaluate these questions, we link 2,076,025 tweets from 66,219 distinct users in the city of San Diego, posted between 2014-12-06 and 2017-05-24, to the 500 Cities Project data and the 2010-2014 American Community Survey data. We determine how well sentiment predicts self-rated mental health, sleep quality, and heart disease at the census-tract level, controlling for neighborhood characteristics and spatial autocorrelation. We find that sentiment is related to some outcomes on its own, but these relationships do not persist once other neighborhood factors are controlled for. Evaluating our encoding strategy more closely, we discuss the limitations of existing measures of neighborhood sentiment and call for more attention to how race/ethnicity and socio-economic status shape the inferences drawn from such measures.
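The general pipeline described above (score tweets with a lexicon, aggregate scores by census tract, then check for spatial autocorrelation) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `TOY_LEXICON`, the function names, and the binary adjacency weights are all assumptions made for illustration, and global Moran's I stands in here as one common way to quantify spatial autocorrelation.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption; not the lexicon used in the study).
TOY_LEXICON = {
    "happy": 1.0, "great": 1.0, "love": 1.0,
    "sad": -1.0, "tired": -1.0, "awful": -1.0,
}


def tweet_sentiment(text, lexicon=TOY_LEXICON):
    """Return the mean lexicon score of matched words, or None if none match."""
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def tract_sentiment(tweets):
    """Aggregate (tract_id, text) pairs into {tract_id: mean tweet score}.

    Tweets with no lexicon match are skipped, so a tract with only
    unmatched tweets does not appear in the result.
    """
    by_tract = defaultdict(list)
    for tract_id, text in tweets:
        score = tweet_sentiment(text)
        if score is not None:
            by_tract[tract_id].append(score)
    return {t: sum(v) / len(v) for t, v in by_tract.items()}


def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) adjacency weights.

    values: {tract_id: value}
    neighbors: {tract_id: set of adjacent tract_ids}
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {t: v - mean for t, v in values.items()}
    # Only count neighbor pairs where both tracts have a value.
    pairs = [(i, j) for i in values for j in neighbors.get(i, ()) if j in dev]
    denom = sum(d * d for d in dev.values())
    if not pairs or denom == 0:
        raise ValueError("Moran's I is undefined without neighbors or variance")
    num = sum(dev[i] * dev[j] for i, j in pairs)
    return (n / len(pairs)) * (num / denom)
```

A value of Moran's I near zero suggests little spatial clustering of tract-level sentiment, while a clearly positive value suggests that neighboring tracts resemble each other, which is the situation in which a regression that ignores spatial effects can overstate the evidence for an association.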