Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA.
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA.
AIDS Behav. 2018 Jul;22(7):2322-2333. doi: 10.1007/s10461-018-2046-0.
The present study evaluated the potential use of Twitter data for providing risk indices of sexually transmitted infections (STIs). We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses across U.S. counties over 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering that year's tweets into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique, which uses free social media data, provides signals of community health at a high temporal and spatial resolution.
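The comparison between ORIs and actual rates can be illustrated with a minimal sketch. It assumes the reported r values are Pearson correlations computed across counties, which the abstract does not specify. The function name `pearson_r`, the variable names, and the toy numbers are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, one entry per county (NOT data from the study):
# an ORI score derived from tweets and the observed diagnosis rate.
ori_scores = [0.2, 0.5, 0.1, 0.9, 0.4]
diagnosis_rates = [12.0, 30.0, 8.0, 41.0, 35.0]

r = pearson_r(ori_scores, diagnosis_rates)
print(f"r = {r:.2f}")
```

In a setup like this, one correlation would be computed per combination of infection, model year, and tweet year, which is presumably how a summary such as "93% of models" arises.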