Using Twitter to Measure Public Discussion of Diseases: A Case Study.

Affiliation

Positive Psychology Center, Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

JMIR Public Health Surveill. 2015 Jun 26;1(1):e6. doi: 10.2196/publichealth.3953.

Abstract

BACKGROUND

Twitter is increasingly used to estimate disease prevalence, but such measurements can be biased, due to both biased sampling and inherent ambiguity of natural language.

OBJECTIVE

We characterized the extent of these biases and how they vary with disease.

METHODS

We correlated self-reported prevalence rates for 22 diseases from Experian's Simmons National Consumer Study (n=12,305) with the number of times these diseases were mentioned on Twitter during the same period (2012). We also identified and corrected for two types of bias present in Twitter data: (1) demographic variance between US Twitter users and the general US population; and (2) natural language ambiguity, which creates the possibility that mention of a disease name may not actually refer to the disease (eg, "heart attack" on Twitter often does not refer to myocardial infarction). We measured the correlation between disease prevalence and Twitter disease mentions both with and without bias correction. This allowed us to quantify each disease's overrepresentation or underrepresentation on Twitter, relative to its prevalence.
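The two corrections described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which correlation coefficient was used or how the demographic adjustment was computed, so Pearson's r and a simple weighting of group-level prevalence by Twitter's demographic mix are assumptions. The function names, group labels, and all numbers in the toy data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation (assumed here; the abstract does not name the coefficient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ambiguity_corrected(raw_mentions, relevant_fraction):
    """Scale raw mention counts by the share of mentions that really refer to the disease."""
    return {d: raw_mentions[d] * relevant_fraction[d] for d in raw_mentions}

def demographic_adjusted(prevalence_by_group, twitter_share):
    """Reweight group-level prevalence by each group's share of Twitter users."""
    return {
        d: sum(rates[g] * twitter_share[g] for g in twitter_share)
        for d, rates in prevalence_by_group.items()
    }

# Toy, made-up inputs for three hypothetical diseases (not data from the study).
raw_mentions = {"disease_a": 9000, "disease_b": 4000, "disease_c": 1500}
relevant_fraction = {"disease_a": 0.20, "disease_b": 0.95, "disease_c": 0.80}
prevalence_by_group = {
    "disease_a": {"younger": 0.01, "older": 0.05},
    "disease_b": {"younger": 0.04, "older": 0.20},
    "disease_c": {"younger": 0.03, "older": 0.02},
}
twitter_share = {"younger": 0.75, "older": 0.25}   # hypothetical Twitter mix
general_share = {"younger": 0.50, "older": 0.50}   # hypothetical population mix

diseases = sorted(raw_mentions)
general_prev = demographic_adjusted(prevalence_by_group, general_share)
twitter_prev = demographic_adjusted(prevalence_by_group, twitter_share)
clean_mentions = ambiguity_corrected(raw_mentions, relevant_fraction)

# Correlation without correction, then with both corrections applied.
r_baseline = pearson([raw_mentions[d] for d in diseases],
                     [general_prev[d] for d in diseases])
r_corrected = pearson([clean_mentions[d] for d in diseases],
                      [twitter_prev[d] for d in diseases])
print(f"baseline r = {r_baseline:.3f}, corrected r = {r_corrected:.3f}")
```

The sketch follows the abstract in applying the demographic correction to the prevalence side and the ambiguity correction to the Twitter mention counts.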

RESULTS

Our sample included 80,680,449 tweets. Adjusting disease prevalence to correct for Twitter demographics more than doubles the correlation between Twitter disease mentions and disease prevalence in the general population (from .113 to .258, P<.001). In addition, diseases varied widely in how often mentions of their names on Twitter actually referred to the diseases, from 14.89% (3827/25,704) of instances (for stroke) to 99.92% (5044/5048) of instances (for arthritis). Applying ambiguity correction to our Twitter corpus achieves a correlation between disease mentions and prevalence of .208 (P<.001). Simultaneously applying correction for both demographics and ambiguity more than triples the baseline correlation to .366 (P<.001). Compared with prevalence rates, cancer appeared most overrepresented on Twitter, whereas high cholesterol appeared most underrepresented.
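As a quick check of the arithmetic reported above, the following snippet recomputes the two relevance percentages and the ratios of the corrected correlations to the baseline. All input figures are taken from the abstract; the variable names are only illustrative.

```python
# Counts of mentions that actually referred to the disease, out of all mentions.
stroke_relevant, stroke_total = 3827, 25704
arthritis_relevant, arthritis_total = 5044, 5048

stroke_pct = round(100 * stroke_relevant / stroke_total, 2)
arthritis_pct = round(100 * arthritis_relevant / arthritis_total, 2)

# Reported correlations between Twitter mentions and prevalence.
r_baseline = 0.113      # no correction
r_demographic = 0.258   # demographic correction only
r_ambiguity = 0.208     # ambiguity correction only
r_both = 0.366          # both corrections

demographic_ratio = r_demographic / r_baseline
ambiguity_ratio = r_ambiguity / r_baseline
both_ratio = r_both / r_baseline

print(stroke_pct, arthritis_pct)
print(round(demographic_ratio, 2), round(ambiguity_ratio, 2), round(both_ratio, 2))
```

The ratios confirm the wording of the results: demographic correction alone more than doubles the baseline correlation, and the two corrections together more than triple it, while ambiguity correction alone falls short of doubling it.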

CONCLUSIONS

Twitter is a potentially useful tool to measure public interest in and concerns about different diseases, but when comparing diseases, improvements can be made by adjusting for population demographics and word ambiguity.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb01/4869247/60e764759908/publichealth_v1i2e6_fig1.jpg