Using Twitter to Measure Public Discussion of Diseases: A Case Study.

Affiliation

Positive Psychology Center, Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

JMIR Public Health Surveill. 2015 Jun 26;1(1):e6. doi: 10.2196/publichealth.3953.

DOI:10.2196/publichealth.3953

PMID:26925459

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4763717/

Abstract

BACKGROUND

Twitter is increasingly used to estimate disease prevalence, but such measurements can be biased by both sampling bias and the inherent ambiguity of natural language.

OBJECTIVE

We characterized the extent of these biases and how they vary with disease.

METHODS

We correlated self-reported prevalence rates for 22 diseases from Experian's Simmons National Consumer Study (n=12,305) with the number of times these diseases were mentioned on Twitter during the same period (2012). We also identified and corrected for two types of bias present in Twitter data: (1) demographic differences between US Twitter users and the general US population; and (2) natural language ambiguity, meaning that a mention of a disease name may not actually refer to the disease (eg, "heart attack" on Twitter often does not refer to myocardial infarction). We measured the correlation between disease prevalence and Twitter disease mentions both with and without bias correction. This allowed us to quantify each disease's overrepresentation or underrepresentation on Twitter relative to its prevalence.
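
The two corrections and the correlation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the demographic cells, Twitter user shares, and all numbers are made up; the demographic correction is written as simple post-stratification (reweighting per-cell survey prevalence by each cell's share of Twitter users); ambiguity is corrected by scaling raw keyword counts by an estimated precision, which is a swapped-in technique because the abstract does not say how the correction was applied; and the log-ratio `representation` score is a hypothetical stand-in for however the study quantified over- and underrepresentation.

```python
import math

# Hypothetical demographic cells and Twitter user shares (toy numbers,
# NOT the study's data); the shares sum to 1.
TWITTER_SHARE = {"18-29": 0.5, "30-49": 0.3, "50+": 0.2}


def twitter_weighted_prevalence(cell_prevalence, twitter_share=TWITTER_SHARE):
    """Demographic correction (post-stratification sketch): reweight a disease's
    per-cell survey prevalence by each cell's share of Twitter users."""
    return sum(cell_prevalence[cell] * share for cell, share in twitter_share.items())


def ambiguity_corrected(raw_mentions, relevant, labeled):
    """Ambiguity correction sketch (assumption): scale raw keyword matches by the
    fraction of labeled matches that actually referred to the disease."""
    return raw_mentions * relevant / labeled


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def representation(mentions, prevalence):
    """Hypothetical over/underrepresentation score per disease: log2 of its share
    of mentions divided by its share of prevalence (> 0 means overrepresented)."""
    m_total, p_total = sum(mentions), sum(prevalence)
    return [math.log2((m / m_total) / (p / p_total))
            for m, p in zip(mentions, prevalence)]


if __name__ == "__main__":
    # Toy inputs for three made-up diseases.
    general = [0.06, 0.08, 0.04]              # general-population prevalence
    cells = [                                 # per-cell survey prevalence
        {"18-29": 0.02, "30-49": 0.05, "50+": 0.20},
        {"18-29": 0.10, "30-49": 0.08, "50+": 0.06},
        {"18-29": 0.01, "30-49": 0.03, "50+": 0.12},
    ]
    raw = [50_000, 20_000, 8_000]             # raw keyword matches on Twitter
    labels = [(150, 1000), (990, 1000), (600, 1000)]  # (relevant, labeled)

    adjusted = [twitter_weighted_prevalence(c) for c in cells]
    corrected = [ambiguity_corrected(r, k, n) for r, (k, n) in zip(raw, labels)]

    print("baseline r:    ", pearson_r(general, raw))
    print("demographics r:", pearson_r(adjusted, raw))
    print("ambiguity r:   ", pearson_r(general, corrected))
    print("both r:        ", pearson_r(adjusted, corrected))
    print("representation:", representation(corrected, adjusted))
```

Keeping each correction in its own function mirrors the study design, where correlations were reported with neither correction, with each one alone, and with both together.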

RESULTS

Our sample included 80,680,449 tweets. Adjusting disease prevalence to correct for Twitter demographics more than doubles the correlation between Twitter disease mentions and disease prevalence in the general population (from .113 to .258, P<.001). In addition, diseases varied widely in how often mentions of their names on Twitter actually referred to the diseases, from 14.89% (3827/25,704) of instances (for stroke) to 99.92% (5044/5048) of instances (for arthritis). Applying ambiguity correction to our Twitter corpus achieves a correlation between disease mentions and prevalence of .208 (P<.001). Simultaneously applying correction for both demographics and ambiguity more than triples the baseline correlation to .366 (P<.001). Compared with prevalence rates, cancer appeared most overrepresented on Twitter, whereas high cholesterol appeared most underrepresented.
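
The percentages and fold changes in the Results can be checked directly from the reported counts and correlations. The short Python sketch below does only that arithmetic; the variable names and the `fold_change` helper are ours, not the paper's.

```python
def fold_change(corrected_r, baseline_r):
    """How many times larger a corrected correlation is than the baseline."""
    return corrected_r / baseline_r


# Counts and correlations copied from the Results paragraph.
stroke_precision = 3827 / 25_704    # share of "stroke" mentions that meant the disease
arthritis_precision = 5044 / 5048   # same for "arthritis"

BASELINE_R = 0.113
CORRECTED_R = {"demographics": 0.258, "ambiguity": 0.208, "both": 0.366}

print(f"stroke: {stroke_precision:.2%}")        # → stroke: 14.89%
print(f"arthritis: {arthritis_precision:.2%}")  # → arthritis: 99.92%
for name, r in CORRECTED_R.items():
    # → demographics: 2.28x, ambiguity: 1.84x, both: 3.24x
    print(f"{name}: {fold_change(r, BASELINE_R):.2f}x baseline")
```

The 2.28x and 3.24x ratios are what the abstract summarizes as "more than doubles" and "more than triples".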

CONCLUSIONS

Twitter is a potentially useful tool to measure public interest in and concerns about different diseases, but when comparing diseases, improvements can be made by adjusting for population demographics and word ambiguity.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb01/4869247/60e764759908/publichealth_v1i2e6_fig1.jpg

Similar articles

Characterizing Tweet Volume and Content About Common Health Conditions Across Pennsylvania: Retrospective Analysis.

JMIR Public Health Surveill. 2018 Dec 6;4(4):e10834. doi: 10.2196/10834.

Exploring brand-name drug mentions on Twitter for pharmacovigilance.

Stud Health Technol Inform. 2015;210:55-9.

Twitter Conversations About Pancreatic Cancer by Health Care Providers and the General Public: Thematic Analysis.

JMIR Cancer. 2022 Mar 24;8(1):e31388. doi: 10.2196/31388.

Social media and flu: Media Twitter accounts as agenda setters.

Int J Med Inform. 2016 Jul;91:67-73. doi: 10.1016/j.ijmedinf.2016.04.009. Epub 2016 Apr 22.

Who tweets? Deriving the demographic characteristics of age, occupation and social class from twitter user meta-data.

PLoS One. 2015 Mar 2;10(3):e0115545. doi: 10.1371/journal.pone.0115545. eCollection 2015.

Disease mentions in airport and hospital geolocations expose dominance of news events for disease concerns.

J Biomed Semantics. 2018 Jun 12;9(1):18. doi: 10.1186/s13326-018-0186-9.

Do Global Cities Enable Global Views? Using Twitter to Quantify the Level of Geographical Awareness of U.S. Cities.

PLoS One. 2015 Jul 13;10(7):e0132464. doi: 10.1371/journal.pone.0132464. eCollection 2015.

Diabetes topics associated with engagement on Twitter.

Prev Chronic Dis. 2015 May 7;12:E62. doi: 10.5888/pcd12.140402.

Twitter mining for fine-grained syndromic surveillance.

Artif Intell Med. 2014 Jul;61(3):153-63. doi: 10.1016/j.artmed.2014.01.002. Epub 2014 Jan 31.

Cited by

Public Discourse Toward Older Drivers in Japan Using Social Media Data From 2010 to 2022: Longitudinal Analysis.

JMIR Infodemiology. 2025 Jun 16;5:e69321. doi: 10.2196/69321.

Digital Epidemiology of Prescription Drug References on X (Formerly Twitter): Neural Network Topic Modeling and Sentiment Analysis.

J Med Internet Res. 2024 Aug 23;26:e57885. doi: 10.2196/57885.

AJPM Focus. 2023 Jan 18;2(2):100067. doi: 10.1016/j.focus.2023.100067. eCollection 2023 Jun.

Impact of #PsychTwitter in promoting global psychiatry: A hashtag analysis study.

Front Public Health. 2023 Feb 22;11:1065368. doi: 10.3389/fpubh.2023.1065368. eCollection 2023.

Correcting Sociodemographic Selection Biases for Population Prediction from Social Media.

Proc Int AAAI Conf Weblogs Soc Media. 2022 May 31;16(1):228-240.

Methods to Establish Race or Ethnicity of Twitter Users: Scoping Review.

J Med Internet Res. 2022 Apr 29;24(4):e35788. doi: 10.2196/35788.

Data and Model Biases in Social Media Analyses: A Case Study of COVID-19 Tweets.

AMIA Annu Symp Proc. 2022 Feb 21;2021:1264-1273. eCollection 2021.

Analysis of Tweets Containing Information Related to Rheumatological Diseases on Twitter.

Int J Environ Res Public Health. 2021 Aug 28;18(17):9094. doi: 10.3390/ijerph18179094.

Understanding Discussions of Health Issues on Twitter: A Visual Analytic Study.

Online J Public Health Inform. 2020 May 16;12(1):e2. doi: 10.5210/ojphi.v12i1.10321. eCollection 2020.

Using Social Media to Track Geographic Variability in Language About Diabetes: Analysis of Diabetes-Related Tweets Across the United States.

JMIR Diabetes. 2020 Jan 26;5(1):e14431. doi: 10.2196/14431.
