Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda.

Author information

Priedhorsky Reid, Osthus Dave, Daughton Ashlynn R, Moran Kelly R, Generous Nicholas, Fairchild Geoffrey, Deshpande Alina, Del Valle Sara Y

Affiliations

High Performance Computing (HPC) Division.

Computer, Computational, and Statistical Sciences (CCS) Division.

Publication information

CSCW Conf Comput Support Coop Work. 2017 Feb-Mar;2017:1812-1834. doi: 10.1145/2998181.2998183.

DOI:10.1145/2998181.2998183

PMID:28782059

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5542563/

Abstract

Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of whether these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
