Global disease monitoring and forecasting with Wikipedia.

Authors

Generous Nicholas, Fairchild Geoffrey, Deshpande Alina, Del Valle Sara Y, Priedhorsky Reid

Affiliation

Defense Systems and Analysis Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

Publication information

PLoS Comput Biol. 2014 Nov 13;10(11):e1003892. doi: 10.1371/journal.pcbi.1003892. eCollection 2014 Nov.

DOI:10.1371/journal.pcbi.1003892

PMID:25392913

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4231164/

Abstract

Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
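The modeling idea in the abstract (a linear model from article access counts to disease incidence, scored by r², with forecasting value assessed at different horizons) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the helper names (`fit_linear`, `shift_for_forecast`, `lagged_r2`) are hypothetical, and pairing access counts on day t with incidence on day t + lag is only one assumed way to read "forecasting value". The r² reported here is in-sample.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a predictor matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def shift_for_forecast(X, y, lag_days):
    """Pair access counts on day t with incidence on day t + lag_days."""
    if lag_days == 0:
        return X, y
    return X[:-lag_days], y[lag_days:]

def lagged_r2(X, y, lag_days):
    """In-sample r^2 of a linear model fitted at the given lag."""
    X_lag, y_lag = shift_for_forecast(X, y, lag_days)
    coef = fit_linear(X_lag, y_lag)
    return r_squared(y_lag, predict(coef, X_lag))

# Synthetic data (an assumption for illustration only): daily access counts
# for three hypothetical articles, with incidence that follows the counts
# seen 7 days earlier.
rng = np.random.default_rng(0)
days, true_lag = 200, 7
views_all = rng.poisson(lam=100, size=(days + true_lag, 3)).astype(float)
weights = np.array([0.5, 0.2, 0.0])
y = 10 + views_all[:days] @ weights + rng.normal(0.0, 1.0, size=days)
X = views_all[true_lag:]  # X[i] drives y[i + true_lag]

r2_nowcast = lagged_r2(X, y, 0)
r2_forecast = lagged_r2(X, y, true_lag)
print(f"r^2 at lag 0: {r2_nowcast:.3f}")
print(f"r^2 at lag {true_lag}: {r2_forecast:.3f}")
```

In this toy setup the fit is strong only at the lag built into the synthetic data, which is the kind of signal a lag sweep (for example 0 to 28 days) would look for. A real evaluation would also need held-out data rather than in-sample r².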
