Harvard Medical School, 25 Shattuck Street, Boston 02115, MA, USA.
Malar J. 2013 Nov 4;12:390. doi: 10.1186/1475-2875-12-390.
Internet search query trends have been shown to correlate with incidence trends for select infectious diseases and countries. Herein, the first use of Google search queries for malaria surveillance is investigated. The research focuses on Thailand where real-time malaria surveillance is crucial as malaria is re-emerging and developing resistance to pharmaceuticals in the region.
Official Thai malaria case data for 2005 to 2009 were acquired from the World Health Organization (WHO). Search queries potentially related to malaria prevalence were identified using Google Correlate, an openly available online tool, and by surveying Thai physicians. Four linear regression models were built from different subsets of malaria-related queries for use in future predictions. The models' accuracy was evaluated by their ability to predict the malaria outbreak in 2009, by their correlation with the entire available malaria case data, and by the Akaike information criterion (AIC).
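The evaluation described above (fit a linear regression on query volumes, check correlation on a held-out period, compare models by AIC) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the three query columns, the 48/12 split, and all function names are assumptions, and the AIC is computed in one common least-squares form, n ln(RSS/n) + 2k, which may differ from the study's by an additive constant.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply fitted coefficients to a matrix of query volumes."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def pearson_r(a, b):
    """Pearson correlation between predicted and reported case counts."""
    return float(np.corrcoef(a, b)[0, 1])

def aic_least_squares(y, y_hat, n_params):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * n_params

# Synthetic stand-in data (NOT from the study): 60 time periods,
# 3 hypothetical query-volume series, and noisy case counts.
rng = np.random.default_rng(0)
n_periods = 60
queries = rng.uniform(0, 100, size=(n_periods, 3))
cases = 500 + queries @ np.array([4.0, 2.5, 1.0]) + rng.normal(0, 20, n_periods)

# Hypothetical split: fit on the first 48 periods, validate on the last 12.
train, valid = slice(0, 48), slice(48, 60)
beta = fit_linear(queries[train], cases[train])

r_valid = pearson_r(predict(beta, queries[valid]), cases[valid])
aic_train = aic_least_squares(
    cases[train], predict(beta, queries[train]), n_params=len(beta)
)
print(f"validation r = {r_valid:.2f}, training AIC = {aic_train:.1f}")
```

Under this form a lower AIC indicates a better trade-off between fit and model size, so comparing models built from different query subsets amounts to repeating the fit with different columns of the query matrix.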
Each model captured the bulk of the variability in officially reported malaria incidence. Correlation in the validation set ranged from 0.75 to 0.92, and AIC values for the models ranged from 808 to 586. While models using malaria-related and general health terms were successful, one model using only microscopy-related terms obtained equally high correlations with malaria case data trends. The model built strictly from queries provided by Thai physicians was the only one that consistently captured the well-documented second seasonal malaria peak in Thailand.
Models built from Google search queries were able to adequately estimate malaria activity trends in Thailand from 2005 to 2010, according to official malaria case counts reported by WHO. Despite their limitations, these search queries may be valid real-time indicators of malaria incidence in the population, as correlations were on par with those of related studies for other infectious diseases. Additionally, this methodology provides a cost-effective description of malaria prevalence that can complement traditional public health surveillance. This and future studies will continue to identify ways to leverage web-based data to improve public health.