Using search engine big data for predicting new HIV diagnoses.

Affiliations

University of California Institute for Prediction Technology, Department of Family Medicine, University of California Los Angeles, Los Angeles, California, United States of America.

Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

PLoS One. 2018 Jul 12;13(7):e0199527. doi: 10.1371/journal.pone.0199527. eCollection 2018.

DOI: 10.1371/journal.pone.0199527

PMID: 30001360

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6042696/

Abstract

BACKGROUND

A large and growing body of "big data" is generated by internet search engines such as Google. Because people often search for information about public health and medical issues, researchers may be able to use search engine data to monitor and predict public health problems such as HIV. We sought to assess the feasibility of using Google search data to analyze and predict new HIV diagnoses in the United States.

METHODS AND FINDINGS

From 2007 to 2014, we collected search volume data on HIV-related Google search keywords across the United States. State-level data on new HIV diagnoses were collected from the Centers for Disease Control and Prevention (CDC) and AIDSVu.org. We developed a negative binomial model to predict HIV cases using a subset of significant predictor keywords identified by LASSO (least absolute shrinkage and selection operator). The Google search data were combined with state-level HIV case reports provided by the CDC. We used historical data to train the model and predicted new HIV diagnoses from 2011 to 2014, obtaining an average R² of 0.99 between predicted and actual cases and an average root-mean-square error (RMSE) of 108.75.
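The modeling and evaluation steps described above can be illustrated with a small sketch. This is a hypothetical, minimal example and not the authors' code. It fits a negative binomial (NB2) regression with a log link and a fixed, assumed dispersion `alpha` by iteratively reweighted least squares (IRLS), then scores held-out predictions with RMSE and with an R² taken here as the squared Pearson correlation between predicted and actual counts. The abstract does not say which R² definition was used, so that choice is an assumption. The LASSO keyword-selection step is omitted; the design matrix `X` is assumed to already hold an intercept column plus the selected keyword predictors. All function names and the toy data are hypothetical.

```python
import math

# Minimal sketch, NOT the authors' code. Function names and toy data are hypothetical.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def predict(X, beta):
    """Expected counts mu = exp(X @ beta) under the log link."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]


def fit_nb2(X, y, alpha=0.1, iters=50, tol=1e-10):
    """Fit an NB2 regression (Var = mu + alpha * mu**2) with log link by IRLS.

    alpha is treated as known and fixed (an assumption); each row of X starts
    with 1.0 for the intercept, followed by the selected keyword predictors.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # intercept-only start
    for _ in range(iters):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, mu, yi in zip(X, predict(X, beta), y):
            eta = math.log(mu)
            w = mu / (1.0 + alpha * mu)  # IRLS weight: (dmu/deta)^2 / Var(y)
            z = eta + (yi - mu) / mu     # working response
            for j in range(p):
                xtwz[j] += w * row[j] * z
                for k in range(p):
                    xtwx[j][k] += w * row[j] * row[k]
        new_beta = solve(xtwx, xtwz)
        if max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted counts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Squared Pearson correlation of actual vs. predicted (assumed R^2 definition)."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (va * vp)


if __name__ == "__main__":
    # Hypothetical toy data: one search-volume index per state, with counts
    # generated from a known log-linear rule. This is NOT CDC or Google data.
    true_beta = [3.0, 2.0]
    train_x = [[1.0, i / 10] for i in range(10)]        # stands in for earlier years
    test_x = [[1.0, 0.05 + i / 10] for i in range(10)]  # stands in for a held-out year
    beta = fit_nb2(train_x, predict(train_x, true_beta), alpha=0.1)
    pred = predict(test_x, beta)
    actual = predict(test_x, true_beta)
    print("beta:", [round(b, 4) for b in beta])
    print("R^2:", round(r_squared(actual, pred), 4), "RMSE:", round(rmse(actual, pred), 4))
```

If the reported R² and RMSE are averages over the four predicted years (the abstract does not spell this out), one would call `r_squared` and `rmse` once per year and average the results. In practice, a library GLM implementation such as statsmodels' `GLM` with the `NegativeBinomial` family would typically replace the hand-rolled solver shown here.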

CONCLUSIONS

Results indicate that Google Trends is a feasible tool to predict new cases of HIV at the state level. We discuss the implications of integrating visualization maps and tools based on these models into public health and HIV monitoring and surveillance.
