Lifestyle Disease Surveillance Using Population Search Behavior: Feasibility Study.

Author information

Memon Shahan Ali, Razak Saquib, Weber Ingmar

Affiliations

Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States.

Carnegie Mellon University, Doha, Qatar.

Publication information

J Med Internet Res. 2020 Jan 27;22(1):e13347. doi: 10.2196/13347.

DOI: 10.2196/13347
PMID: 32012050
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7011125/
Abstract

BACKGROUND

As the process of producing official health statistics for lifestyle diseases is slow, researchers have explored using Web search data as a proxy for lifestyle disease surveillance. Existing studies, however, are prone to at least one of the following issues: ad-hoc keyword selection, overfitting, insufficient predictive evaluation, lack of generalization, and failure to compare against trivial baselines.

OBJECTIVE

The aims of this study were to (1) employ a corrective approach improving previous methods; (2) study the key limitations in using Google Trends for lifestyle disease surveillance; and (3) test the generalizability of our methodology to other countries beyond the United States.

METHODS

For each of the target variables (diabetes, obesity, and exercise), prevalence rates were collected. After a rigorous keyword selection process, data from Google Trends were collected. These data were denormalized to form spatio-temporal indices. L1-regularized regression models were trained to predict prevalence rates from the denormalized Google Trends indices. Models were tested on a held-out set and compared against baselines from the literature as well as a trivial last-year-equals-this-year baseline. A similar analysis was done using a multivariate spatio-temporal model in which the previous year's prevalence was included as a covariate. This model was modified to create a time-lagged regression analysis framework. Finally, a hierarchical time-lagged multivariate spatio-temporal model was created to account for subnational trends in the data. The model trained on US data was then applied to Canada in a transfer learning framework.
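The Methods paragraph pairs an L1-regularized regression with a trivial last-year-equals-this-year baseline and scores both by MAE on held-out data. Below is a minimal pure-Python sketch of that comparison. It assumes a tiny hypothetical panel of (region, year) rows. The feature values, the penalty `alpha=1.0`, and the coordinate-descent solver are illustrative choices, not the authors' data, settings, or code.

```python
from typing import Dict, List, Tuple

# Hypothetical panel: (region, year) -> (search features, prevalence %).
# Feature 0 stands in for a denormalized Google Trends index that tracks
# prevalence; feature 1 is an unrelated index the L1 penalty should drop.
TRAIN: Dict[Tuple[str, int], Tuple[List[float], float]] = {
    ("A", 2014): ([40.0, 30.0], 11.0),
    ("B", 2014): ([60.0, 30.0], 15.0),
    ("A", 2015): ([40.0, 70.0], 11.0),
    ("B", 2015): ([60.0, 70.0], 15.0),
}
TEST: Dict[Tuple[str, int], Tuple[List[float], float]] = {
    ("A", 2016): ([45.0, 80.0], 12.0),
    ("B", 2016): ([55.0, 20.0], 14.0),
}


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |w|: shrinks z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X: List[List[float]], y: List[float], alpha: float,
              n_iter: int = 200) -> Tuple[float, List[float]]:
    """Coordinate descent for (1/2n)||y - b0 - Xw||^2 + alpha * ||w||_1.

    Features and target are centered so the intercept b0 is unpenalized.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: target minus every feature's contribution except j.
            r = [yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    b0 = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return b0, w


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(alpha: float = 1.0) -> Tuple[float, float, List[float]]:
    """Return (lasso MAE, last-year baseline MAE, lasso weights) on TEST."""
    X_tr = [feats for feats, _ in TRAIN.values()]
    y_tr = [prev for _, prev in TRAIN.values()]
    b0, w = fit_lasso(X_tr, y_tr, alpha)

    keys = list(TEST)
    y_true = [TEST[k][1] for k in keys]
    y_lasso = [b0 + sum(x * wj for x, wj in zip(TEST[k][0], w)) for k in keys]
    # Trivial baseline: this year's prevalence = last year's, same region.
    y_last = [TRAIN[(region, year - 1)][1] for region, year in keys]
    return mae(y_true, y_lasso), mae(y_true, y_last), w


if __name__ == "__main__":
    lasso_mae, baseline_mae, weights = evaluate()
    print(f"lasso MAE={lasso_mae:.3f}  last-year MAE={baseline_mae:.3f}  w={weights}")
```

The objective uses the same scaling as scikit-learn's `Lasso`, which would normally replace the hand-written solver in practice. In this toy panel, the penalty drives the unrelated index's weight to exactly zero. That sparsity is what makes an L1 penalty attractive when many candidate keywords are available. The baseline errs because prevalence changed between years.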

RESULTS

In the US context, our proposed models outperformed both the prior work and the trivial baselines. In terms of the mean absolute error (MAE), the best of our proposed models yields a 24% improvement for diabetes (from 0.72 to 0.55; P<.001), an 18% improvement for obesity (from 1.20 to 0.99; P=.001), and a 34% improvement for exercise (from 2.89 to 1.95; P<.001). Our proposed across-country transfer learning framework also shows promising results, with average Spearman and Pearson correlations of 0.70 for diabetes and of 0.90 and 0.91, respectively, for obesity.
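The Results report relative MAE improvements and, for the transfer to Canada, average Spearman and Pearson correlations. The sketch below shows how those quantities are commonly computed in plain Python. The region-level values in it are hypothetical, and it is not the authors' evaluation code. With the reported diabetes MAEs, `relative_mae_reduction(0.72, 0.55)` gives about 0.236, in line with the reported 24% improvement.

```python
from math import sqrt
from typing import List


def relative_mae_reduction(baseline_mae: float, model_mae: float) -> float:
    """Fraction of the baseline MAE removed by the model (0.25 means 25% lower)."""
    return (baseline_mae - model_mae) / baseline_mae


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Reported diabetes MAEs: 0.72 (comparison baseline) -> 0.55 (best proposed model).
    print(round(relative_mae_reduction(0.72, 0.55), 3))
    # Hypothetical official vs. transferred-model prevalence for five regions.
    official = [7.1, 8.4, 9.0, 10.2, 11.5]
    predicted = [7.8, 8.1, 10.4, 10.0, 12.3]
    print(pearson(official, predicted), spearman(official, predicted))
```

Spearman's rho only checks whether regions are ordered correctly. Pearson's r also rewards getting the linear relationship right. That is one reason to report both for a model transferred across countries.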

CONCLUSIONS

Although our proposed models beat the baselines, we find the modeling of lifestyle diseases to be a challenging problem, one that requires an abundance of data as well as creative modeling strategies. This study also shows only low-to-moderate validity of Google Trends for lifestyle disease surveillance, even when novel corrective approaches are applied, including a proposed denormalization scheme. We envision qualitative analyses as a more practical use of Google Trends in the context of lifestyle disease surveillance. For quantitative analyses, Google Trends is most useful in the context of transfer learning, where low-resource countries could benefit from high-resource countries by using proxy models.
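The Conclusions mention a proposed denormalization scheme, and the Methods say the Google Trends data were denormalized into spatio-temporal indices. The abstract does not describe the scheme itself. The sketch below is a generic illustration only, not the authors' method. It assumes that each Google Trends request is scaled relative to its own maximum (reported on a 0 to 100 scale) and that every request includes one shared anchor point. Under those assumptions, separately fetched batches can be placed on a single scale. The keys, values, and the `rescale_to_anchor` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key for one Google Trends value: (region, year).
Key = Tuple[str, int]


def rescale_to_anchor(batches: List[Dict[Key, float]], anchor: Key) -> Dict[Key, float]:
    """Merge separately normalized Trends requests onto one common scale.

    Hypothetical anchor-based rescaling, not the paper's denormalization scheme.
    Assumes each batch came from its own request (values scaled relative to
    that request's maximum) and that every batch contains the same anchor key.
    The first batch defines the reference scale.
    """
    ref = batches[0][anchor]
    merged: Dict[Key, float] = {}
    for batch in batches:
        if batch[anchor] == 0:
            raise ValueError("anchor value is 0 in a batch; cannot rescale")
        factor = ref / batch[anchor]  # maps this batch onto the reference scale
        for key, value in batch.items():
            merged[key] = value * factor
    return merged


if __name__ == "__main__":
    # Two hypothetical requests sharing the anchor ("C", 2015).
    req1 = {("A", 2015): 50.0, ("C", 2015): 100.0}  # C was the maximum here
    req2 = {("B", 2015): 100.0, ("C", 2015): 50.0}  # B was the maximum here
    print(rescale_to_anchor([req1, req2], ("C", 2015)))
```

This recovers only relative volumes, on the scale of the first request. Google Trends also reports rounded integers, so an anchor with a small value in some batch amplifies rounding error when that batch is scaled up.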

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdee/7011125/97e5e5ac3832/jmir_v22i1e13347_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdee/7011125/3f20ab60ee36/jmir_v22i1e13347_fig2.jpg

Similar articles

1
Lifestyle Disease Surveillance Using Population Search Behavior: Feasibility Study.
J Med Internet Res. 2020 Jan 27;22(1):e13347. doi: 10.2196/13347.
2
Forecasting the COVID-19 Epidemic by Integrating Symptom Search Behavior Into Predictive Models: Infoveillance Study.
J Med Internet Res. 2021 Aug 11;23(8):e28876. doi: 10.2196/28876.
3
Correlation between Google Trends on dengue fever and national surveillance report in Indonesia.
Glob Health Action. 2019;12(1):1552652. doi: 10.1080/16549716.2018.1552652.
4
Assessment and statistical modeling of the relationship between remotely sensed aerosol optical depth and PM2.5 in the eastern United States.
Res Rep Health Eff Inst. 2012 May(167):5-83; discussion 85-91.
5
Google Trends in Infodemiology and Infoveillance: Methodology Framework.
JMIR Public Health Surveill. 2019 May 29;5(2):e13439. doi: 10.2196/13439.
6
Disease Monitoring and Health Campaign Evaluation Using Google Search Activities for HIV and AIDS, Stroke, Colorectal Cancer, and Marijuana Use in Canada: A Retrospective Observational Study.
JMIR Public Health Surveill. 2016 Oct 12;2(2):e156. doi: 10.2196/publichealth.6504.
7
Using Search Engine Data as a Tool to Predict Syphilis.
Epidemiology. 2018 Jul;29(4):574-578. doi: 10.1097/EDE.0000000000000836.
8
Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes.
Infect Dis Poverty. 2015 Dec 10;4:54. doi: 10.1186/s40249-015-0090-9.
9
Integrating Google Trends Search Engine Query Data Into Adult Emergency Department Volume Forecasting: Infodemiology Study.
JMIR Infodemiology. 2022 Apr 22;2(1):e32386. doi: 10.2196/32386. eCollection 2022 Jan-Jun.
10
Real-time Prediction of the Daily Incidence of COVID-19 in 215 Countries and Territories Using Machine Learning: Model Development and Validation.
J Med Internet Res. 2021 Jun 14;23(6):e24285. doi: 10.2196/24285.

Cited by

1
Explanation of hand, foot, and mouth disease cases in Japan using Google Trends before and during the COVID-19: infodemiology study.
BMC Infect Dis. 2022 Oct 29;22(1):806. doi: 10.1186/s12879-022-07790-9.
2
Periodic Trends in Internet Searches for Ocular Symptoms in the US.
Ophthalmic Epidemiol. 2023 Aug;30(4):352-357. doi: 10.1080/09286586.2022.2119260. Epub 2022 Sep 14.
3
Using Application Programming Interfaces to Access Google Data for Health Research: Protocol for a Methodological Framework.
JMIR Res Protoc. 2020 Jul 6;9(7):e16543. doi: 10.2196/16543.