Am J Epidemiol. 2023 Feb 24;192(3):430-437. doi: 10.1093/aje/kwac171.
Interest in using internet search data, such as those from the Google Health Trends Application Programming Interface (GHT-API), to measure epidemiologically relevant exposures or health outcomes is growing because of their accessibility and timeliness. Researchers enter search term(s), a geography, and a time period, and the GHT-API returns a scaled probability of that search term among all searches within the specified geography and time period. In this study, we detailed a method for using these data to measure a construct of interest in 5 iterative steps: first, identify phrases the target population may use to search for the construct of interest; second, refine candidate search phrases with incognito Google searches to improve sensitivity and specificity; third, craft the GHT-API search term(s) by combining the refined phrases; fourth, test search volume and choose geographic and temporal scales; and fifth, retrieve and average multiple samples to stabilize estimates and address missingness. An optional sixth step accounts for changes in total search volume by normalizing. As an application of this method, we present a case study examining weekly state-level searches related to child abuse in the United States from January 2018 to August 2020, a period that includes the early months of the coronavirus disease 2019 pandemic, and we describe limitations.
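The fifth step (retrieving and averaging multiple samples) can be illustrated with a minimal sketch. It is not the authors' code and makes no GHT-API calls. It assumes that each retrieved sample is already a list of scaled probabilities, one per time period, and that a missing value is represented as `None`. The function name `average_samples` and the example values are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def average_samples(
    samples: Sequence[Sequence[Optional[float]]],
) -> List[Optional[float]]:
    """Average repeated samples period by period, skipping missing values.

    Assumption: each inner sequence is one retrieved sample covering the
    same ordered time periods, with None marking a missing value.
    """
    if not samples:
        return []
    n_periods = len(samples[0])
    if any(len(sample) != n_periods for sample in samples):
        raise ValueError("all samples must cover the same time periods")

    averaged: List[Optional[float]] = []
    # zip(*samples) groups the values that belong to the same period.
    for period_values in zip(*samples):
        observed = [v for v in period_values if v is not None]
        # A period stays missing only if no sample returned a value for it.
        averaged.append(mean(observed) if observed else None)
    return averaged


# Hypothetical scaled probabilities for three weeks, retrieved three times.
samples = [
    [10.0, None, 30.0],
    [20.0, None, 30.0],
    [None, None, 60.0],
]
print(average_samples(samples))  # [15.0, None, 40.0]
```

In this sketch, a period that is missing in some samples is estimated from the remaining ones, which is one way averaging can both stabilize estimates and reduce missingness. A period that is missing in every sample remains missing.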