Google trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions.

Affiliation

Redeev SRL, Napoli 80121, Italy.

Publication information

Int J Med Inform. 2024 Oct;190:105563. doi: 10.1016/j.ijmedinf.2024.105563. Epub 2024 Jul 21.

Abstract

BACKGROUND

Google Trends is a widely used tool for infodemiological surveys. However, irregularities in the random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI).

OBJECTIVE

The study aims to expose methodological pitfalls that are commonly overlooked in infodemiological surveys conducted via Google Trends. It also provides a guide to avoiding them.

MATERIAL AND METHODS

The Google Topic "Coronavirus disease 2019" was investigated using different time spans, categories, and IP addresses. The same samples were manually collected multiple times to evaluate the stability of the RSV and ROI. Stability was estimated through indicators of variability (e.g., the percentage coefficient of variation, "CV%", and its 4-surprisal interval, "4-I"). The ability of the topic and category algorithms to aggregate content was evaluated through quantitative analysis of the RSV and ROI and qualitative examination of the related queries.
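As an illustration of the stability check described above, the following is a minimal sketch of a pointwise CV% computed across repeated downloads of the same series. It assumes CV% means 100 times the sample standard deviation divided by the mean; the function names and the RSV values are hypothetical and are not taken from the study. The 4-surprisal interval is not reproduced here, since the abstract does not state how it was computed.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Percentage coefficient of variation: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV% is undefined when the mean is zero")
    return 100 * stdev(values) / m


def pointwise_cv(samples):
    """CV% at each time point across repeated downloads of one query.

    `samples` is a list of equally long RSV series, one per download.
    """
    return [cv_percent(point) for point in zip(*samples)]


# Hypothetical RSV values (not from the study): three downloads, four weeks.
repeated_rsv = [
    [50, 80, 100, 40],
    [52, 78, 100, 44],
    [48, 82, 100, 36],
]
cvs = pointwise_cv(repeated_rsv)
for week, cv in enumerate(cvs, start=1):
    print(f"week {week}: CV% = {cv:.1f}")
```

In this toy example the lowest-volume week shows the largest CV%, which is the kind of pattern a repeated-sampling check is meant to surface.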

RESULTS

The stability of Google Trends' RSV and ROI does not depend exclusively on dataset size or IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8,13]). Google Trends categories and topics can exclude relevant queries or include irrelevant ones. The statistical evidence is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable, ii) the "interest over time" data format is generally reliable for evaluating trends and correlations, iii) improvements to Google Trends have altered historical RSV trends.
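One simple way to probe hypothesis ii) on one's own data is to check whether two independent downloads of the same "interest over time" series remain strongly correlated. The sketch below uses a hand-written Pearson correlation; it is an assumed illustration rather than the authors' procedure, and the two series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical RSV series (not from the study): two downloads of one query.
download_a = [50, 80, 100, 40]
download_b = [52, 78, 100, 44]
r = pearson(download_a, download_b)
print(f"correlation between downloads: {r:.3f}")
```

A high correlation between repeated downloads suggests the overall trend shape is stable even when individual RSV values fluctuate; a low one is a signal that the dataset may contain too few queries to support conclusions.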

CONCLUSIONS

Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighted for the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.
