Reliability of Google Trends: Analysis of the Limits and Potential of Web Infoveillance During COVID-19 Pandemic and for Future Research.

Author information

Rovetta Alessandro

Affiliations

Research and Disclosure Division, Mensana srls, Brescia, Italy.

Technological and Scientific Research, Redeev srl, Napoli, Italy.

Publication information

Front Res Metr Anal. 2021 May 25;6:670226. doi: 10.3389/frma.2021.670226. eCollection 2021.

Abstract

Alongside the COVID-19 pandemic, government authorities around the world have had to face a growing infodemic capable of causing serious damage to public health and the economy. In this context, the use of infoveillance tools has become a primary necessity. The aim of this study is to test the reliability of a widely used infoveillance tool, Google Trends. In particular, the paper analyzes relative search volumes (RSVs) and quantifies their dependence on the day they are collected. RSVs of the query + during February 1-December 4, 2020 (period 1), and February 20-May 18, 2020 (period 2), were collected daily through Google Trends from December 8 to 27, 2020. The survey covered Italian regions and cities, and countries and cities worldwide. The search category was set to all categories. Each dataset was analyzed to detect any dependence of RSVs on the day they were gathered. To do this, for each country, region, or city under investigation and each day on which its RSV was collected, a Gaussian distribution was used to represent the daily variations of the RSVs. When a missing value was found (anomaly), the affected country, region, or city was excluded from the analysis. When the anomalies exceeded 20% of the sample size, the whole sample was excluded from the statistical analysis. Pearson and Spearman correlations between RSVs and the number of COVID-19 cases were calculated day by day to highlight any variations related to the day RSVs were collected. Welch's t-test was used to assess the statistical significance of the differences between the average RSVs of the various countries, regions, or cities of a given dataset. Two RSVs were considered statistically confident when . A dataset was deemed unreliable if the confident data exceeded 20% (confidence threshold). The percentage increase was used to quantify the difference between two values.
Google Trends showed an acceptable number of anomalies only for the RSVs of Italian regions (0% in both periods 1 and 2) and countries worldwide (9.7% during period 1 and 10.9% during period 2). However, the correlations between RSVs and COVID-19 cases underwent significant variations even in these two datasets ( for Italian regions, and for countries worldwide). Furthermore, only the RSVs of countries worldwide did not exceed the confidence threshold. Finally, the large number of anomalies registered in the RSVs of Italian and international cities made these datasets unusable for any kind of statistical inference. In the considered timespans, Google Trends proved reliable only for surveys concerning RSVs of countries worldwide. Since RSV values showed a high dependence on the day they were gathered, it is essential for future research that authors collect query data for several consecutive days and work with RSV averages instead of daily RSVs, trying to minimize the standard errors until an established confidence threshold is respected. Further research is needed to evaluate the effectiveness of this method.
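
The averaging procedure recommended in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names, the sample values, and the data layout (one list of RSVs per place, one value per collection day, with `None` marking a missing value) are all assumptions made for illustration. The sketch computes the mean RSV and its standard error across collection days, Welch's t statistic between two places, the percentage increase between two values, and the 20% anomaly check described above. The cutoff on the t statistic used to call two RSVs statistically confident is not given in this record, so it is left to the caller.

```python
import math
import statistics


def mean_and_se(samples):
    """Return the mean of the daily RSVs and the standard error of that mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need RSVs from at least two collection days")
    return statistics.mean(samples), statistics.stdev(samples) / math.sqrt(n)


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, se_a = mean_and_se(a)
    mean_b, se_b = mean_and_se(b)
    denom = math.sqrt(se_a ** 2 + se_b ** 2)
    if denom == 0:
        raise ValueError("both samples have zero variance")
    return (mean_a - mean_b) / denom


def percentage_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


def too_many_anomalies(dataset, limit=0.20):
    """True if the share of places with a missing RSV exceeds the limit."""
    flagged = sum(any(v is None for v in rsvs) for rsvs in dataset.values())
    return flagged / len(dataset) > limit


# Hypothetical RSVs of two regions, gathered on five consecutive days.
region_a = [62, 65, 60, 63, 65]
region_b = [58, 61, 59, 57, 60]

mean_a, se_a = mean_and_se(region_a)
t_stat = welch_t(region_a, region_b)
print(f"mean A = {mean_a:.1f} +/- {se_a:.2f}, Welch t = {t_stat:.2f}")
```

Collecting over more days shrinks the standard error roughly as one over the square root of the number of days, which is the lever the abstract points to for reaching a chosen confidence threshold.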


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/887f/8186442/927978886149/frma-06-670226-g001.jpg
