Microeconomics and Social Systems, Yahoo! Research, 111 West 40th Street, New York, NY 10018, USA.
Proc Natl Acad Sci U S A. 2010 Oct 12;107(41):17486-90. doi: 10.1073/pnas.1005962107. Epub 2010 Sep 27.
Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically, we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
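The comparison the abstract describes, a simple autoregressive baseline versus the same model augmented with search counts, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the AR(1) order, the linear form, the helper names `fit_ols`, `r_squared`, and `compare_models`, and the synthetic series (generated with arbitrary coefficients, not real search or outcome data). The fit is scored in-sample for brevity; a held-out evaluation would be the more careful choice.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def r_squared(y, y_hat):
    """Coefficient of determination for predictions y_hat against targets y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_models(outcome, search):
    """In-sample R^2 of an AR(1) baseline and of AR(1) plus lagged search volume.

    Hypothetical helper: both models predict outcome[t] from values at t-1.
    """
    y = outcome[1:]          # targets
    lag_y = outcome[:-1]     # previous outcome (autoregressive term)
    lag_s = search[:-1]      # previous search volume
    ones = np.ones_like(y)   # intercept column

    X_base = np.column_stack([ones, lag_y])
    X_aug = np.column_stack([ones, lag_y, lag_s])

    r2_base = r_squared(y, X_base @ fit_ols(X_base, y))
    r2_aug = r_squared(y, X_aug @ fit_ols(X_aug, y))
    return r2_base, r2_aug


# Synthetic series for illustration only; the coefficients are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 200
search = rng.normal(size=n)
outcome = np.zeros(n)
for t in range(1, n):
    outcome[t] = 0.8 * outcome[t - 1] + 0.3 * search[t - 1] + rng.normal(scale=0.5)

r2_base, r2_aug = compare_models(outcome, search)
print(f"baseline AR(1) R^2:     {r2_base:.3f}")
print(f"AR(1) + search R^2:     {r2_aug:.3f}")
print(f"gain from search data:  {r2_aug - r2_base:.3f}")
```

Because the augmented model nests the baseline, its in-sample R^2 can never be lower; the quantity of interest is the size of the gain, which the abstract reports as ranging from modest to dramatic depending on the application.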