Twitter, 1355 Market St., San Francisco, CA, USA.
Microsoft, One Microsoft Way, Redmond, WA, USA.
Nat Commun. 2021 Jan 8;12(1):194. doi: 10.1038/s41467-020-20206-z.
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users' online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.