Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States of America.
Department of Mathematics, Wellesley College, Wellesley, MA, United States of America.
PLoS One. 2024 Aug 26;19(8):e0305579. doi: 10.1371/journal.pone.0305579. eCollection 2024.
Big data collected from the Internet have great potential to reveal ever-changing trends in society. In particular, accurate infectious disease tracking with Internet data has grown in popularity, providing invaluable information for public health decision makers and the general public. However, existing disease tracking frameworks do not effectively address much of the complex connectivity among Internet search data. To this end, we propose ARGO-C (Augmented Regression with Clustered GOogle data), an integrative, statistically principled approach that incorporates the clustering structure of Internet search data to enhance the accuracy and interpretability of disease tracking. Focusing on multi-resolution %ILI (influenza-like illness) tracking, we demonstrate the improved performance and robustness of ARGO-C over benchmark methods at various geographical resolutions. We also highlight the adaptability of ARGO-C for tracking diseases other than influenza, as well as other social or economic trends.