Data-driven clustering of infectious disease incidence into age groups.

Author information

Yaari Rami, Huppert Amit, Dattner Itai

Affiliations

Bio-statistical and Bio-mathematical Unit, The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Israel.

School of Public Health, the Sackler Faculty of Medicine, Tel-Aviv University, Tel Aviv, Israel.

Publication information

Stat Methods Med Res. 2022 Dec;31(12):2486-2499. doi: 10.1177/09622802221129041. Epub 2022 Oct 11.

DOI: 10.1177/09622802221129041

PMID: 36217843

Abstract

Understanding how infectious diseases spread in a population is an important element of mitigation and vaccination programs. A major characteristic common to most infectious diseases is age-related heterogeneity in transmission, which can affect the dynamics of an epidemic, as manifested by the pattern of disease incidence in different age groups. Currently, there are no statistical criteria for how to partition disease incidence data into clusters. We develop the first data-driven methodology for deciding on the best partition of incidence data into age groups, in a well-defined statistical sense. The method employs a top-down hierarchical partitioning algorithm, with a stopping criterion based on multiple-hypothesis significance testing that controls the family-wise error rate. The type I error and statistical power of the method are assessed using simulations. The method is then applied to COVID-19 incidence data from Israel to extract the significant age-group clusters in each wave of the epidemic.
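
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of top-down partitioning with a significance-based stopping rule, not the authors' method. It assumes ordered age bins with case counts and strictly positive population sizes, uses a Pearson chi-square test (1 degree of freedom) to compare incidence on either side of a candidate cut, and applies a Bonferroni-style threshold of alpha divided by the number of possible cut points. The function names (`chi2_pvalue_1df`, `split_statistic`, `partition`) and the choice of test and correction are all assumptions made for illustration; this simple correction is not claimed to reproduce the family-wise error rate control described in the paper.

```python
import math


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def split_statistic(cases, pop, lo, k, hi):
    """Pearson chi-square for equal incidence in bins [lo, k) versus [k, hi)."""
    c1, n1 = sum(cases[lo:k]), sum(pop[lo:k])
    c2, n2 = sum(cases[k:hi]), sum(pop[k:hi])
    c, n = c1 + c2, n1 + n2
    if c == 0 or c == n:
        return 0.0  # no variation in outcome, nothing to test
    stat = 0.0
    # 2x2 table: (cases, non-cases) by (left group, right group)
    for ci, ni in ((c1, n1), (c2, n2)):
        exp_case = ni * c / n
        exp_non = ni * (n - c) / n
        stat += (ci - exp_case) ** 2 / exp_case
        stat += ((ni - ci) - exp_non) ** 2 / exp_non
    return stat


def partition(cases, pop, alpha=0.05):
    """Return sorted cut indices that split the ordered bins into clusters.

    Assumption of this sketch: every test uses the same Bonferroni-style
    threshold alpha / (number of possible cut points).
    """
    n_bins = len(cases)
    threshold = alpha / max(n_bins - 1, 1)
    cuts = []
    stack = [(0, n_bins)]  # segments still to be examined, top-down
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue  # a single bin cannot be split further
        # Candidate cut with the largest difference in incidence
        best_k = max(
            range(lo + 1, hi),
            key=lambda k: split_statistic(cases, pop, lo, k, hi),
        )
        p = chi2_pvalue_1df(split_statistic(cases, pop, lo, best_k, hi))
        if p < threshold:  # stop splitting this segment otherwise
            cuts.append(best_k)
            stack.extend([(lo, best_k), (best_k, hi)])
    return sorted(cuts)
```

As a usage example with made-up numbers, `partition([10, 12, 11, 100, 105, 98], [1000] * 6)` separates the three low-incidence bins from the three high-incidence bins with a single cut at index 3, while perfectly homogeneous data such as `partition([50] * 5, [1000] * 5)` yields no cuts.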