Best Ana F, Malinovsky Yaakov, Albert Paul S
Biostatistics Branch, Biometrics Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD, USA.
J Appl Stat. 2022 May 9;50(10):2228-2245. doi: 10.1080/02664763.2022.2071419. eCollection 2023.
Group testing study designs have been used since the 1940s to reduce screening costs for uncommon diseases; for rare diseases, all cases can be identified with substantially fewer tests than the population size. Substantial research has identified efficient designs under this paradigm. However, little work has focused on the important problem of disease screening among clustered data, such as geographic heterogeneity in HIV prevalence. We evaluate designs in which we first estimate disease prevalence and then apply efficient group testing algorithms using these estimates. Specifically, we estimate prevalence using individual testing on a fixed-size subset of each cluster and use these prevalence estimates to choose group sizes that minimize the corresponding estimated average number of tests per subject. We compare designs that estimate cluster-specific prevalences with designs that estimate a common prevalence across clusters, use different group testing algorithms, construct groups from individuals within the same cluster or across different clusters, and consider misclassification. For diseases with low prevalence, our results suggest that accounting for clustering is unnecessary. However, for diseases with higher prevalence and sizeable between-cluster heterogeneity, accounting for clustering in study design and implementation improves efficiency. We illustrate the practical aspects of our design recommendations with two examples that exhibit strong clustering effects: (1) identification of HIV carriers in the US population and (2) laboratory screening of anti-cancer compounds using cell lines.
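The two-step design described above (estimate prevalence from an individually tested subset, then pick the group size that minimizes the estimated number of tests per subject) can be sketched in a few lines. The abstract does not say which group testing algorithms were used, so this sketch assumes the classic two-stage Dorfman procedure with a perfect test (no misclassification). Under that assumption, a group of size k >= 2 with prevalence p costs 1/k + 1 - (1 - p)^k tests per subject on average. The function names, the `max_k` cap, and the pilot data are hypothetical and chosen only for illustration.

```python
def dorfman_tests_per_subject(p: float, k: int) -> float:
    """Expected tests per subject for two-stage Dorfman testing.

    Assumes a perfect assay and independent subjects with prevalence p.
    k = 1 means individual testing (exactly one test per subject).
    """
    if k == 1:
        return 1.0
    # One pooled test shared by k subjects, plus k retests
    # whenever the pool is positive (probability 1 - (1 - p)^k).
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def estimate_prevalence(pilot_results: list[int]) -> float:
    """Sample proportion of positives (1 = positive, 0 = negative)."""
    return sum(pilot_results) / len(pilot_results)


def best_group_size(p_hat: float, max_k: int = 50) -> int:
    """Group size in 1..max_k minimizing the estimated tests per subject.

    max_k is an assumed practical cap; if p_hat is 0 the cap is returned.
    """
    return min(range(1, max_k + 1),
               key=lambda k: dorfman_tests_per_subject(p_hat, k))


if __name__ == "__main__":
    # Hypothetical pilot samples from two clusters (made-up data).
    pilots = {
        "cluster_A": [0] * 98 + [1] * 2,    # 2% positive
        "cluster_B": [0] * 80 + [1] * 20,   # 20% positive
    }

    # Design 1: cluster-specific prevalence estimates and group sizes.
    for name, results in pilots.items():
        p_hat = estimate_prevalence(results)
        k = best_group_size(p_hat)
        cost = dorfman_tests_per_subject(p_hat, k)
        print(f"{name}: p_hat={p_hat:.3f}, k={k}, tests/subject={cost:.3f}")

    # Design 2: one common prevalence estimate shared by all clusters.
    pooled = [r for results in pilots.values() for r in results]
    p_common = estimate_prevalence(pooled)
    k_common = best_group_size(p_common)
    print(f"common: p_hat={p_common:.3f}, k={k_common}")
```

Comparing the per-cluster group sizes with the single common group size gives a rough feel for when between-cluster heterogeneity matters. The sketch ignores misclassification and the cost of the pilot tests themselves, both of which a full design comparison would need to account for.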