Li Xuechan, Sung Anthony D, Xie Jichun
Department of Biostatistics, Duke University, Durham, NC 27705, USA.
Department of Medicine, Duke University, Durham, NC 27705, USA.
J Mach Learn Res. 2023;24.
Multiple testing is a common tool in modern data science. In some problems the hypotheses are embedded in a space, and the distances between them reflect their co-null/co-alternative patterns. Properly incorporating this distance information into testing boosts testing power. We therefore developed a new multiple testing framework, Distance Assisted Recursive Testing (DART), which combines artificial intelligence (AI) and statistical modeling. DART has two stages. The first stage uses AI models to construct an aggregation tree that reflects the distance information. The second stage uses statistical models to embed the testing on the tree and control the false discovery rate. Theoretical analysis and numerical experiments show that DART yields valid, robust, and powerful results. We applied DART to a clinical trial in allogeneic stem cell transplantation to identify gut microbiota whose abundance was affected by post-transplant care.
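To make the idea of distance-assisted testing concrete, the following is a minimal sketch and not the DART procedure itself. It assumes hypotheses sit at positions on a line, groups those within a hypothetical `radius` of each other (a single aggregation layer rather than a tree), combines each group's one-sided p-values with Stouffer's method under an independence assumption, and applies the Benjamini-Hochberg procedure to the group-level p-values. All function names, the `radius` parameter, and the toy data are illustrative assumptions; the sketch makes no claim about false discovery rate control for the individual hypotheses.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def benjamini_hochberg(pvals, alpha):
    """Return the set of indices rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return set(order[:cutoff])


def group_by_distance(positions, radius):
    """Connected components of the graph linking points at distance <= radius."""
    n = len(positions)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in range(n):
                if not seen[v] and abs(positions[u] - positions[v]) <= radius:
                    seen[v] = True
                    stack.append(v)
        groups.append(sorted(members))
    return groups


def stouffer(pvals):
    """Combine one-sided p-values with Stouffer's z method (assumes independence)."""
    eps = 1e-12  # keep p strictly inside (0, 1) so inv_cdf is defined
    zs = [_STD_NORMAL.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in pvals]
    z = sum(zs) / math.sqrt(len(zs))
    return 1 - _STD_NORMAL.cdf(z)


def aggregate_then_test(positions, pvals, radius, alpha):
    """Group nearby hypotheses, test the groups with BH, return flagged members."""
    groups = group_by_distance(positions, radius)
    group_p = [stouffer([pvals[i] for i in g]) for g in groups]
    rejected_groups = benjamini_hochberg(group_p, alpha)
    return sorted(i for g in rejected_groups for i in groups[g])


# Toy data (made up for illustration): three weak signals close together,
# three clear nulls far away.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
pvals = [0.02, 0.03, 0.04, 0.6, 0.5, 0.7]

print(sorted(benjamini_hochberg(pvals, 0.05)))          # individual-level BH
print(aggregate_then_test(positions, pvals, 0.15, 0.05))  # group-level BH
```

On this toy input, BH applied to the six individual p-values rejects nothing, while the grouped test flags the three neighboring hypotheses together. This illustrates, in a much simplified form, how pooling evidence among nearby hypotheses can raise power.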