Department of Biostatistics, Harvard T.H. Chan School of Public Health.
Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac084.
We propose TWO-SIGMA-G, a competitive gene set test for scRNA-seq data. TWO-SIGMA-G uses a mixed-effects regression model, based on our previously published TWO-SIGMA, to test for differential expression at the gene level. This regression-based model provides flexibility and rigor at the gene level in (1) handling complex experimental designs, (2) accounting for the correlation between biological replicates and (3) accommodating the distribution of scRNA-seq data to improve statistical inference. Moreover, TWO-SIGMA-G uses a novel approach to adjust for inter-gene correlation (IGC) at the set level to control the set-level false positive rate. Simulations demonstrate that TWO-SIGMA-G controls the type-I error rate and increases power in the presence of IGC compared with other methods. Application to two datasets identified HIV-associated interferon pathways in xenograft mice and pathways associated with Alzheimer's disease progression in humans.
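To make the idea of a set-level IGC adjustment concrete, the following is a minimal Python sketch of a competitive rank-sum test whose variance is inflated for correlation among the genes in the set. The abstract does not spell out the adjustment, so this sketch assumes the rank-sum variance formula popularized by CAMERA (Wu and Smyth) as a stand-in; it is not presented as the authors' exact procedure. The function names (`rank_sum_variance`, `average_ranks`, `competitive_set_test`) are hypothetical, and the gene-level statistics and the value of `rho` are assumed to be supplied by a separate gene-level model.

```python
import math


def rank_sum_variance(m1, m2, rho):
    """Variance of the rank sum of m1 in-set statistics vs. m2 out-of-set ones.

    Assumption: the m1 in-set statistics share a common pairwise
    correlation rho; all other pairs are treated as independent.
    """
    return (m1 * m2 / (2 * math.pi)) * (
        math.asin(1.0)
        + (m2 - 1) * math.asin(0.5)
        + (m1 - 1) * (m2 - 1) * math.asin(rho / 2)
        + (m1 - 1) * math.asin((rho + 1) / 2)
    )


def average_ranks(values):
    """Return 1-based ranks, assigning midranks to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def competitive_set_test(stats, in_set, rho=0.0):
    """Compare gene-level statistics inside a set against those outside it.

    stats:  one gene-level statistic per gene (hypothetical input).
    in_set: booleans of the same length marking set membership.
    rho:    assumed average inter-gene correlation within the set.
    Returns (z, two-sided p-value) under a normal approximation.
    """
    m1 = sum(1 for flag in in_set if flag)
    m2 = len(stats) - m1
    if m1 < 1 or m2 < 1:
        raise ValueError("need at least one gene inside and outside the set")
    ranks = average_ranks(stats)
    w = sum(r for r, flag in zip(ranks, in_set) if flag)
    expected_w = m1 * (m1 + m2 + 1) / 2
    z = (w - expected_w) / math.sqrt(rank_sum_variance(m1, m2, rho))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

With `rho = 0` the variance reduces to the usual Wilcoxon value `m1 * m2 * (m1 + m2 + 1) / 12`; a positive `rho` enlarges the variance, which shrinks the test statistic and guards against the inflated set-level false positive rate that ignoring IGC would produce.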