School of Mathematics, Jilin University, Changchun, Jilin 130012, China.
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Genetics. 2022 Jul 30;221(4). doi: 10.1093/genetics/iyac095.
Effective control of the false discovery rate is key to multiplicity problems. Here, we consider incorporating informative covariates from external datasets into the multiple testing procedure to boost statistical power while maintaining false discovery rate control. In particular, we focus on the statistical analysis of innovative high-dimensional spatial transcriptomic data while incorporating external multiomics data that provide distinct but complementary information for the detection of spatial expression patterns. We extend OrderShapeEM, an efficient covariate-assisted multiple testing procedure that incorporates one auxiliary study, so that it can incorporate multiple external omics studies and thereby boost the statistical power of spatial expression pattern detection. Specifically, we first use a recently proposed, computationally efficient statistical analysis method, spatial pattern recognition via kernels, to produce the primary test statistics for spatial transcriptomic data. Next, we use the Cauchy combination rule to construct the auxiliary covariate by combining information from multiple external omics studies, such as bulk and single-cell RNA-seq data. Finally, we extend and apply the integrative analysis method OrderShapeEM to the primary P-values, together with the auxiliary data carrying multiomics information, for efficient covariate-assisted spatial expression analysis. We conduct a series of realistic simulations with known ground truth to evaluate the performance of our method. Four case studies in mouse olfactory bulb, mouse cerebellum, human breast cancer, and human heart tissues further demonstrate the substantial power gain of our method in detecting genes with spatial expression patterns, compared with existing classic approaches that do not use any external information.
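The Cauchy combination step that builds the auxiliary covariate can be illustrated with a short sketch. Under the standard form of the rule, the per-gene P-values p_1, ..., p_K from K external studies are combined as T = sum_i w_i tan((0.5 - p_i) pi), and the combined P-value is 0.5 - arctan(T) / pi. This is a minimal Python sketch, not the authors' implementation. It assumes one gene-level P-value per external study, equal weights unless others are supplied, and P-values strictly inside (0, 1). The function name `cauchy_combine` and the numerical cutoffs are hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence


def cauchy_combine(pvalues: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine P-values with the Cauchy combination rule (illustrative sketch).

    Assumptions: P-values lie strictly in (0, 1); weights are non-negative
    and are rescaled to sum to 1; equal weights are used when none are given.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weight per external study
    if len(weights) != k:
        raise ValueError("weights and P-values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total_w = sum(weights)

    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("P-values must lie strictly in (0, 1)")
        w_norm = w / total_w
        if p < 1e-15:
            # For tiny p, tan((0.5 - p) * pi) is approximately 1 / (p * pi);
            # using the approximation avoids losing precision in 0.5 - p.
            stat += w_norm / (p * math.pi)
        else:
            stat += w_norm * math.tan((0.5 - p) * math.pi)

    if stat > 1e15:
        # Upper tail of the standard Cauchy: 0.5 - atan(T) / pi ~ 1 / (T * pi).
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Hypothetical per-gene P-values from a bulk and a single-cell study.
    external = {"geneA": [0.01, 0.04], "geneB": [0.6, 0.3]}
    auxiliary = {g: cauchy_combine(ps) for g, ps in external.items()}
    print(auxiliary)
```

A useful property of this rule is that a single small P-value can dominate the combined statistic, so a gene flagged strongly in any one external study receives a small auxiliary value. That value would then be passed, alongside the primary spatial P-values, to the covariate-assisted procedure.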