Wang Yishan, Zang Chenxuan, Li Ziyi, Guo Charles C, Lai Dejian, Wei Peng
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030, USA.
bioRxiv. 2025 Feb 22:2025.02.17.638726. doi: 10.1101/2025.02.17.638726.
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it a valuable tool for understanding complex tissue architectures, such as those found in cancers. Seurat, by far the most popular tool for analyzing ST data, uses the Wilcoxon rank-sum test by default for differential expression analysis. However, as a nonparametric method that disregards spatial correlations, the Wilcoxon test can lead to inflated false positive rates and misleading findings. This limitation highlights the need for a more robust statistical approach that effectively incorporates spatial correlations. To this end, we propose a Generalized Score Test (GST) in the Generalized Estimating Equations (GEEs) framework as a robust solution for differential gene expression analysis in ST. We conducted a comprehensive comparison of the GST with existing methods, including the Wilcoxon rank-sum test and the GEEs with the robust Wald test. Extensive simulations showed that the GST, by appropriately accounting for spatial correlations, achieved superior Type I error control and comparable power relative to the other methods. Applications to ST datasets from breast and prostate cancer showed that the GST-identified differentially expressed genes were enriched in pathways directly implicated in cancer progression. In contrast, the Wilcoxon test-identified genes were enriched in non-cancer pathways and included substantial false positives, highlighting the limitations of the Wilcoxon test for spatially structured data. Our findings suggest that the GST approach is well-suited for ST data, offering more accurate identification of biologically relevant gene expression changes. We have implemented the proposed method in the R package "SpatialGEE", available on GitHub.
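The Type I error contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the SpatialGEE implementation. It is a minimal, standard-library-only illustration under several assumptions that do not come from the source: a Gaussian working model with an independence working correlation, hypothetical "clusters" of neighboring spots that share a random effect, and group labels that are constant within a cluster. All function names and parameter values are illustrative. The score statistic is the robust (sandwich-variance) score test for the group coefficient, computed from per-cluster score contributions and compared against a chi-square distribution with 1 degree of freedom.

```python
import math
import random


def wilcoxon_p(y, x):
    """Two-sided rank-sum p-value (normal approximation, assumes no ties)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    n1 = sum(x)
    n0 = n - n1
    w = sum(rank[i] for i in range(n) if x[i] == 1)
    mean = n1 * (n + 1) / 2
    var = n1 * n0 * (n + 1) / 12
    z = (w - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def score_test_p(y, x, cluster):
    """Cluster-robust score test of 'no group effect' (illustrative sketch).

    Fits only the null model (intercept = overall mean), then sums the
    score contributions (x - xbar) * (y - ybar) within each cluster.
    The squared total score divided by the sum of squared cluster scores
    is referred to a chi-square distribution with 1 degree of freedom.
    """
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    u = {}
    for yi, xi, ci in zip(y, x, cluster):
        u[ci] = u.get(ci, 0.0) + (xi - xbar) * (yi - ybar)
    total = sum(u.values())
    robust_var = sum(v * v for v in u.values())
    stat = total * total / robust_var
    # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))


def simulate_null(rng, n_clusters=20, spots=25, sd_cluster=0.5):
    """Null data: no group effect, but spots in a cluster share a random shift."""
    y, x, cluster = [], [], []
    for k in range(n_clusters):
        shift = rng.gauss(0, sd_cluster)  # shared effect mimics spatial correlation
        label = k % 2                     # group label is constant within a cluster
        for _ in range(spots):
            y.append(shift + rng.gauss(0, 1))
            x.append(label)
            cluster.append(k)
    return y, x, cluster


def type1_rates(n_sim=300, alpha=0.05, seed=1):
    """Empirical rejection rates of both tests when the null is true."""
    rng = random.Random(seed)
    rej_w = rej_s = 0
    for _ in range(n_sim):
        y, x, c = simulate_null(rng)
        rej_w += wilcoxon_p(y, x) < alpha
        rej_s += score_test_p(y, x, c) < alpha
    return rej_w / n_sim, rej_s / n_sim


if __name__ == "__main__":
    w_rate, s_rate = type1_rates()
    print(f"Wilcoxon Type I error:     {w_rate:.3f}")
    print(f"Robust score Type I error: {s_rate:.3f}")
```

Under these assumptions, the rank-sum test treats all 500 spots as independent and should reject far more often than the nominal 5%, while the cluster-robust score test should stay close to the nominal level. The exact rates depend on the assumed cluster size and within-cluster correlation.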