Zhang Lujun, Yang Lu, Ren Yingxue, Zhang Shuwen, Guan Weihua, Chen Jun
Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55455, United States.
Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, United States.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf327.
Single-cell RNA sequencing (scRNA-seq) has become an important method for characterizing cellular heterogeneity, revealing more biological insight than bulk RNA-seq. The surge in scRNA-seq data across multiple individuals calls for efficient and statistically powerful methods for differential expression (DE) analysis that account for individual-level biological variability.
We introduce DiSC, a method for individual-level DE analysis that extracts multiple distributional characteristics, jointly tests their association with a variable of interest, and uses a flexible permutation testing framework to control the false discovery rate (FDR). Our simulation studies demonstrated that DiSC effectively controlled the FDR across various settings and exhibited high statistical power in detecting different types of gene expression changes. Moreover, DiSC is computationally efficient and scales to the rapidly increasing sample sizes in scRNA-seq studies. When we applied DiSC to identify DE genes potentially associated with COVID-19 severity and Alzheimer's disease across various types of peripheral blood mononuclear cells and neural cells, our method was approximately 100 times faster than other state-of-the-art methods, and its results were consistent and supported by existing literature. Although DiSC was developed for scRNA-seq data, its robust testing framework can also be applied to other types of single-cell data. When applied to cytometry by time-of-flight (CyTOF) data, DiSC identified significantly more DE markers than traditional methods.
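The general pattern described above can be illustrated with a minimal Python sketch. It is not the DiSC algorithm and does not use the API of the R package "SingleCellStat". Everything below is an assumption chosen for illustration: the function names are hypothetical, the per-individual characteristics are taken to be the mean, standard deviation and fraction of zero counts, the joint statistic is a sum of squared Welch-type t statistics, and the Benjamini-Hochberg procedure applied to permutation p-values stands in for the paper's own permutation-based FDR control. What the sketch does show is the structure: cells are first summarized per individual, and group labels are permuted across individuals rather than across cells, so the test respects individual-level variability.

```python
import numpy as np

def individual_features(cells_by_individual):
    """Summarize one gene's per-cell counts into per-individual features.
    ASSUMPTION (hypothetical choice): mean, SD and fraction of zeros."""
    feats = []
    for x in cells_by_individual:
        x = np.asarray(x, dtype=float)
        feats.append([x.mean(), x.std(), np.mean(x == 0)])
    return np.array(feats)  # shape: (n_individuals, 3)

def joint_stat(features, group):
    """Hypothetical joint statistic: sum over features of squared
    Welch-type t statistics comparing the two groups of individuals."""
    g = np.asarray(group, dtype=bool)
    stat = 0.0
    for j in range(features.shape[1]):
        a, b = features[g, j], features[~g, j]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if se > 0:  # skip features with no variability in either group
            stat += ((a.mean() - b.mean()) / se) ** 2
    return stat

def benjamini_hochberg(p):
    """Standard Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

def permutation_de(genes, group, n_perm=999, seed=0):
    """genes: list over genes, each a list of per-cell count arrays
    (one array per individual). Labels are permuted across individuals,
    and the same permutation is shared by all genes."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group, dtype=bool)
    feats = [individual_features(g) for g in genes]
    obs = np.array([joint_stat(f, group) for f in feats])
    exceed = np.zeros(len(genes))
    for _ in range(n_perm):
        perm = rng.permutation(group)
        exceed += np.array([joint_stat(f, perm) >= o for f, o in zip(feats, obs)])
    pvals = (exceed + 1) / (n_perm + 1)  # add-one permutation p-value
    return pvals, benjamini_hochberg(pvals)
```

Because the labels are permuted at the individual level, the number of distinct permutations is limited by the number of individuals (for example, 5 versus 5 individuals gives only 252 labelings). This caps the smallest attainable p-value, which is one reason efficient, sample-size-scalable permutation frameworks matter as studies grow.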
The R software package "SingleCellStat" is freely available on CRAN (https://cran.r-project.org/web/packages/SingleCellStat/index.html) and GitHub (https://github.com/Lujun995/DiSC). The replication code for reproducing the analyses in this study is publicly accessible at https://github.com/Lujun995/DiSC_Replication_Code. The scRNA-seq expression matrices and metadata used in our simulations and analyses can be retrieved from https://cells.ucsc.edu/autism/rawMatrix.zip, https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30, and https://covid19.cog.sanger.ac.uk/submissions/release1/haniffa21.processed.h5ad.