Ling Wodan, Zhang Wenfei, Cheng Bin, Wei Ying
Public Health Sciences Division, Fred Hutchinson Cancer Research Center.
Sarepta Therapeutics.
Ann Appl Stat. 2021 Dec;15(4):1673-1696. doi: 10.1214/21-aoas1442. Epub 2021 Dec 21.
Differential gene expression analysis based on scRNA-seq data is challenging because of two characteristics of such data. First, multimodality and other heterogeneity in gene expression across cell conditions lead to divergences in the tail events, or to crossings, of the expression distributions. Second, scRNA-seq data generally contain a considerable fraction of dropout events, causing zero inflation in the expression. With respect to the first characteristic, existing parametric approaches that target the mean difference in gene expression are limited, whereas quantile regression, which examines multiple locations in the distribution, improves power. However, the second characteristic, zero inflation, makes traditional quantile regression invalid and underpowered. We propose a quantile-based test that handles both characteristics, multimodality and zero inflation, simultaneously. The proposed quantile rank-score based test for differential distribution detection (ZIQRank) is derived under a two-part quantile regression model for zero-inflated outcomes. It comprises a test in logistic modeling for the zero counts and a collection of rank-score tests, adjusted for zero inflation, at multiple prespecified quantiles of the positive part. The testing decision is based on an aggregate result obtained by combining the marginal p-values with the MinP or Cauchy procedure. The proposed test is asymptotically justified and evaluated in simulation studies, where it shows a higher precision-recall AUC in detecting true differentially expressed genes (DEGs) than existing methods. We apply the ZIQRank test to a TPM scRNA-seq dataset on human glioblastoma tumors and exclusively identify a group of DEGs between neoplastic and nonneoplastic cells; these genes are heterogeneous and have been shown to be associated with glioma. Application to a UMI count scRNA-seq dataset on cells from mouse intestinal organoids further demonstrates the capability of ZIQRank to improve and complement existing approaches.
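The aggregation step described above, combining marginal p-values by a Cauchy procedure, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard Cauchy combination rule, and the function name `cauchy_combine`, the equal default weights, and the clipping constant are assumptions made for the example. In the setting of the abstract, the inputs would be the p-value from the logistic test for the zero counts and the p-values from the rank-score tests at each prespecified quantile.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Hypothetical helper for illustration; equal weights are an assumption.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total = sum(weights)
    eps = 1e-15  # clip away from 0 and 1, where the tangent diverges
    stat = 0.0
    for p, w in zip(p_values, weights):
        p = min(max(p, eps), 1.0 - eps)
        # Map each p-value to a standard Cauchy variate, then take a weighted mean.
        stat += (w / total) * math.tan((0.5 - p) * math.pi)

    # Upper tail probability of the standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


# Example: one p-value for the zero part, three for quantiles of the positive part
# (made-up numbers, for illustration only).
combined = cauchy_combine([0.20, 0.01, 0.04, 0.30])
```

The combined value is dominated by the smallest marginal p-values, which is why this kind of rule suits a test that looks for a difference at any one of several quantiles.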