Ma Xiuyu, Korthauer Keegan, Kendziorski Christina, Newton Michael A
Department of Statistics, University of Wisconsin-Madison.
Department of Statistics, University of British Columbia.
Ann Appl Stat. 2021 Jun;15(2):880-901. doi: 10.1214/20-aoas1423. Epub 2021 Jul 12.
On the problem of scoring genes for evidence of changes in the distribution of single-cell expression, we introduce an empirical Bayesian mixture approach and evaluate its operating characteristics in a range of numerical experiments. The proposed approach leverages cell-subtype structure revealed in cluster analysis to boost gene-level information on expression changes. Cell clustering informs gene-level analysis through a specially constructed prior distribution over pairs of multinomial probability vectors; this prior meshes with available model-based tools that score patterns of differential expression over multiple subtypes. We derive an explicit formula for the posterior probability that a gene has the same distribution in two cellular conditions, allowing for a gene-specific mixture over subtypes in each condition. The model gains from its compositional structure in two ways: each gene is allowed its own set of mixture components, while the mixing proportions are constrained at the whole-cell level. This structure leads to a novel form of information sharing through which the cell-clustering results support gene-level scoring of differential distribution. The result, according to our numerical experiments, is improved sensitivity compared to several standard approaches for detecting distributional expression changes.
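The compositional idea can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the Gaussian components, the three subtypes, the numerical values, and the names `phi`, `psi`, `aggregate`, and `same_distribution`. The sketch shows that two conditions with different subtype proportions can still give a gene the same expression distribution, provided the proportions agree once subtypes sharing that gene's component are pooled.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution (illustrative component choice)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, proportions, means):
    """Gene-level density: a mixture over cell subtypes."""
    return sum(p * normal_pdf(x, m) for p, m in zip(proportions, means))

def aggregate(proportions, means):
    """Pool subtype proportions over subtypes that share a component mean."""
    totals = {}
    for p, m in zip(proportions, means):
        totals[m] = totals.get(m, 0.0) + p
    return totals

def same_distribution(phi, psi, means, tol=1e-12):
    """True when the pooled proportions agree between the two conditions."""
    a, b = aggregate(phi, means), aggregate(psi, means)
    return all(abs(a[m] - b[m]) <= tol for m in a)

# Cell-level subtype proportions, shared by all genes (hypothetical values).
phi = [0.2, 0.3, 0.5]   # condition 1
psi = [0.4, 0.1, 0.5]   # condition 2

# Gene A: subtypes 1 and 2 share a component, so only their pooled weight matters.
gene_a_means = [0.0, 0.0, 3.0]
# Gene B: every subtype has its own component, so the proportion shift is visible.
gene_b_means = [0.0, 1.5, 3.0]

grid = [i / 10.0 for i in range(-50, 101)]
gap_a = max(abs(mixture_pdf(x, phi, gene_a_means) - mixture_pdf(x, psi, gene_a_means))
            for x in grid)
gap_b = max(abs(mixture_pdf(x, phi, gene_b_means) - mixture_pdf(x, psi, gene_b_means))
            for x in grid)

print("gene A equivalent:", same_distribution(phi, psi, gene_a_means), "max gap:", gap_a)
print("gene B equivalent:", same_distribution(phi, psi, gene_b_means), "max gap:", gap_b)
```

In this toy setting the subtype proportions are fixed across genes, while the pattern of shared components is gene-specific, which is why the same shift in proportions leaves gene A unchanged and alters gene B.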