Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
PLoS One. 2020 Apr 10;15(4):e0231250. doi: 10.1371/journal.pone.0231250. eCollection 2020.
Single-cell expression analysis is an effective tool for studying the dynamics of cell population profiles. However, most statistical methods analyze individual profiles, and methods for comparing multiple profiles simultaneously are limited. In this study, we propose a nonparametric statistical method, called Decomposition into Extended Exponential Family (DEEF), that embeds a set of single-cell expression profiles of several markers into a low-dimensional space and identifies the principal distributions that describe their heterogeneity. We demonstrate that DEEF can appropriately decompose and embed sets of theoretical probability distributions. We then apply DEEF to a cytometry dataset to examine the effects of epidermal growth factor stimulation on an adult human mammary gland. We show that DEEF can describe the complex dynamics of cell population profiles using two parameters and visualize them as a trajectory. These two parameters identified the principal patterns of the cell population profiles without prior biological assumptions. As further applications, we perform dimensionality reduction and time series reconstruction. DEEF can reconstruct the distributions from their top coordinates, which enables the creation of an artificial dataset based on an actual single-cell expression dataset. Using the coordinate system assigned by DEEF, conventional data mining methods can be used to analyze the relationship between the attributes of the distribution samples and the features or shapes of the distributions.
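To make the idea of "embedding distributions into coordinates and reconstructing them from the top coordinates" concrete, the sketch below shows a simplified analogue, not the authors' DEEF implementation. It assumes every distribution has been discretized into a probability vector on a shared grid, removes the shared baseline and per-distribution constants from the log-probability matrix, and takes an SVD. Each distribution then has the exponential-family-like form log p_i(x) = C(x) + sum_k theta_ik F_k(x) - const_i. The function names `embed_log_densities` and `reconstruct` and the `eps` smoothing constant are hypothetical; the published method may differ in how it defines and decomposes the inner products.

```python
import numpy as np


def embed_log_densities(P, eps=1e-12):
    """Hypothetical sketch (not the published DEEF code).

    P: array of shape (n_dists, n_bins); each row is a probability vector
    on a grid shared by all distributions (an assumption of this sketch).
    Returns (theta, F, C):
      theta: (n_dists, r) coordinates of each distribution,
      F:     (r, n_bins) data-derived basis functions on the grid,
      C:     (n_bins,) baseline log-density shared by all rows.
    """
    L = np.log(np.asarray(P, dtype=float) + eps)  # eps guards against log(0)
    C = L.mean(axis=0)                            # shared baseline across distributions
    R = L - C                                     # remove the baseline
    R = R - R.mean(axis=1, keepdims=True)         # remove per-row (normalizer-like) constants
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    theta = U * s                                 # scale left singular vectors by singular values
    return theta, Vt, C


def reconstruct(theta, F, C, k):
    """Rebuild distributions from the top-k coordinates: p ∝ exp(C + theta[:, :k] @ F[:k])."""
    logits = C + theta[:, :k] @ F[:k]
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize before exponentiating
    Q = np.exp(logits)
    return Q / Q.sum(axis=1, keepdims=True)       # renormalize each row to sum to 1


if __name__ == "__main__":
    # Toy example: Gaussians with a fixed variance and shifted means form a
    # one-parameter exponential family, so one coordinate should suffice.
    x = np.linspace(-3, 3, 61)
    mus = np.linspace(-0.5, 0.5, 7)
    P = np.exp(-0.5 * (x[None, :] - mus[:, None]) ** 2)
    P /= P.sum(axis=1, keepdims=True)

    theta, F, C = embed_log_densities(P)
    P1 = reconstruct(theta, F, C, k=1)
    print("max abs error with k=1:", np.abs(P1 - P).max())
```

In this toy case the first coordinate orders the distributions by their mean, and the remaining coordinates are numerically negligible. In the same spirit, the rows of `theta` can be handed to standard data-mining tools (clustering, regression against sample attributes), and `reconstruct` with a small `k` yields smoothed, synthetic distributions.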