Department of Informatics, ETSE, University of Valencia, Avda. de la Universidad, s/n, 46100, Burjasot, Valencia, Spain.
Department of Statistics and Operations Research, University of Valencia, Avda. Vicente Andres Estelles, 46100, Burjasot, Valencia, Spain.
BMC Bioinformatics. 2023 Nov 22;24(1):440. doi: 10.1186/s12859-023-05569-6.
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for investigating changes in cell abundance during tissue regeneration and remodeling. Differential cell abundance analysis starts from an initial clustering of all cells; the number of cells per cluster and sample is then counted, and the dependence of these counts on the phenotypic covariates of the samples is studied. The analysis therefore depends heavily on the clustering method. Partitioning Around Medoids (PAM or k-medoids) is a well-established clustering procedure that eases the downstream interpretation of clusters by selecting real individuals in the dataset as cluster centers (medoids), without reducing dimensions. However, PAM has high computational cost and memory requirements.
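To make the role of medoids concrete, the following is a minimal sketch of the classical PAM procedure (a greedy BUILD phase followed by a SWAP phase) in plain Python. It is not the scellpam implementation and makes no attempt at efficiency; the function names `manhattan`, `total_cost` and `pam`, the choice of Manhattan distance, and the toy points are all assumptions made for illustration.

```python
def manhattan(a, b):
    """L1 distance between two equal-length numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=manhattan, max_iter=100):
    """Naive PAM: returns (medoid indices, cluster labels, total cost)."""
    n = len(points)

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(points, medoids + [i], dist),
        )
        medoids.append(best)

    # SWAP: replace one medoid by one non-medoid while the cost decreases.
    cost = total_cost(points, medoids, dist)
    for _ in range(max_iter):
        best_trial, best_cost = None, cost
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = total_cost(points, trial, dist)
                if c < best_cost:
                    best_trial, best_cost = trial, c
        if best_trial is None:
            break  # no swap improves the cost: local optimum reached
        medoids, cost = best_trial, best_cost

    # Each point is labelled with the position of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels, cost


# Toy data (hypothetical): two well-separated groups of three points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_medoids, toy_labels, toy_cost = pam(toy_points, k=2)
```

Every returned medoid is an index into the input, so each cluster center is an actual observation. Each SWAP pass in this naive form evaluates every medoid/non-medoid pair against all points, which illustrates why the cost grows quickly with the number of cells.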
This paper proposes a method for differential abundance analysis that uses PAM for clustering and negative binomial regression as the statistical model relating covariates to the cell counts per cluster. We used this approach to study the differential cell abundance of human endometrial cell types throughout the natural secretory phase of the menstrual cycle. We developed a new R package, scellpam, that incorporates an efficient parallel C++ implementation of PAM, and applied it in this study. We compared the PAM-BS clustering method with other methods, evaluating both the computational aspects of its implementation and the quality of the resulting classifications on distinct published datasets with known subpopulations; the results are promising.
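The two statistical ingredients of this approach can be sketched as follows: a table of cell counts per sample and cluster, and the negative binomial log-likelihood that a regression on those counts would maximize. This is an illustrative sketch, not the authors' code. The helper names, the NB2 parameterization (variance = mu + alpha * mu^2), and the use of the total number of cells per sample as an offset are assumptions that the abstract does not specify.

```python
import math
from collections import Counter


def count_table(sample_ids, cluster_labels):
    """Count cells per (sample, cluster) pair as a nested dict."""
    counts = Counter(zip(sample_ids, cluster_labels))
    samples = sorted(set(sample_ids))
    clusters = sorted(set(cluster_labels))
    return {s: {c: counts.get((s, c), 0) for c in clusters} for s in samples}


def nb_mean(beta0, beta1, x, total_cells):
    """Log-link mean with an offset (assumed): mu_i = N_i * exp(b0 + b1 * x_i)."""
    return [n * math.exp(beta0 + beta1 * xi) for xi, n in zip(x, total_cells)]


def nb_loglik(y, mu, alpha):
    """NB2 log-likelihood of counts y with means mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    ll = 0.0
    for yi, mi in zip(y, mu):
        ll += (
            math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
            + r * math.log(r / (r + mi))
            + yi * math.log(mi / (r + mi))
        )
    return ll


# Hypothetical toy input: five cells from two samples, two clusters.
cells_sample = ["s1", "s1", "s2", "s2", "s2"]
cells_cluster = [0, 1, 0, 0, 1]
table = count_table(cells_sample, cells_cluster)
```

In practice the coefficients and the dispersion would be estimated by maximizing this log-likelihood for each cluster with an established statistical package rather than by hand.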
The implementation of PAM-BS included in the scellpam package performs robustly in terms of speed and memory usage compared with other related methods. PAM allowed quick and robust clustering of datasets ranging in size from 70,000 to 300,000 cells. The package is available from CRAN at https://cran.r-project.org/web/packages/scellpam/index.html. Finally, when applied to the study of changes in the human endometrium during the secretory phase of the menstrual cycle, our approach provides important new insights into the transient subpopulations associated with the fertile time frame.