Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Collat School of Business, the University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Genomics Proteomics Bioinformatics. 2021 Jun;19(3):493-503. doi: 10.1016/j.gpb.2020.09.006. Epub 2021 Dec 25.
In this work, we describe the development of Polar Gini Curve, a method for characterizing cluster markers by analyzing single-cell RNA sequencing (scRNA-seq) data. Polar Gini Curve combines gene expression with 2D coordinate ("spatial") information to detect patterns of uniformity in any cluster of cells from scRNA-seq data. We demonstrate that Polar Gini Curve can help users characterize the shape and density distribution of cells in a particular cluster, which can be generated during routine scRNA-seq data analysis. To quantify the extent to which a gene is uniformly distributed in a cell cluster space, we combine two polar Gini curves (PGCs): one drawn upon the cell-points expressing the gene (the "foreground curve") and the other drawn upon all cell-points in the cluster (the "background curve"). We show that genes with highly dissimilar foreground and background curves tend not to be uniformly distributed in the cell cluster, and thus have spatially divergent gene expression patterns within the cluster. Genes with similar foreground and background curves tend to be uniformly distributed in the cell cluster, and thus have uniform gene expression patterns within the cluster. Such quantitative attributes of PGCs can be applied to sensitively discover biomarkers across clusters from scRNA-seq data. We demonstrate the performance of the Polar Gini Curve framework in several simulation case studies. When this framework is used to analyze a real-world neonatal mouse heart cell dataset, the detected biomarkers may characterize novel subtypes of cardiac muscle cells. The source code and data for Polar Gini Curve can be found at http://discovery.informatics.uab.edu/PGC/ or https://figshare.com/projects/Polar_Gini_Curve/76749.
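The abstract does not spell out how a polar Gini curve is constructed, so the following Python sketch only illustrates the foreground/background comparison under stated assumptions. It assumes that, for each angle, the 2D cell coordinates are projected onto an axis at that angle, shifted to be non-negative using the whole cluster as reference, and summarized by a Gini coefficient, and that the two curves are compared by root-mean-square deviation (RMSD). The function names (`gini`, `polar_gini_curve`, `pgc_rmsd`), the `n_angles` parameter, and the choice of RMSD are hypothetical and are not taken from the released source code.

```python
import numpy as np


def gini(values):
    """Gini coefficient of a 1D array of non-negative values."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for sorted, non-negative data.
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1.0) / n)


def polar_gini_curve(coords, reference=None, n_angles=360):
    """Gini coefficient of projected coordinates at each angle in [0, 2*pi).

    Assumption: projections are shifted by the minimum projection of
    `reference` (the whole cluster) so that foreground and background
    curves share the same origin at every angle.
    """
    coords = np.asarray(coords, dtype=float)
    reference = coords if reference is None else np.asarray(reference, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    curve = np.empty(n_angles)
    for i, theta in enumerate(angles):
        direction = np.array([np.cos(theta), np.sin(theta)])
        offset = (reference @ direction).min()
        curve[i] = gini(coords @ direction - offset)
    return angles, curve


def pgc_rmsd(coords, expressing_mask, n_angles=360):
    """RMSD between the foreground curve (cells expressing the gene)
    and the background curve (all cells in the cluster).

    Assumption: RMSD is used here as one possible dissimilarity measure.
    """
    coords = np.asarray(coords, dtype=float)
    mask = np.asarray(expressing_mask, dtype=bool)
    _, background = polar_gini_curve(coords, coords, n_angles)
    _, foreground = polar_gini_curve(coords[mask], coords, n_angles)
    return float(np.sqrt(np.mean((foreground - background) ** 2)))


# Usage on synthetic data: a gene expressed in a random half of the cluster
# versus a gene expressed only on one side of the cluster.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(2000, 2))        # hypothetical 2D embedding
uniform_gene = rng.random(2000) < 0.5       # expressed everywhere
localized_gene = cluster[:, 0] > 0.0        # expressed on one side only
rmsd_uniform = pgc_rmsd(cluster, uniform_gene)
rmsd_localized = pgc_rmsd(cluster, localized_gene)
```

Under these assumptions, a gene expressed in a random subset of the cluster should give a smaller RMSD than a gene confined to one side, which matches the abstract's reading that dissimilar curves indicate spatially divergent expression.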