Bioinformatics and Computational Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, 12 South Drive, Bethesda, MD 20892, USA.
Hematology Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, 10 Center Drive, Bethesda, MD 20814, USA.
Gigascience. 2019 Oct 1;8(10). doi: 10.1093/gigascience/giz121.
In single-cell RNA-sequencing analysis, clustering cells into groups and differentiating cell groups by differentially expressed (DE) genes are 2 separate steps for investigating cell identity. However, how well cell groups can be differentiated depends on how the cells were clustered. This interdependency often creates a bottleneck in the analysis pipeline: researchers must repeat these 2 steps multiple times with different clustering parameters to find a set of cell groups that are more distinct and biologically relevant.
To accelerate this process, we have developed IKAP, an algorithm that identifies major cell groups and improves their differentiation by systematically tuning clustering parameters. We demonstrate that, with default parameters, IKAP successfully identifies major cell types such as T cells, B cells, natural killer cells, and monocytes in 2 peripheral blood mononuclear cell datasets and recovers major cell types in a previously published mouse cortex dataset. The major cell groups identified by IKAP have more distinguishing DE genes than cell groups generated by other combinations of clustering parameters. We further show that cell subtypes can be identified by recursively applying IKAP within identified major cell types, thereby delineating cell identities in a multi-layered ontology.
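The abstract does not specify IKAP's selection criteria, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: it sweeps a grid of principal-component counts and cluster numbers with plain k-means, counts "DE genes" crudely as genes whose mean log-expression differs by more than a fixed cutoff between two clusters, keeps the partition whose least-separated pair of clusters has the most such genes (a hypothetical rule), and then re-runs the sweep inside each accepted group. The function names (`tune_clustering`, `recursive_groups`), the thresholds (`lfc=1.0`, `min_de=5`, `min_cells=5`), and the toy data are illustrative assumptions, not IKAP's API or published method.

```python
import numpy as np

def pca_scores(X, n_pcs):
    """Project cells (rows) onto the top n_pcs principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_pcs].T

def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def min_pairwise_de(X, labels, k, lfc=1.0):
    """Hypothetical criterion: fewest 'DE genes' (|mean difference| > lfc) over all cluster pairs."""
    means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return min(int((np.abs(means[a] - means[b]) > lfc).sum())
               for a in range(k) for b in range(a + 1, k))

def tune_clustering(X, pcs_grid=(2, 5), max_k=4, min_cells=5):
    """Sweep (k, n_pcs); keep the partition whose least-separated pair has the most DE genes."""
    Zs = [pca_scores(X, min(p, X.shape[0] - 1, X.shape[1])) for p in pcs_grid]
    best_score, best_labels = -1, None
    for k in range(2, max_k + 1):          # k ascending, so ties keep the coarser partition
        if k > X.shape[0] // min_cells:
            break
        for Z in Zs:
            labels = kmeans(Z, k)
            if np.bincount(labels, minlength=k).min() < min_cells:
                continue                    # skip partitions with (near-)empty clusters
            score = min_pairwise_de(X, labels, k)
            if score > best_score:
                best_score, best_labels = score, labels
    return best_score, best_labels

def recursive_groups(X, idx=None, min_de=5, depth=0, max_depth=3):
    """Re-run the sweep inside each accepted group; a leaf is an array of cell indices."""
    idx = np.arange(X.shape[0]) if idx is None else idx
    if depth >= max_depth:
        return idx
    score, labels = tune_clustering(X[idx])
    if labels is None or score < min_de:
        return idx                          # no split separates these cells by enough DE genes
    return [recursive_groups(X, idx[labels == j], min_de, depth + 1, max_depth)
            for j in range(labels.max() + 1)]

# Toy log-expression matrix (cells x genes): major groups A and B, with A split into A1/A2.
rng = np.random.default_rng(0)
n_a1, n_a2, n_b, n_genes = 30, 30, 40, 50
X = rng.normal(0.0, 0.2, size=(n_a1 + n_a2 + n_b, n_genes))
X[: n_a1 + n_a2, 0:15] += 4.0          # markers of major group A
X[n_a1 + n_a2 :, 15:30] += 4.0         # markers of major group B
X[:n_a1, 30:35] += 1.6                 # markers of subtype A1
X[n_a1 : n_a1 + n_a2, 35:40] += 1.6    # markers of subtype A2
tree = recursive_groups(X)
```

On this toy data the top-level sweep keeps the 2-cluster A/B split (its single pair differs in 30 genes, while the 3-cluster A1/A2/B split has a least-separated pair differing in only 10), and the A1/A2 subtypes are reported only after the sweep is re-run within A, mirroring the major-type-then-subtype ordering described above. A real pipeline would instead use normalised counts, a graph-based clusterer, and a statistical DE test rather than a fixed mean-difference cutoff.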
By tuning clustering parameters to identify major cell groups, IKAP greatly improves the automation of single-cell RNA-sequencing analysis, producing distinguishing DE genes and refining the cell ontology.